Age | Commit message (Collapse) | Author |
|
Tcmu populates the data area (used for communication with userspace) with
pages that are allocated by calling alloc_page(GFP_NOIO). Therefore
previous content of the allocated pages is exposed to user space. Avoid
this by adding __GFP_ZERO flag.
Zeroing the pages does (nearly) not affect tcmu throughput, because
allocated pages are re-used for the data transfers of later SCSI cmds.
Link: https://lore.kernel.org/r/20211013171606.25197-1-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Make use of the struct_size() helper instead of an open-coded version, in
order to avoid any potential type mistakes or integer overflows that, in
the worst scenario, could lead to heap overflows.
Link: https://github.com/KSPP/linux/issues/160
Link: https://lore.kernel.org/r/20210927224344.GA190701@embeddedor
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When running command pipelining for WRITE direction commands (e.g. tape
device write), userspace sends cmd completion to cmd ring before processing
write data. In that case userspace has to copy data before sending
completion, because cmd completion also implicitly releases the data buffer
in data area.
The new feature KEEP_BUF allows userspace to optionally keep the buffer
after completion by setting new bit TCMU_UFLAG_KEEP_BUF in
tcmu_cmd_entry_hdr->uflags. In that case buffer has to be released
explicitly by writing the cmd_id to new action item free_kept_buf.
All kept buffers are released during reset_ring and if userspace closes uio
device (tcmu_release).
Link: https://lore.kernel.org/r/20210713175021.20103-1-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers.
The major core change is a rework to drop the status byte handling
macros and the old bit shifted definitions and the rest of the updates
are minor fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
scsi: aha1740: Avoid over-read of sense buffer
scsi: arcmsr: Avoid over-read of sense buffer
scsi: ips: Avoid over-read of sense buffer
scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
scsi: elx: libefc: Fix less than zero comparison of a unsigned int
scsi: elx: efct: Fix pointer error checking in debugfs init
scsi: elx: efct: Fix is_originator return code type
scsi: elx: efct: Fix link error for _bad_cmpxchg
scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
scsi: elx: efct: Fix error handling in efct_hw_init()
scsi: elx: efct: Remove redundant initialization of variable lun
scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
scsi: lpfc: Fix build error in lpfc_scsi.c
scsi: target: iscsi: Remove redundant continue statement
scsi: qla4xxx: Remove redundant continue statement
scsi: ppa: Switch to use module_parport_driver()
scsi: imm: Switch to use module_parport_driver()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
...
|
|
drivers/target/target_core_user.c:1424:9-10: WARNING: return of 0/1 in function 'tcmu_handle_completions' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Link: https://lore.kernel.org/r/20210515230358.GA97544@60d1edce16e0
Fixes: 9814b55cde05 ("scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found")
CC: Bodo Stroesser <bostroesser@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit f5ce815f34bc ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N *
PAGE_SIZE") introduced xas_next() calls to iterate xarray elements. These
calls triggered the WARNING "suspicious RCU usage" at tcmu device set up
[1]. In the call stack of xas_next(), xas_load() was called. According to
its comment, this function requires "the xa_lock or the RCU lock".
To avoid the warning:
- Guard the small loop calling xas_next() in tcmu_get_empty_block with RCU
lock.
- In the large loop in tcmu_copy_data using RCU lock would possibly
disable preemtion for a long time (copy multi MBs). Therefore replace
XA_STATE, xas_set and xas_next with a single xa_load.
[1]
[ 1899.867091] =============================
[ 1899.871199] WARNING: suspicious RCU usage
[ 1899.875310] 5.13.0-rc1+ #41 Not tainted
[ 1899.879222] -----------------------------
[ 1899.883299] include/linux/xarray.h:1182 suspicious rcu_dereference_check() usage!
[ 1899.890940] other info that might help us debug this:
[ 1899.899082] rcu_scheduler_active = 2, debug_locks = 1
[ 1899.905719] 3 locks held by kworker/0:1/1368:
[ 1899.910161] #0: ffffa1f8c8b98738 ((wq_completion)target_submission){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580
[ 1899.920732] #1: ffffbd7040cd7e78 ((work_completion)(&q->sq.work)){+.+.}-{0:0}, at: process_one_work+0x1ee/0x580
[ 1899.931146] #2: ffffa1f8d1c99768 (&udev->cmdr_lock){+.+.}-{3:3}, at: tcmu_queue_cmd+0xea/0x160 [target_core_user]
[ 1899.941678] stack backtrace:
[ 1899.946093] CPU: 0 PID: 1368 Comm: kworker/0:1 Not tainted 5.13.0-rc1+ #41
[ 1899.953070] Hardware name: System manufacturer System Product Name/PRIME Z270-A, BIOS 1302 03/15/2018
[ 1899.962459] Workqueue: target_submission target_queued_submit_work [target_core_mod]
[ 1899.970337] Call Trace:
[ 1899.972839] dump_stack+0x6d/0x89
[ 1899.976222] xas_descend+0x10e/0x120
[ 1899.979875] xas_load+0x39/0x50
[ 1899.983077] tcmu_get_empty_blocks+0x115/0x1c0 [target_core_user]
[ 1899.989318] queue_cmd_ring+0x1da/0x630 [target_core_user]
[ 1899.994897] ? rcu_read_lock_sched_held+0x3f/0x70
[ 1899.999695] ? trace_kmalloc+0xa6/0xd0
[ 1900.003501] ? __kmalloc+0x205/0x380
[ 1900.007167] tcmu_queue_cmd+0x12f/0x160 [target_core_user]
[ 1900.012746] __target_execute_cmd+0x23/0xa0 [target_core_mod]
[ 1900.018589] transport_generic_new_cmd+0x1f3/0x370 [target_core_mod]
[ 1900.025046] transport_handle_cdb_direct+0x34/0x50 [target_core_mod]
[ 1900.031517] target_queued_submit_work+0x43/0xe0 [target_core_mod]
[ 1900.037837] process_one_work+0x268/0x580
[ 1900.041952] ? process_one_work+0x580/0x580
[ 1900.046195] worker_thread+0x55/0x3b0
[ 1900.049921] ? process_one_work+0x580/0x580
[ 1900.054192] kthread+0x143/0x160
[ 1900.057499] ? kthread_create_worker_on_cpu+0x40/0x40
[ 1900.062661] ret_from_fork+0x1f/0x30
Link: https://lore.kernel.org/r/20210519135440.26773-1-bostroesser@gmail.com
Fixes: f5ce815f34bc ("scsi: target: tcmu: Support DATA_BLOCK_SIZE = N * PAGE_SIZE")
Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The bit definition TCM_DEV_BIT_PLUGGED should correctly be named
TCMU_DEV_BIT_PLUGGED, since all other bits in the same bitfield have prefix
TCMU_.
Link: https://lore.kernel.org/r/20210512140654.31249-1-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If tcmu_handle_completions() finds an invalid cmd_id while looping over cmd
responses from userspace it sets TCMU_DEV_BIT_BROKEN and breaks the
loop. This means that it does further handling for the tcmu device.
Skip that handling by replacing 'break' with 'return'.
Additionally change tcmu_handle_completions() from unsigned int to bool,
since the value used in return already is bool.
Link: https://lore.kernel.org/r/20210423150123.24468-1-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Make data_pages_per_blk changeable similar to the way it is done for
max_data_area_mb. One can change the value by typing:
echo "data_pages_per_blk=N" >control
The value is printed when doing:
cat info
In addition, a new readonly attribute 'data_pages_per_blk' returns the
value on read.
Link: https://lore.kernel.org/r/20210324195758.2021-7-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replace DATA_PAGES_PER_BLK and DATA_BLOCK_SIZE with new struct elements
tcmu_dev->data_pages_per_blk and tcmu_dev->data_blk_size. These new
variables are still loaded with constant definition DATA_PAGES_PER_BLK_DEF
(= 1) and DATA_PAGES_PER_BLK_DEF * PAGE_SIZE.
There is no way yet to set the values via configfs.
Link: https://lore.kernel.org/r/20210324195758.2021-6-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is only one caller of tcmu_get_block_page left. Since it is a
one-liner, we can remove the function.
Link: https://lore.kernel.org/r/20210324195758.2021-5-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Change tcmu to support DATA_BLOCK_SIZE being a multiple of PAGE_SIZE. There
are two reasons why one would like to have a bigger DATA_BLOCK_SIZE:
1) If userspace - e.g. due to data compression, encryption or
deduplication - needs to have receive or transmit data in a consecutive
buffer, we can define DATA_BLOCK_SIZE to the maximum size of a SCSI
READ/WRITE to enforce that userspace sees just one consecutive
buffer. That way we can avoid the need for doing data copy in
userspace.
2) Using a bigger data block size can speed up command processing in
tcmu. The number of free data blocks to look up in bitmap is reduced
substantially. The lookup for data pages in radix_tree can be done more
efficiently if there are multiple pages in a data block. The maximum
number of IOVs to set up is lower so cmd entries in the ring become
smaller.
Link: https://lore.kernel.org/r/20210324195758.2021-4-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Rename some variables and definitions as a first preparation for
DATA_BLOCK_SIZE != PAGE_SIZE and add the new DATA_PAGES_PER_BLK definition
containing the number of pages per data block.
Rename tcmu_try_get_block_page() to tcmu_try_get_data_page(). Keep name
tcmu_get_block_page() since it will go away in a following commit when
there is only one caller left. Subsequent commits will then add full
support for DATA_PAGES_PER_BLK != 1, which also means DATA_BLOCK_SIZE =
DATA_PAGES_PER_BLK * PAGE_SIZE
Link: https://lore.kernel.org/r/20210324195758.2021-3-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some definitions and members of struct tcmu_dev had misleading
names. Examples:
- ring_size was used for the size of mailbox + cmd ring + data area
- CMDR_SIZE was used for size of mailbox + cmd ring
I added the new definition MB_CMDR_SIZE (mailbox + command ring), changed
CMDR_SIZE to hold the size of the command ring only and replaced in struct
tcmu_dev the member ring_size with mmap_pages, because the member is now
used in tcmu_mmap() only, where we need page count, not size.
I also added the new struct tcmu_dev member 'cmdr' which is used to replace
some occurences of '(void *)mb + CMDR_OFF' with 'udev->cmdr' for better
readability.
Link: https://lore.kernel.org/r/20210324195758.2021-2-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In commit f7c89771d07d ("scsi: target: tcmu: Replace radix_tree with
XArray") the meaning of last parameter of tcmu_blocks_release() was
changed. So in the callers we should subtract 1 from the previous
parameter.
Unfortunately that change got lost at one of the two places where
tcmu_blocks_release() is called. That does not lead to any problems, but we
should adjust it anyway.
Link: https://lore.kernel.org/r/20210310184458.10741-1-bostroesser@gmail.com
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Especially when using tcmu with tcm_loop, memory allocations with
GFP_KERNEL for a LUN can cause write back to the same LUN.
So we have to use GFP_NOIO when allocation is done while handling commands
or while holding cmdr_lock.
Link: https://lore.kernel.org/r/20210305190009.32242-1-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
An attempt from Matthew Wilcox to replace radix-tree usage by XArray in
tcmu more than 1 year ago unfortunately got lost.
I rebased that work on latest tcmu and tested it.
Link: https://lore.kernel.org/r/20210224185335.13844-3-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
An attempt from Matthew Wilcox to replace IDR usage by XArray in tcmu more
than 1 year ago unfortunately got lost.
I rebased that work on latest tcmu and tested it.
Link: https://lore.kernel.org/r/20210224185335.13844-2-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch adds plug/unplug callouts for tcmu, so we can avoid the number
of times we switch to userspace. Using this driver with tcm_loop is a
common config, and dependng on the nr_hw_queues (nr_hw_queues=1 performs
much better) and fio jobs (lower num jobs around 4) this patch can increase
IOPS by only around 5-10% because we hit other issues like the big per tcmu
device mutex.
Link: https://lore.kernel.org/r/20210227170006.5077-24-michael.christie@oracle.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When user deletes a tcmu device via configFS, tcmu calls
uio_unregister_device(). During that call uio resets its pointer to struct
uio_info provided by tcmu. That means, after uio_unregister_device() uio
will no longer execute any of the callbacks tcmu had set in uio_info.
Especially, if userspace daemon still holds the corresponding uio device
open or mmap'ed while tcmu calls uio_unregister_device(), uio will not call
tcmu_release() when userspace finally closes and munmaps the uio device.
Since tcmu does refcounting for the tcmu device in tcmu_open() and
tcmu_release(), in the decribed case refcount does not drop to 0 and tcmu
does not free tcmu device's resources. In extreme cases this can cause
memory leaking of up to 1 GB for a single tcmu device.
After uio_unregister_device(), uio will reject every open, read, write,
mmap from userspace with -EOI. But userspace daemon can still access the
mmap'ed command ring and data area. Therefore tcmu should wait until
userspace munmaps the uio device before it frees the resources, as we don't
want to cause SIGSEGV or SIGBUS to user space.
That said, current refcounting during tcmu_open and tcmu_release does not
work correctly, and refcounting better should be done in the open and close
callouts of the vm_operations_struct, which tcmu assigns to each mmap of
the uio device (because it wants its own page fault handler).
This patch fixes the memory leak by removing refcounting from tcmu_open and
tcmu_close, and instead adding new tcmu_vma_open() and tcmu_vma_close()
handlers that only do refcounting.
Link: https://lore.kernel.org/r/20210218175039.7829-3-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch just moves one block of code containing some functions inside
target_core_user.c to avoid adding prototypes in next patch.
Link: https://lore.kernel.org/r/20210218175039.7829-2-bostroesser@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit a35129024e88 ("scsi: target: tcmu: Use priv pointer in se_cmd")
modified tcmu_free_cmd() to set NULL to priv pointer in se_cmd. However,
se_cmd can be already freed by work queue triggered in
target_complete_cmd(). This caused BUG KASAN use-after-free [1].
To fix the bug, do not touch priv pointer in tcmu_free_cmd(). Instead, set
NULL to priv pointer before target_complete_cmd() calls. Also, to avoid
unnecessary priv pointer change in tcmu_queue_cmd(), modify priv pointer in
the function only when tcmu_free_cmd() is not called.
[1]
BUG: KASAN: use-after-free in tcmu_handle_completions+0x1172/0x1770 [target_core_user]
Write of size 8 at addr ffff88814cf79a40 by task cmdproc-uio0/14842
CPU: 2 PID: 14842 Comm: cmdproc-uio0 Not tainted 5.11.0-rc2 #1
Hardware name: Supermicro Super Server/X10SRL-F, BIOS 3.2 11/22/2019
Call Trace:
dump_stack+0x9a/0xcc
? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
print_address_description.constprop.0+0x18/0x130
? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
kasan_report.cold+0x7f/0x10e
? tcmu_handle_completions+0x1172/0x1770 [target_core_user]
tcmu_handle_completions+0x1172/0x1770 [target_core_user]
? queue_tmr_ring+0x5d0/0x5d0 [target_core_user]
tcmu_irqcontrol+0x28/0x60 [target_core_user]
uio_write+0x155/0x230
? uio_vma_fault+0x460/0x460
? security_file_permission+0x4f/0x440
vfs_write+0x1ce/0x860
ksys_write+0xe9/0x1b0
? __ia32_sys_read+0xb0/0xb0
? syscall_enter_from_user_mode+0x27/0x70
? trace_hardirqs_on+0x1c/0x110
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fcf8b61905f
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 b9 fc ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 0c fd ff ff 48
RSP: 002b:00007fcf7b3e6c30 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcf8b61905f
RDX: 0000000000000004 RSI: 00007fcf7b3e6c78 RDI: 000000000000000c
RBP: 00007fcf7b3e6c80 R08: 0000000000000000 R09: 00007fcf7b3e6aa8
R10: 000000000b01c000 R11: 0000000000000293 R12: 00007ffe0c32a52e
R13: 00007ffe0c32a52f R14: 0000000000000000 R15: 00007fcf7b3e7640
Allocated by task 383:
kasan_save_stack+0x1b/0x40
____kasan_kmalloc.constprop.0+0x84/0xa0
kmem_cache_alloc+0x142/0x330
tcm_loop_queuecommand+0x2a/0x4e0 [tcm_loop]
scsi_queue_rq+0x12ec/0x2d20
blk_mq_dispatch_rq_list+0x30a/0x1db0
__blk_mq_do_dispatch_sched+0x326/0x830
__blk_mq_sched_dispatch_requests+0x2c8/0x3f0
blk_mq_sched_dispatch_requests+0xca/0x120
__blk_mq_run_hw_queue+0x93/0xe0
process_one_work+0x7b6/0x1290
worker_thread+0x590/0xf80
kthread+0x362/0x430
ret_from_fork+0x22/0x30
Freed by task 11655:
kasan_save_stack+0x1b/0x40
kasan_set_track+0x1c/0x30
kasan_set_free_info+0x20/0x30
____kasan_slab_free+0xec/0x120
slab_free_freelist_hook+0x53/0x160
kmem_cache_free+0xf4/0x5c0
target_release_cmd_kref+0x3ea/0x9e0 [target_core_mod]
transport_generic_free_cmd+0x28b/0x2f0 [target_core_mod]
target_complete_ok_work+0x250/0xac0 [target_core_mod]
process_one_work+0x7b6/0x1290
worker_thread+0x590/0xf80
kthread+0x362/0x430
ret_from_fork+0x22/0x30
Last potentially related work creation:
kasan_save_stack+0x1b/0x40
kasan_record_aux_stack+0xa3/0xb0
insert_work+0x48/0x2e0
__queue_work+0x4e8/0xdf0
queue_work_on+0x78/0x80
tcmu_handle_completions+0xad0/0x1770 [target_core_user]
tcmu_irqcontrol+0x28/0x60 [target_core_user]
uio_write+0x155/0x230
vfs_write+0x1ce/0x860
ksys_write+0xe9/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Second to last potentially related work creation:
kasan_save_stack+0x1b/0x40
kasan_record_aux_stack+0xa3/0xb0
insert_work+0x48/0x2e0
__queue_work+0x4e8/0xdf0
queue_work_on+0x78/0x80
tcm_loop_queuecommand+0x1c3/0x4e0 [tcm_loop]
scsi_queue_rq+0x12ec/0x2d20
blk_mq_dispatch_rq_list+0x30a/0x1db0
__blk_mq_do_dispatch_sched+0x326/0x830
__blk_mq_sched_dispatch_requests+0x2c8/0x3f0
blk_mq_sched_dispatch_requests+0xca/0x120
__blk_mq_run_hw_queue+0x93/0xe0
process_one_work+0x7b6/0x1290
worker_thread+0x590/0xf80
kthread+0x362/0x430
ret_from_fork+0x22/0x30
The buggy address belongs to the object at ffff88814cf79800 which belongs
to the cache tcm_loop_cmd_cache of size 896.
Link: https://lore.kernel.org/r/20210113024508.1264992-1-shinichiro.kawasaki@wdc.com
Fixes: a35129024e88 ("scsi: target: tcmu: Use priv pointer in se_cmd")
Cc: stable@vger.kernel.org # v5.9+
Acked-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull SCSI updates from James Bottomley:
"This consists of the usual driver updates (ufs, qla2xxx, smartpqi,
target, zfcp, fnic, mpt3sas, ibmvfc) plus a load of cleanups, a major
power management rework and a load of assorted minor updates.
There are a few core updates (formatting fixes being the big one) but
nothing major this cycle"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: mpt3sas: Update driver version to 36.100.00.00
scsi: mpt3sas: Handle trigger page after firmware update
scsi: mpt3sas: Add persistent MPI trigger page
scsi: mpt3sas: Add persistent SCSI sense trigger page
scsi: mpt3sas: Add persistent Event trigger page
scsi: mpt3sas: Add persistent Master trigger page
scsi: mpt3sas: Add persistent trigger pages support
scsi: mpt3sas: Sync time periodically between driver and firmware
scsi: qla2xxx: Update version to 10.02.00.104-k
scsi: qla2xxx: Fix device loss on 4G and older HBAs
scsi: qla2xxx: If fcport is undergoing deletion complete I/O with retry
scsi: qla2xxx: Fix the call trace for flush workqueue
scsi: qla2xxx: Fix flash update in 28XX adapters on big endian machines
scsi: qla2xxx: Handle aborts correctly for port undergoing deletion
scsi: qla2xxx: Fix N2N and NVMe connect retry failure
scsi: qla2xxx: Fix FW initialization error on big endian machines
scsi: qla2xxx: Fix crash during driver load on big endian machines
scsi: qla2xxx: Fix compilation issue in PPC systems
scsi: qla2xxx: Don't check for fw_started while posting NVMe command
scsi: qla2xxx: Tear down session if FW say it is down
...
|
|
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
|
scatter_data_area() and gather_data_area() are not easy to understand since
data is copied in nested loops over sg_list and tcmu dbi list. Since sg
list can contain only partly filled pages, the loop has to be prepared to
handle sg pages not matching dbi pages one by one.
Existing implementation uses kmap_atomic()/kunmap_atomic() due to
performance reasons. But instead of using these calls strictly nested for
sg and dpi pages, the code holds the mappings in an overlapping way, which
indeed is a bug that would trigger on archs using highmem.
The scatterlist lib contains the sg_miter_start/_next/_stop functions which
can be used to simplify such complicated loops.
The new code now processes the dbi list in the outer loop, while sg list is
handled by the inner one. That way the code can take advantage of the
sg_miter_* family calls.
Calling sg_miter_stop() after the end of the inner loop enforces strict
nesting of atomic kmaps.
Since the nested loops in scatter_/gather_data_area were very similar, I
replaced them by the new helper function tcmu_copy_data().
Link: https://lore.kernel.org/r/20201019115118.11949-1-bostroesser@gmail.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
|
|
Bulk of the genetlink users can use smaller ops, move them.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Corrects drivers/target/target_core_user.c:688:6: warning: 'page' may be
used uninitialized.
Link: https://lore.kernel.org/r/20200924001920.43594-1-john.p.donnelly@oracle.com
Fixes: 3c58f737231e ("scsi: target: tcmu: Optimize use of flush_dcache_page")
Cc: Mike Christie <michael.christie@oracle.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
scatter_data_area() has two purposes:
1) Create the iovs for the data area buffer of a SCSI cmd.
2) If there is data in DMA_TO_DEVICE direction, copy
the data from sg_list to data area buffer.
Both are done in a common loop.
In case of DMA_FROM_DEVICE data transfer, scatter_data_area() is called
with parameter copy_data = false. But this flag is just used to skip
memcpy() for data, while radix_tree_lookup still is called for every dbi of
the area area buffer, and kmap and kunmap are called for every page from
sg_list and data_area as well as flush_dcache_page() for the data area
pages. Since the only thing to do with copy_data = false would be to set
up the iovs, this is a noticeable overhead. Rework the iov creation in the
main loop of scatter_data_area() providing the new function
new_block_to_iov(). Based on this, create the short new function
tcmu_setup_iovs() that only writes the iovs with no overhead. This new
function is now called instead of scatter_data_area() for bidi buffers and
for data buffers in those cases where memcpy() would have been skipped.
Link: https://lore.kernel.org/r/20200910155041.17654-4-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
queue_cmd_ring() needs to check whether there is enough space in cmd ring
and data area for the cmd to queue.
Currently the sequence is:
1) Calculate size the cmd will occupy on the ring based on estimation of
needed iovs.
2) Check whether there is enough space on the ring based on size from 1)
3) Allocate buffers in data area.
4) Calculate number of iovs the command really needs while copying
incoming data (if any) to data area.
5) Re-calculate real size of cmd on ring based on real number of iovs.
6) Set up possible padding and cmd on the ring.
Step 1) must not underestimate the cmd size so use max possible number of
iovs for the given I/O data size. The resulting overestimation can be
really high so this sequence is not ideal. The earliest the real number of
iovs can be calculated is after data buffer allocation. Therefore rework
the code to implement the following sequence:
A) Allocate buffers on data area and calculate number of necessary iovs
during this.
B) Calculate real size of cmd on ring based on number of iovs.
C) Check whether there is enough space on the ring.
D) Set up possible padding and cmd on the ring.
The new sequence enforces the split of new function tcmu_alloc_data_space()
from is_ring_space_avail(). Using this function, change queue_cmd_ring()
according to the new sequence.
Change routines called by tcmu_alloc_data_space() to allow calculating and
returning the iov count. Remove counting of iovs in scatter_data_area().
Link: https://lore.kernel.org/r/20200910155041.17654-3-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Simplify code by joining tcmu_cmd_get_data_length() and
tcmu_cmd_get_block_cnt() into tcmu_cmd_set_block_cnts(). The new function
sets tcmu_cmd->dbi_cnt and also the new field tcmu_cmd->dbi_bidi_cnt which
is needed for further enhancements in following patches. Simplify some
code by using tcmu_cmd->dbi(_bidi)_cnt instead of calculation from length.
Please note: The calculation of the number of dbis needed for bidi was
wrong. It was based on the length of the first bidi sg only. I changed it
to correctly sum up entire length of all bidi sgs.
Link: https://lore.kernel.org/r/20200910155041.17654-2-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The tcmu 'global_max_data_area_mb' parameter in sysfs is missing a
newline. Add it.
Link: https://lore.kernel.org/r/1599132573-33818-1-git-send-email-wangxiongfeng2@huawei.com
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add "tmr_notification" configfs attribute to tcmu devices. If the default
value 0 is used, tcmu only removes aborted commands from qfull_queue. If
user changes tmr_notification to 1, additionally TMR notifications will be
written to the cmd ring.
Link: https://lore.kernel.org/r/20200726153510.13077-9-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch implements the tmr_notify callback for tcmu. When the callback
is called, tcmu checks the list of aborted commands it received as
parameter:
- aborted commands in the qfull_queue are removed from the queue and
target_complete_command is called
- from the cmd_ids of aborted commands currently uncompleted in cmd ring
it creates a list of aborted cmd_ids.
Finally a TMR notification is written to cmd ring containing TMR type and
cmd_id list. If there is no space in ring, the TMR notification is queued
on a TMR specific queue.
The TMR specific queue 'tmr_queue' can be seen as a extension of the cmd
ring. At the end of each iexecution of tcmu_complete_commands() we check
whether tmr_queue contains TMRs and try to move them onto the ring. If
tmr_queue is not empty after that, we don't call run_qfull_queue() because
commands must not overtake TMRs.
This way we guarantee that cmd_ids in TMR notification received by
userspace either match an active, not yet completed command or are no
longer valid due to userspace having complete some cmd_ids meanwhile.
New commands that were assigned to an aborted cmd_id will always appear on
the cmd ring _after_ the TMR.
Link: https://lore.kernel.org/r/20200726153510.13077-8-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
During cmd timeout handling in check_timedout_devices(), due to a race, it
can happen that tcmu_set_next_deadline() does not start a timer as
expected:
1) Either tcmu_check_expired_ring_cmd() checks the inflight_queue or
tcmu_check_expired_queue_cmd() checks the qfull_queue while jiffies has
the value X
2) At the end of the check the queue contains one remaining command with
deadline X (time_after(X, X) is false and thus the command is not
handled as being timed out).
3) After tcmu_check_expired_xxxxx_cmd() a timer interrupt happens and
jiffies is incremented to X+1.
4) Now tcmu_set_next_deadline() is called, but it skips the command, since
time_after(X+1, X) is true. Therefore tcmu_set_next_deadline() finds no
new deadline and stops the timer, which it shouldn't.
Since commands that time out are removed from inflight_queue or
qfull_queue, we don't need the check with time_after() in
tcmu_set_next_deadline() but can use the deadline from the first cmd in
the queue.
Additionally, replace the remaining time_after() calls in
tcmu_check_expired_xxxxx_cmd() with time_after_eq(), because it is not
useful to set the timeout to deadline but then check for jiffies being
greater than deadline.
Simplify the end of tcmu_handle_completions() and change the check for no
more pending commands from
mb->cmd_tail == mb->cmd_head
to
idr_is_empty(&udev->commands)
because the old check doesn't work correctly if paddings or in the future
TMRs are in the ring.
Finally tcmu_set_next_deadline() was shifted in the source as
preparation for later implementation of tmr_notify callback.
Link: https://lore.kernel.org/r/20200726153510.13077-7-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The new helper ring_insert_padding is split off from and then called by
queue_cmd_ring. It inserts a padding if necessary. The new helper will in
a subsequent patch be used during writing of TMR notifications to command
ring.
Link: https://lore.kernel.org/r/20200726153510.13077-6-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If tcmu receives an already aborted command, tcmu_queue_cmd() should reject
it.
Link: https://lore.kernel.org/r/20200726153510.13077-5-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We initialize and clean up the se_cmd's priv pointer under cmd_ring_lock to
point to the corresponding tcmu_cmd.
In the patch that implements tmr_notify callback in tcmu we will use the
priv pointer.
Link: https://lore.kernel.org/r/20200726153510.13077-4-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If tcmu_handle_completions() has to process a padding shorter than
sizeof(struct tcmu_cmd_entry), the current call to
tcmu_flush_dcache_range() with sizeof(struct tcmu_cmd_entry) as length
param is wrong and causes crashes on e.g. ARM, because
tcmu_flush_dcache_range() in this case calls
flush_dcache_page(vmalloc_to_page(start)); with start being an invalid
address above the end of the vmalloc'ed area.
The fix is to use the minimum of remaining ring space and sizeof(struct
tcmu_cmd_entry) as the length param.
The patch was tested on kernel 4.19.118.
See https://bugzilla.kernel.org/show_bug.cgi?id=208045#c10
Link: https://lore.kernel.org/r/20200629093756.8947-1-bstroesser@ts.fujitsu.com
Tested-by: JiangYu <lnsyyj@hotmail.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch fixes the following crash (see
https://bugzilla.kernel.org/show_bug.cgi?id=208045)
Process iscsi_trx (pid: 7496, stack limit = 0x0000000010dd111a)
CPU: 0 PID: 7496 Comm: iscsi_trx Not tainted 4.19.118-0419118-generic
#202004230533
Hardware name: Greatwall QingTian DF720/F601, BIOS 601FBE20 Sep 26 2019
pstate: 80400005 (Nzcv daif +PAN -UAO)
pc : flush_dcache_page+0x18/0x40
lr : is_ring_space_avail+0x68/0x2f8 [target_core_user]
sp : ffff000015123a80
x29: ffff000015123a80 x28: 0000000000000000
x27: 0000000000001000 x26: ffff000023ea5000
x25: ffffcfa25bbe08b8 x24: 0000000000000078
x23: ffff7e0000000000 x22: ffff000023ea5001
x21: ffffcfa24b79c000 x20: 0000000000000fff
x19: ffff7e00008fa940 x18: 0000000000000000
x17: 0000000000000000 x16: ffff2d047e709138
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: ffff2d047fbd0a40
x11: 0000000000000000 x10: 0000000000000030
x9 : 0000000000000000 x8 : ffffc9a254820a00
x7 : 00000000000013b0 x6 : 000000000000003f
x5 : 0000000000000040 x4 : ffffcfa25bbe08e8
x3 : 0000000000001000 x2 : 0000000000000078
x1 : ffffcfa25bbe08b8 x0 : ffff2d040bc88a18
Call trace:
flush_dcache_page+0x18/0x40
is_ring_space_avail+0x68/0x2f8 [target_core_user]
queue_cmd_ring+0x1f8/0x680 [target_core_user]
tcmu_queue_cmd+0xe4/0x158 [target_core_user]
__target_execute_cmd+0x30/0xf0 [target_core_mod]
target_execute_cmd+0x294/0x390 [target_core_mod]
transport_generic_new_cmd+0x1e8/0x358 [target_core_mod]
transport_handle_cdb_direct+0x50/0xb0 [target_core_mod]
iscsit_execute_cmd+0x2b4/0x350 [iscsi_target_mod]
iscsit_sequence_cmd+0xd8/0x1d8 [iscsi_target_mod]
iscsit_process_scsi_cmd+0xac/0xf8 [iscsi_target_mod]
iscsit_get_rx_pdu+0x404/0xd00 [iscsi_target_mod]
iscsi_target_rx_thread+0xb8/0x130 [iscsi_target_mod]
kthread+0x130/0x138
ret_from_fork+0x10/0x18
Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (f9400260)
---[ end trace 1e451c73f4266776 ]---
The solution is based on patch:
"scsi: target: tcmu: Optimize use of flush_dcache_page"
which restricts the use of tcmu_flush_dcache_range() to addresses from
vmalloc'ed areas only.
This patch now replaces the virt_to_page() call in
tcmu_flush_dcache_range() - which is wrong for vmalloced addrs - by
vmalloc_to_page().
The patch was tested on ARM with kernel 4.19.118 and 5.7.2
Link: https://lore.kernel.org/r/20200618131632.32748-3-bstroesser@ts.fujitsu.com
Tested-by: JiangYu <lnsyyj@hotmail.com>
Tested-by: Daniel Meyerholt <dxm523@gmail.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
(scatter|gather)_data_area() need to flush dcache after writing data to or
before reading data from a page in uio data area. The two routines are
able to handle data transfer to/from such a page in fragments and flush the
cache after each fragment was copied by calling the wrapper
tcmu_flush_dcache_range().
That means:
1) flush_dcache_page() can be called multiple times for the same page.
2) Calling flush_dcache_page() indirectly using the wrapper does not make
sense, because each call of the wrapper is for one single page only and
the calling routine already has the correct page pointer.
Change (scatter|gather)_data_area() such that, instead of calling
tcmu_flush_dcache_range() before/after each memcpy, it now calls
flush_dcache_page() before unmapping a page (when writing is complete for
that page) or after mapping a page (when starting to read the page).
After this change only calls to tcmu_flush_dcache_range() for addresses in
vmalloc'ed command ring are left over.
The patch was tested on ARM with kernel 4.19.118 and 5.7.2
Link: https://lore.kernel.org/r/20200618131632.32748-2-bstroesser@ts.fujitsu.com
Tested-by: JiangYu <lnsyyj@hotmail.com>
Tested-by: Daniel Meyerholt <dxm523@gmail.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Since commit 61fb24822166 ("scsi: target: tcmu: Userspace must not complete
queued commands") tcmu_cmd bit TCMU_CMD_BIT_INFLIGHT is set but never
checked. So we can remove it safely.
[mkp: fixed Mike's email address]
Link: https://lore.kernel.org/r/20200619173806.5016-1-bstroesser@ts.fujitsu.com
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull more SCSI updates from James Bottomley:
"This is the set of changes collected since just before the merge
window opened. It's mostly minor fixes in drivers.
The one non-driver set is the three optical disk (sr) changes where
two are error path fixes and one is a helper conversion.
The big driver change is the hpsa compat_alloc_userspace rework by Al
so he can kill the remaining user. This has been tested and acked by
the maintainer"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits)
scsi: acornscsi: Fix an error handling path in acornscsi_probe()
scsi: storvsc: Remove memset before memory freeing in storvsc_suspend()
scsi: cxlflash: Remove an unnecessary NULL check
scsi: ibmvscsi: Don't send host info in adapter info MAD after LPM
scsi: sr: Fix sr_probe() missing deallocate of device minor
scsi: sr: Fix sr_probe() missing mutex_destroy
scsi: st: Convert convert get_user_pages() --> pin_user_pages()
scsi: target: Rename target_setup_cmd_from_cdb() to target_cmd_parse_cdb()
scsi: target: Fix NULL pointer dereference
scsi: target: Initialize LUN in transport_init_se_cmd()
scsi: target: Factor out a new helper, target_cmd_init_cdb()
scsi: hpsa: hpsa_ioctl(): Tidy up a bit
scsi: hpsa: Get rid of compat_alloc_user_space()
scsi: hpsa: Don't bother with vmalloc for BIG_IOCTL_Command_struct
scsi: hpsa: Lift {BIG_,}IOCTL_Command_struct copy{in,out} into hpsa_ioctl()
scsi: ufs: Remove redundant urgent_bkop_lvl initialization
scsi: ufs: Don't update urgent bkops level when toggling auto bkops
scsi: qedf: Remove redundant initialization of variable rc
scsi: mpt3sas: Fix memset() in non-RDPQ mode
scsi: iscsi: Fix reference count leak in iscsi_boot_create_kobj
...
|
|
Pull SCSI updates from James Bottomley:
:This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates.
There are no major core changes in this series apart from a
refactoring in scsi_lib.c"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes
scsi: cxgb3i: Fix some leaks in init_act_open()
scsi: ibmvscsi: Make some functions static
scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
scsi: ufs: Fix WriteBooster flush during runtime suspend
scsi: ufs: Fix index of attributes query for WriteBooster feature
scsi: ufs: Allow WriteBooster on UFS 2.2 devices
scsi: ufs: Remove unnecessary memset for dev_info
scsi: ufs-qcom: Fix scheduling while atomic issue
scsi: mpt3sas: Fix reply queue count in non RDPQ mode
scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
scsi: vhost: Notify TCM about the maximum sg entries supported per command
scsi: qla2xxx: Remove return value from qla_nvme_ls()
scsi: qla2xxx: Remove an unused function
scsi: iscsi: Register sysfs for iscsi workqueue
scsi: scsi_debug: Parser tables and code interaction
scsi: core: Refactor scsi_mq_setup_tags function
scsi: core: Fix incorrect usage of shost_for_each_device
scsi: qla2xxx: Fix endianness annotations in source files
...
|
|
1) If remaining ring space before the end of the ring is smaller then the
next cmd to write, tcmu writes a padding entry which fills the remaining
space at the end of the ring.
Then tcmu calls tcmu_flush_dcache_range() with the size of struct
tcmu_cmd_entry as data length to flush. If the space filled by the
padding was smaller then tcmu_cmd_entry, tcmu_flush_dcache_range() is
called for an address range reaching behind the end of the vmalloc'ed
ring.
tcmu_flush_dcache_range() in a loop calls
flush_dcache_page(virt_to_page(start)); for every page being part of the
range. On x86 the line is optimized out by the compiler, as
flush_dcache_page() is empty on x86.
But I assume the above can cause trouble on other architectures that
really have a flush_dcache_page(). For paddings only the header part of
an entry is relevant due to alignment rules the header always fits in
the remaining space, if padding is needed. So tcmu_flush_dcache_range()
can safely be called with sizeof(entry->hdr) as the length here.
2) After it has written a command to cmd ring, tcmu calls
tcmu_flush_dcache_range() using the size of a struct tcmu_cmd_entry as
data length to flush. But if a command needs many iovecs, the real size
of the command may be bigger then tcmu_cmd_entry, so a part of the
written command is not flushed then.
Link: https://lore.kernel.org/r/20200528193108.9085-1-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The pr_debug() dereferences "cmd" after we already freed it by calling
tcmu_free_cmd(cmd). The debug printk needs to be done earlier.
Link: https://lore.kernel.org/r/20200523101129.GB98132@mwanda
Fixes: 61fb24822166 ("scsi: target: tcmu: Userspace must not complete queued commands")
Reviewed-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When tcmu queues a new command - no matter whether in command ring or in
qfull_queue - a cmd_id from IDR udev->commands is assigned to the command.
If userspace sends a wrong command completion containing the cmd_id of a
command on the qfull_queue, tcmu_handle_completions() finds the command in
the IDR and calls tcmu_handle_completion() for it. This might do some nasty
things because commands in qfull_queue do not have a valid dbi list.
To fix this bug, we no longer add queued commands to the idr. Instead the
cmd_id is assign when a command is written to the command ring.
Due to this change I had to adapt the source code at several places where
up to now an idr_for_each had been done.
[mkp: fix checkpatch warnings]
Link: https://lore.kernel.org/r/20200518164833.12775-1-bstroesser@ts.fujitsu.com
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Currently in tcmu reservation commands are handled by core's pr
implementation (default) or completely rejected (emulate_pr set to 0). We
additionally want to be able to do full reservation handling in
userspace. Therefore we need a way to set TRANSPORT_FLAG_PASSTHROUGH_PGR.
The inverted flag is displayed by attribute pgr_support. Since we moved
the flag from transport/backend to se_device in the previous commit, we now
can make it changeable per device by allowing to write the attribute. The
new field transport_flags_changeable in transport/backend is used to reject
writing if not allowed for a backend.
Regarding ALUA we also want to be able to passthrough commands to userspace
in tcmu. Therefore we need TRANSPORT_FLAG_PASSTHROUGH_ALUA to be
changeable, because by setting it we can switch off all ALUA checks in
core. So we also set TRANSPORT_FLAG_PASSTHROUGH_ALUA in tcmu's
transport_flags_changeable.
Of course, ALUA and reservation handling in userspace will work only, if
session/nexus information is sent to userspace along with every
command. This will be object of a patch series announced by Mike Christie.
Link: https://lore.kernel.org/r/20200427150823.15350-5-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
pgr_support and alua_support device attributes show the inverted value of
the transport_flags:
* TRANSPORT_FLAG_PASSTHROUGH_PGR
* TRANSPORT_FLAG_PASSTHROUGH_ALUA
These attributes are per device, while the flags are per backend. Rename
the transport_flags in backend/transport to transport_flags_default and use
this value to initialize the new transport_flags field in the se_device
structure.
Now data and attribute both are per se_device.
Link: https://lore.kernel.org/r/20200427150823.15350-4-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
tcmu has not set TRANSPORT_FLAG_PASSTHROUGH_PGR. Therefore the in-core pr
emulation is active by default, but there are some attributes for
configuration missing. Add them.
Link: https://lore.kernel.org/r/20200427150823.15350-3-bstroesser@ts.fujitsu.com
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|