summaryrefslogtreecommitdiff
path: root/drivers/md/dm-zoned-target.c
AgeCommit message (Collapse)Author
2024-06-19block: move the zoned flag into the features fieldChristoph Hellwig
Move the zoned flags into the features field to reclaim a little bit of space. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-23-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-20dm: always manage discard support in terms of max_hw_discard_sectorsMike Snitzer
Commit 4f563a64732d ("block: add a max_user_discard_sectors queue limit") changed block core to set max_discard_sectors to: min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors) Since commit 1c0e720228ad ("dm: use queue_limits_set") it was reported dm-thinp was failing in a few fstests (generic/347 and generic/405) with the first WARN_ON_ONCE in dm_cell_key_has_valid_range() being reported, e.g.: WARNING: CPU: 1 PID: 30 at drivers/md/dm-bio-prison-v1.c:128 dm_cell_key_has_valid_range+0x3d/0x50 blk_set_stacking_limits() sets max_user_discard_sectors to UINT_MAX, so given how block core now sets max_discard_sectors (detailed above) it follows that blk_stack_limits() stacks up the underlying device's max_hw_discard_sectors and max_discard_sectors is set to match it. If max_hw_discard_sectors exceeds dm's BIO_PRISON_MAX_RANGE, then dm_cell_key_has_valid_range() will trigger the warning with: WARN_ON_ONCE(key->block_end - key->block_begin > BIO_PRISON_MAX_RANGE) Aside from this warning, the discard will fail. Fix this and other DM issues by governing discard support in terms of max_hw_discard_sectors instead of max_discard_sectors. Reported-by: Theodore Ts'o <tytso@mit.edu> Fixes: 1c0e720228ad ("dm: use queue_limits_set") Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-12-19block: remove support for the host aware zone modelChristoph Hellwig
When zones were first added the SCSI and ATA specs, two different models were supported (in addition to the drive managed one that is invisible to the host): - host managed where non-conventional zones there is strict requirement to write at the write pointer, or else an error is returned - host aware where a write point is maintained if writes always happen at it, otherwise it is left in an under-defined state and the sequential write preferred zones behave like conventional zones (probably very badly performing ones, though) Not surprisingly this lukewarm model didn't prove to be very useful and was finally removed from the ZBC and SBC specs (NVMe never implemented it). Due to to the easily disappearing write pointer host software could never rely on the write pointer to actually be useful for say recovery. Fortunately only a few HDD prototypes shipped using this model which never made it to mass production. Drop the support before it is too late. Note that any such host aware prototype HDD can still be used with Linux as we'll now treat it as a conventional HDD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-09-20dm zoned: free dmz->ddev array in dmz_put_zoned_devicesFedor Pchelkin
Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices") made the pointers to additional zoned devices to be stored in a dynamically allocated dmz->ddev array. However, this array is not freed. Rename dmz_put_zoned_device to dmz_put_zoned_devices and fix it to free the dmz->ddev array when cleaning up zoned device information. Remove NULL assignment for all dmz->ddev elements and just free the dmz->ddev array instead. Found by Linux Verification Center (linuxtesting.org). Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-04-11dm: add helper macro for simple DM target module init and exitYangtao Li
Eliminate duplicate boilerplate code for simple modules that contain a single DM target driver without any additional setup code. Add a new module_dm() macro, which replaces the module_init() and module_exit() with template functions that call dm_register_target() and dm_unregister_target() respectively. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: avoid void function return statementsHeinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-06dm-zoned: cleanup dmz_fixup_devicesChristoph Hellwig
Use the bdev based helpers where applicable and move the zoned_dev into the scope where it is actually used. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: replace blkdev_nr_zones with bdev_nr_zonesChristoph Hellwig
Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03dm-zoned: don't set the discard_alignment queue limitChristoph Hellwig
The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by dm-zoned is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-02dm-zoned: remove the ->name field in struct dmz_devChristoph Hellwig
Just use the %pg format specifier to print the block device name directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-02-04block: pass a block_device to bio_clone_fastChristoph Hellwig
Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-09Merge tag 'for-5.16/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Add DM core support for emitting audit events through the audit subsystem. Also enhance both the integrity and crypt targets to emit events to via dm-audit. - Various other simple code improvements and cleanups. * tag 'for-5.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm table: log table creation error code dm: make workqueue names device-specific dm writecache: Make use of the helper macro kthread_run() dm crypt: Make use of the helper macro kthread_run() dm verity: use bvec_kmap_local in verity_for_bv_block dm log writes: use memcpy_from_bvec in log_writes_map dm integrity: use bvec_kmap_local in __journal_read_write dm integrity: use bvec_kmap_local in integrity_metadata dm: add add_disk() error handling dm: Remove redundant flush_workqueue() calls dm crypt: log aead integrity violations to audit subsystem dm integrity: log audit events for dm-integrity target dm: introduce audit event module for device mapper
2021-11-01dm: Remove redundant flush_workqueue() callsChristophe JAILLET
destroy_workqueue() already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant flush_workqueue() calls. This was generated with coccinelle: @@ expression E; @@ - flush_workqueue(E); destroy_workqueue(E); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-18dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding themChristoph Hellwig
Use the proper helpers to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-10dm: update target status functions to support IMA measurementTushar Sugandhi
For device mapper targets to take advantage of IMA's measurement capabilities, the status functions for the individual targets need to be updated to handle the status_type_t case for value STATUSTYPE_IMA. Update status functions for the following target types, to log their respective attributes to be measured using IMA. 01. cache 02. crypt 03. integrity 04. linear 05. mirror 06. multipath 07. raid 08. snapshot 09. striped 10. verity For rest of the targets, handle the STATUSTYPE_IMA case by setting the measurement buffer to NULL. For IMA to measure the data on a given system, the IMA policy on the system needs to be updated to have the following line, and the system needs to be restarted for the measurements to take effect. /etc/ima/ima-policy measure func=CRITICAL_DATA label=device-mapper template=ima-buf The measurements will be reflected in the IMA logs, which are located at: /sys/kernel/security/integrity/ima/ascii_runtime_measurements /sys/kernel/security/integrity/ima/binary_runtime_measurements These IMA logs can later be consumed by various attestation clients running on the system, and send them to external services for attesting the system. The DM target data measured by IMA subsystem can alternatively be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with DM_TABLE_STATUS_CMD. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-03-22dm table: Fix zoned model check and zone sectors checkShin'ichiro Kawasaki
Commit 24f6b6036c9e ("dm table: fix zoned iterate_devices based device capability checks") triggered dm table load failure when dm-zoned device is set up for zoned block devices and a regular device for cache. The commit inverted logic of two callback functions for iterate_devices: device_is_zoned_model() and device_matches_zone_sectors(). The logic of device_is_zoned_model() was inverted then all destination devices of all targets in dm table are required to have the expected zoned model. This is fine for dm-linear, dm-flakey and dm-crypt on zoned block devices since each target has only one destination device. However, this results in failure for dm-zoned with regular cache device since that target has both regular block device and zoned block devices. As for device_matches_zone_sectors(), the commit inverted the logic to require all zoned block devices in each target have the specified zone_sectors. This check also fails for regular block device which does not have zones. To avoid the check failures, fix the zone model check and the zone sectors check. For zone model check, introduce the new feature flag DM_TARGET_MIXED_ZONED_MODEL, and set it to dm-zoned target. When the target has this flag, allow it to have destination devices with any zoned model. For zone sectors check, skip the check if the destination device is not a zoned block device. Also add comments and improve an error message to clarify expectations to the two checks. Fixes: 24f6b6036c9e ("dm table: fix zoned iterate_devices based device capability checks") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-08-03Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "Good amount of cleanups and tech debt removals in here, and as a result, the diffstat shows a nice net reduction in code. - Softirq completion cleanups (Christoph) - Stop using ->queuedata (Christoph) - Cleanup bd claiming (Christoph) - Use check_events, moving away from the legacy media change (Christoph) - Use inode i_blkbits consistently (Christoph) - Remove old unused writeback congestion bits (Christoph) - Cleanup/unify submission path (Christoph) - Use bio_uninit consistently, instead of bio_disassociate_blkg (Christoph) - sbitmap cleared bits handling (John) - Request merging blktrace event addition (Jan) - sysfs add/remove race fixes (Luis) - blk-mq tag fixes/optimizations (Ming) - Duplicate words in comments (Randy) - Flush deferral cleanup (Yufen) - IO context locking/retry fixes (John) - struct_size() usage (Gustavo) - blk-iocost fixes (Chengming) - blk-cgroup IO stats fixes (Boris) - Various little fixes" * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits) block: blk-timeout: delete duplicated word block: blk-mq-sched: delete duplicated word block: blk-mq: delete duplicated word block: genhd: delete duplicated words block: elevator: delete duplicated word and fix typos block: bio: delete duplicated words block: bfq-iosched: fix duplicated word iocost_monitor: start from the oldest usage index iocost: Fix check condition of iocg abs_vdebt block: Remove callback typedefs for blk_mq_ops block: Use non _rcu version of list functions for tag_set_list blk-cgroup: show global disk stats in root cgroup io.stat blk-cgroup: make iostat functions visible to stat printing block: improve discard bio alignment in __blkdev_issue_discard() block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers block: defer flush request no matter whether we have elevator block: make blk_timeout_init() static block: remove retry loop in ioc_release_fn() block: remove unnecessary ioc nested locking block: integrate bd_start_claiming into __blkdev_get ...
2020-07-08dm zoned: Fix zone reclaim triggerDamien Le Moal
Only triggering reclaim based on the percentage of unmapped cache zones can fail to detect cases where reclaim is needed, e.g. if the target has only 2 or 3 cache zones and only one unmapped cache zone, the percentage of free cache zones is higher than DMZ_RECLAIM_LOW_UNMAP_ZONES (30%) and reclaim does not trigger. This problem, combined with the fact that dmz_schedule_reclaim() is called from dmz_handle_bio() without the map lock held, leads to a race between zone allocation and dmz_should_reclaim() result. Depending on the workload applied, this race can lead to the write path waiting forever for a free zone without reclaim being triggered. Fix this by moving dmz_schedule_reclaim() inside dmz_alloc_zone() under the map lock. This results in checking the need for zone reclaim whenever a new data or buffer zone needs to be allocated. Also fix dmz_reclaim_percentage() to always return 0 if the number of unmapped cache (or random) zones is less than or equal to 1. Suggested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-07-01block: rename generic_make_request to submit_bio_noacctChristoph Hellwig
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-17dm zoned: assign max_io_len correctlyHou Tao
The unit of max_io_len is sector instead of byte (spotted through code review), so fix it. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: select reclaim zone based on device indexHannes Reinecke
per-device reclaim should select zones on that device only. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: support arbitrary number of devicesHannes Reinecke
Remove the hard-coded limit of two devices and support an unlimited number of additional zoned devices. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: move random and sequential zones into struct dmz_devHannes Reinecke
Random and sequential zones should be part of the respective device structure to make arbitration between devices possible. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: per-device reclaimHannes Reinecke
Instead of having one reclaim workqueue for the entire set we should be allocating a reclaim workqueue per device; doing so will reduce contention and should boost performance for a multi-device setup. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-06-05dm zoned: add device pointer to struct dm_zoneHannes Reinecke
Add a pointer, to the containing device, within struct dm_zone and kill dmz_zone_to_dev(). Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: separate random and cache zonesHannes Reinecke
Instead of lumping emulated zones together with random zones we should be handling them as separate 'cache' zones. This improves code readability and allows an easier implementation of different cache policies. Also add additional allocation flags, to separate the type (cache, random, or sequential) from the purpose (eg reclaim). Also switch the allocation policy to not use random zones as buffer zones if cache zones are present. This avoids a performance drop when all cache zones are used. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: Avoid 64-bit division error in dmz_fixup_devicesNathan Chancellor
When building arm32 allyesconfig: ld.lld: error: undefined symbol: __aeabi_uldivmod >>> referenced by dm-zoned-target.c >>> md/dm-zoned-target.o:(dmz_ctr) in archive drivers/built-in.a dmz_fixup_devices uses DIV_ROUND_UP with variables of type sector_t. As such, it should be using DIV_ROUND_UP_SECTOR_T, which handles this automatically. Fixes: 70978208ec91 ("dm zoned: metadata version 2") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: remove spurious newlines from debugging messagesHannes Reinecke
DMDEBUG will already add a newline to the logging messages, so we shouldn't be adding it to the message itself. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-20dm zoned: metadata version 2Hannes Reinecke
Implement handling for metadata version 2. The new metadata adds a label and UUID for the device mapper device, and additional UUID for the underlying block devices. It also allows for an additional regular drive to be used for emulating random access zones. The emulated zones will be placed logically in front of the zones from the zoned block device, causing the superblocks and metadata to be stored on that device. The first zone of the original zoned device will be used to hold another, tertiary copy of the metadata; this copy carries a generation number of 0 and is never updated; it's just used for identification. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-19dm zoned: replace 'target' pointer in the bio contextHannes Reinecke
Replace the 'target' pointer in the bio context with the device pointer as this is what's actually used. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-19dm zoned: remove 'dev' argument from reclaimHannes Reinecke
Use the dmz_zone_to_dev() mapping function to remove the 'dev' argument from reclaim. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: Introduce dmz_dev_is_dying() and dmz_check_dev()Hannes Reinecke
Introduce accessors dmz_dev_is_dying() and dmz_check_dev() to avoid having to reference the devices directly. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: introduce dmz_metadata_label() to format device nameHannes Reinecke
Introduce dmz_metadata_label() to format the device-mapper device name and use it instead of the device name of the underlying device. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: move fields from struct dmz_dev to dmz_metadataHannes Reinecke
Move fields from the device structure into the metadata structure and provide accessor functions. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: store zone id within the zone structure and kill dmz_id()Hannes Reinecke
Instead of calculating the zone index by the offset within the zone array store the index within the structure itself. With that the helper dmz_id() is pointless and can be replaced with accessing the ->id value directly. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: add 'message' callbackHannes Reinecke
Add callback for 'dmsetup message' to allow the reclaim process to be triggered manually. Eg. dmsetup message /dev/dm-X 0 message will start the reclaim process even if the default threshold of 50 percent of free random zones is not reached. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-05-15dm zoned: add 'status' callbackHannes Reinecke
Add callback to supply information for 'dmsetup status' and 'dmsetup table'. The output for 'dmsetup status' is 0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential where <nr_unmap_rnd> is the number of unmapped (ie free) random zones, <nr_rnd> the total number of random zones, <nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq> the total number of sequential zones. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-03-03dm: bump version of core and various targetsMike Snitzer
Changes made during the 5.6 cycle warrant bumping the version number for DM core and the targets modified by this commit. It should be noted that dm-thin, dm-crypt and dm-raid already had their target version bumped during the 5.6 merge window. Signed-off-by; Mike Snitzer <snitzer@redhat.com>
2020-02-27dm zoned: Fix reference counter initial value of chunk worksShin'ichiro Kawasaki
Dm-zoned initializes reference counters of new chunk works with zero value and refcount_inc() is called to increment the counter. However, the refcount_inc() function handles the addition to zero value as an error and triggers the warning as follows: refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 1506 at lib/refcount.c:25 refcount_warn_saturate+0x68/0xf0 ... CPU: 7 PID: 1506 Comm: systemd-udevd Not tainted 5.4.0+ #134 ... Call Trace: dmz_map+0x2d2/0x350 [dm_zoned] __map_bio+0x42/0x1a0 __split_and_process_non_flush+0x14a/0x1b0 __split_and_process_bio+0x83/0x240 ? kmem_cache_alloc+0x165/0x220 dm_process_bio+0x90/0x230 ? generic_make_request_checks+0x2e7/0x680 dm_make_request+0x3e/0xb0 generic_make_request+0xcf/0x320 ? memcg_drain_all_list_lrus+0x1c0/0x1c0 submit_bio+0x3c/0x160 ? guard_bio_eod+0x2c/0x130 mpage_readpages+0x182/0x1d0 ? bdev_evict_inode+0xf0/0xf0 read_pages+0x6b/0x1b0 __do_page_cache_readahead+0x1ba/0x1d0 force_page_cache_readahead+0x93/0x100 generic_file_read_iter+0x83a/0xe40 ? __seccomp_filter+0x7b/0x670 new_sync_read+0x12a/0x1c0 vfs_read+0x9d/0x150 ksys_read+0x5f/0xe0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ... After this warning, following refcount API calls for the counter all fail to change the counter value. Fix this by setting the initial reference counter value not zero but one for the new chunk works. Instead, do not call refcount_inc() via dmz_get_chunk_work() for the new chunks works. The failure was observed with linux version 5.4 with CONFIG_REFCOUNT_FULL enabled. Refcount rework was merged to linux version 5.5 by the commit 168829ad09ca ("Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"). After this commit, CONFIG_REFCOUNT_FULL was removed and the failure was observed regardless of kernel configuration. Linux version 4.20 merged the commit 092b5648760a ("dm zoned: target: use refcount_t for dm zoned reference counters"). Before this commit, dm zoned used atomic_t APIs which does not check addition to zero, then this fix is not necessary. Fixes: 092b5648760a ("dm zoned: target: use refcount_t for dm zoned reference counters") Cc: stable@vger.kernel.org # 5.4+ Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-12-03block: simplify blkdev_nr_zonesChristoph Hellwig
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-07dm zoned: reduce overhead of backing device checksDmitry Fomichev
Commit 75d66ffb48efb3 added backing device health checks and as a part of these checks, check_events() block ops template call is invoked in dm-zoned mapping path as well as in reclaim and flush path. Calling check_events() with ATA or SCSI backing devices introduces a blocking scsi_test_unit_ready() call being made in sd_check_events(). Even though the overhead of calling scsi_test_unit_ready() is small for ATA zoned devices, it is much larger for SCSI and it affects performance in a very negative way. Fix this performance regression by executing check_events() only in case of any I/O errors. The function dmz_bdev_is_dying() is modified to call only blk_queue_dying(), while calls to check_events() are made in a new helper function, dmz_check_bdev(). Reported-by: zhangxiaoxu <zhangxiaoxu5@huawei.com> Fixes: 75d66ffb48efb3 ("dm zoned: properly handle backing device failure") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-26dm zoned: fix invalid memory accessMikulas Patocka
Commit 75d66ffb48efb30f2dd42f041ba8b39c5b2bd115 ("dm zoned: properly handle backing device failure") triggers a coverity warning: *** CID 1452808: Memory - illegal accesses (USE_AFTER_FREE) /drivers/md/dm-zoned-target.c: 137 in dmz_submit_bio() 131 clone->bi_private = bioctx; 132 133 bio_advance(bio, clone->bi_iter.bi_size); 134 135 refcount_inc(&bioctx->ref); 136 generic_make_request(clone); >>> CID 1452808: Memory - illegal accesses (USE_AFTER_FREE) >>> Dereferencing freed pointer "clone". 137 if (clone->bi_status == BLK_STS_IOERR) 138 return -EIO; 139 140 if (bio_op(bio) == REQ_OP_WRITE && dmz_is_seq(zone)) 141 zone->wp_block += nr_blocks; 142 The "clone" bio may be processed and freed before the check "clone->bi_status == BLK_STS_IOERR" - so this check can access invalid memory. Fixes: 75d66ffb48efb3 ("dm zoned: properly handle backing device failure") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15dm zoned: add SPDX license identifiersDmitry Fomichev
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15dm zoned: properly handle backing device failureDmitry Fomichev
dm-zoned is observed to lock up or livelock in case of hardware failure or some misconfiguration of the backing zoned device. This patch adds a new dm-zoned target function that checks the status of the backing device. If the request queue of the backing device is found to be in dying state or the SCSI backing device enters offline state, the health check code sets a dm-zoned target flag prompting all further incoming I/O to be rejected. In order to detect backing device failures timely, this new function is called in the request mapping path, at the beginning of every reclaim run and before performing any metadata I/O. The proper way out of this situation is to do dmsetup remove <dm-zoned target> and recreate the target when the problem with the backing device is resolved. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15dm zoned: improve error handling in i/o map codeDmitry Fomichev
Some errors are ignored in the I/O path during queueing chunks for processing by chunk works. Since at least these errors are transient in nature, it should be possible to retry the failed incoming commands. The fix - Errors that can happen while queueing chunks are carried upwards to the main mapping function and it now returns DM_MAPIO_REQUEUE for any incoming requests that can not be properly queued. Error logging/debug messages are added where needed. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-18dm zoned: Silence a static checker warningDan Carpenter
My static checker complains about this line from dmz_get_zoned_device() aligned_capacity = dev->capacity & ~(blk_queue_zone_sectors(q) - 1); The problem is that "aligned_capacity" and "dev->capacity" are sector_t type (which is a u64 under most configs) but blk_queue_zone_sectors(q) returns a u32 so the higher 32 bits in aligned_capacity are cleared to zero. This patch adds a cast to address the issue. Fixes: 114e025968b5 ("dm zoned: ignore last smaller runt zone") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-20dm: eliminate 'split_discard_bios' flag from DM target interfaceMike Snitzer
There is no need to have DM core split discards on behalf of a DM target now that blk_queue_split() handles splitting discards based on the queue_limits. A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-07dm zoned: Fix target BIO completion handlingDamien Le Moal
struct bioctx includes the ref refcount_t to track the number of I/O fragments used to process a target BIO as well as ensure that the zone of the BIO is kept in the active state throughout the lifetime of the BIO. However, since decrementing of this reference count is done in the target .end_io method, the function bio_endio() must be called multiple times for read and write target BIOs, which causes problems with the value of the __bi_remaining struct bio field for chained BIOs (e.g. the clone BIO passed by dm core is large and splits into fragments by the block layer), resulting in incorrect values and inconsistencies with the BIO_CHAIN flag setting. This is turn triggers the BUG_ON() call: BUG_ON(atomic_read(&bio->__bi_remaining) <= 0); in bio_remaining_done() called from bio_endio(). Fix this ensuring that bio_endio() is called only once for any target BIO by always using internal clone BIOs for processing any read or write target BIO. This allows reference counting using the target BIO context counter to trigger the target BIO completion bio_endio() call once all data, metadata and other zone work triggered by the BIO complete. Overall, this simplifies the code too as the target .end_io becomes unnecessary and differences between read and write BIO issuing and completion processing disappear. Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-26Merge tag 'for-4.20/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - The biggest change this cycle is to remove support for the legacy IO path (.request_fn) from request-based DM. Jens has already started preparing for complete removal of the legacy IO path in 4.21 but this earlier removal of support from DM has been coordinated with Jens (as evidenced by the commit being attributed to him). Making request-based DM exclussively blk-mq only cleans up that portion of DM core quite nicely. - Convert the thinp and zoned targets over to using refcount_t where applicable. - A couple fixes to the DM zoned target for refcounting and other races buried in the implementation of metadata block creation and use. - Small cleanups to remove redundant unlikely() around a couple WARN_ON_ONCE(). - Simplify how dm-ioctl copies from userspace, eliminating some potential for a malicious user trying to change the executed ioctl after its processing has begun. - Tweaked DM crypt target to use the DM device name when naming the various workqueues created for a particular DM crypt device (makes the N workqueues for a DM crypt device more easily understood and enhances user's accounting capabilities at a glance via "ps") - Small fixup to remove dead branch in DM writecache's memory_entry(). * tag 'for-4.20/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm writecache: remove disabled code in memory_entry() dm zoned: fix various dmz_get_mblock() issues dm zoned: fix metadata block ref counting dm raid: avoid bitmap with raid4/5/6 journal device dm crypt: make workqueue names device-specific dm: add dm_table_device_name() dm ioctl: harden copy_params()'s copy_from_user() from malicious users dm: remove unnecessary unlikely() around WARN_ON_ONCE() dm zoned: target: use refcount_t for dm zoned reference counters dm thin: use refcount_t for thin_c reference counting dm table: require that request-based DM be layered on blk-mq devices dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED dm: remove legacy request-based IO path
2018-10-25block: Introduce blkdev_nr_zones() helperDamien Le Moal
Introduce the blkdev_nr_zones() helper function to get the total number of zones of a zoned block device. This number is always 0 for a regular block device (q->limits.zoned == BLK_ZONED_NONE case). Replace hard-coded number of zones calculation in dmz_get_zoned_device() with a call to this helper. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>