diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2015-04-18 08:14:18 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2015-04-18 08:14:18 -0400 |
commit | afad97eee47c1f1f242202e2473929b4ef5d9f43 (patch) | |
tree | 31f68d70760234b582a28bd3f64311ff5307b7b1 /Documentation | |
parent | 04b7fe6a4a231871ef681bc95e08fe66992f7b1f (diff) | |
parent | 44c144f9c8e8fbd73ede2848da8253b3aae42ec2 (diff) |
Merge tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- the most extensive changes this cycle are the DM core improvements to
add full blk-mq support to request-based DM.
- disabled by default but user can opt-in with CONFIG_DM_MQ_DEFAULT
- depends on some blk-mq changes from Jens' for-4.1/core branch so
that explains why this pull is built on linux-block.git
- update DM to use name_to_dev_t() rather than open-coding a less
capable device parser.
- includes a couple small improvements to name_to_dev_t() that offer
stricter constraints that DM's code provided.
- improvements to the dm-cache "mq" cache replacement policy.
- a DM crypt crypt_ctr() error path fix and an async crypto deadlock
fix
- a small efficiency improvement for DM crypt decryption by leveraging
immutable biovecs
- add error handling modes for corrupted blocks to DM verity
- a new "log-writes" DM target from Josef Bacik that is meant for file
system developers to test file system integrity at particular points
in the life of a file system
- a few DM log userspace cleanups and fixes
- a few Documentation fixes (for thin, cache, crypt and switch)
* tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (34 commits)
dm crypt: fix missing error code return from crypt_ctr error path
dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
dm crypt: leverage immutable biovecs when decrypting on read
dm crypt: update URLs to new cryptsetup project page
dm: add log writes target
dm table: use bool function return values of true/false not 1/0
dm verity: add error handling modes for corrupted blocks
dm thin: remove stale 'trim' message documentation
dm delay: use msecs_to_jiffies for time conversion
dm log userspace base: fix compile warning
dm log userspace transfer: match wait_for_completion_timeout return type
dm table: fall back to getting device using name_to_dev_t()
init: stricter checking of major:minor root= values
init: export name_to_dev_t and mark name argument as const
dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
dm: add full blk-mq support to request-based DM
dm: impose configurable deadline for dm_request_fn's merge heuristic
dm sysfs: introduce ability to add writable attributes
dm: don't start current request if it would've merged with the previous
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-block-dm | 22 | ||||
-rw-r--r-- | Documentation/device-mapper/dm-crypt.txt | 4 | ||||
-rw-r--r-- | Documentation/device-mapper/log-writes.txt | 140 | ||||
-rw-r--r-- | Documentation/device-mapper/switch.txt | 4 | ||||
-rw-r--r-- | Documentation/device-mapper/thin-provisioning.txt | 3 | ||||
-rw-r--r-- | Documentation/device-mapper/verity.txt | 21 |
6 files changed, 185 insertions, 9 deletions
diff --git a/Documentation/ABI/testing/sysfs-block-dm b/Documentation/ABI/testing/sysfs-block-dm index 87ca5691e29b..f9f2339b9a0a 100644 --- a/Documentation/ABI/testing/sysfs-block-dm +++ b/Documentation/ABI/testing/sysfs-block-dm @@ -23,3 +23,25 @@ Description: Device-mapper device suspend state. Contains the value 1 while the device is suspended. Otherwise it contains 0. Read-only attribute. Users: util-linux, device-mapper udev rules + +What: /sys/block/dm-<num>/dm/rq_based_seq_io_merge_deadline +Date: March 2015 +KernelVersion: 4.1 +Contact: dm-devel@redhat.com +Description: Allow control over how long a request that is a + reasonable merge candidate can be queued on the request + queue. The resolution of this deadline is in + microseconds (ranging from 1 to 100000 usecs). + Setting this attribute to 0 (the default) will disable + request-based DM's merge heuristic and associated extra + accounting. This attribute is not applicable to + bio-based DM devices so it will only ever report 0 for + them. + +What: /sys/block/dm-<num>/dm/use_blk_mq +Date: March 2015 +KernelVersion: 4.1 +Contact: dm-devel@redhat.com +Description: Request-based Device-mapper blk-mq I/O path mode. + Contains the value 1 if the device is using blk-mq. + Otherwise it contains 0. Read-only attribute. diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt index ad697781f9ac..692171fe9da0 100644 --- a/Documentation/device-mapper/dm-crypt.txt +++ b/Documentation/device-mapper/dm-crypt.txt @@ -5,7 +5,7 @@ Device-Mapper's "crypt" target provides transparent encryption of block devices using the kernel crypto API. For a more detailed description of supported parameters see: -http://code.google.com/p/cryptsetup/wiki/DMCrypt +https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt Parameters: <cipher> <key> <iv_offset> <device path> \ <offset> [<#opt_params> <opt_params>] @@ -80,7 +80,7 @@ Example scripts =============== LUKS (Linux Unified Key Setup) is now the preferred way to set up disk encryption with dm-crypt using the 'cryptsetup' utility, see -http://code.google.com/p/cryptsetup/ +https://gitlab.com/cryptsetup/cryptsetup [[ #!/bin/sh diff --git a/Documentation/device-mapper/log-writes.txt b/Documentation/device-mapper/log-writes.txt new file mode 100644 index 000000000000..c10f30c9b534 --- /dev/null +++ b/Documentation/device-mapper/log-writes.txt @@ -0,0 +1,140 @@ +dm-log-writes +============= + +This target takes 2 devices, one to pass all IO to normally, and one to log all +of the write operations to. This is intended for file system developers wishing +to verify the integrity of metadata or data as the file system is written to. +There is a log_write_entry written for every WRITE request and the target is +able to take arbitrary data from userspace to insert into the log. The data +that is in the WRITE requests is copied into the log to make the replay happen +exactly as it happened originally. + +Log Ordering +============ + +We log things in order of completion once we are sure the write is no longer in +cache. This means that normal WRITE requests are not actually logged until the +next REQ_FLUSH request. This is to make it easier for userspace to replay the +log in a way that correlates to what is on disk and not what is in cache, to +make it easier to detect improper waiting/flushing. + +This works by attaching all WRITE requests to a list once the write completes. +Once we see a REQ_FLUSH request we splice this list onto the request and once +the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only +completed WRITEs, at the time the REQ_FLUSH is issued, are added in order to +simulate the worst case scenario with regard to power failures. Consider the +following example (W means write, C means complete): + +W1,W2,W3,C3,C2,Wflush,C1,Cflush + +The log would show the following + +W3,W2,flush,W1.... + +Again this is to simulate what is actually on disk, this allows us to detect +cases where a power failure at a particular point in time would create an +inconsistent file system. + +Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as +they complete as those requests will obviously bypass the device cache. + +Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would +have all the DISCARD requests, and then the WRITE requests and then the FLUSH +request. Consider the following example: + +WRITE block 1, DISCARD block 1, FLUSH + +If we logged DISCARD when it completed, the replay would look like this + +DISCARD 1, WRITE 1, FLUSH + +which isn't quite what happened and wouldn't be caught during the log replay. + +Target interface +================ + +i) Constructor + + log-writes <dev_path> <log_dev_path> + + dev_path : Device that all of the IO will go to normally. + log_dev_path : Device where the log entries are written to. + +ii) Status + + <#logged entries> <highest allocated sector> + + #logged entries : Number of logged entries + highest allocated sector : Highest allocated sector + +iii) Messages + + mark <description> + + You can use a dmsetup message to set an arbitrary mark in a log. + For example say you want to fsck a file system after every + write, but first you need to replay up to the mkfs to make sure + we're fsck'ing something reasonable, you would do something like + this: + + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs + <run test> + + This would allow you to replay the log up to the mkfs mark and + then replay from that point on doing the fsck check in the + interval that you want. + + Every log has a mark at the end labeled "dm-log-writes-end". + +Userspace component +=================== + +There is a userspace tool that will replay the log for you in various ways. +It can be found here: https://github.com/josefbacik/log-writes + +Example usage +============= + +Say you want to test fsync on your file system. You would do something like +this: + +TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" +dmsetup create log --table "$TABLE" +mkfs.btrfs -f /dev/mapper/log +dmsetup message log 0 mark mkfs + +mount /dev/mapper/log /mnt/btrfs-test +<some test that does fsync at the end> +dmsetup message log 0 mark fsync +md5sum /mnt/btrfs-test/foo +umount /mnt/btrfs-test + +dmsetup remove log +replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync +mount /dev/sdb /mnt/btrfs-test +md5sum /mnt/btrfs-test/foo +<verify md5sum's are correct> + +Another option is to do a complicated file system operation and verify the file +system is consistent during the entire operation. You could do this with: + +TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" +dmsetup create log --table "$TABLE" +mkfs.btrfs -f /dev/mapper/log +dmsetup message log 0 mark mkfs + +mount /dev/mapper/log /mnt/btrfs-test +<fsstress to dirty the fs> +btrfs filesystem balance /mnt/btrfs-test +umount /mnt/btrfs-test +dmsetup remove log + +replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs +btrfsck /dev/sdb +replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ + --fsck "btrfsck /dev/sdb" --check fua + +And that will replay the log until it sees a FUA request, run the fsck command +and if the fsck passes it will replay to the next FUA, until it is completed or +the fsck command exists abnormally. diff --git a/Documentation/device-mapper/switch.txt b/Documentation/device-mapper/switch.txt index 8897d0494838..424835e57f27 100644 --- a/Documentation/device-mapper/switch.txt +++ b/Documentation/device-mapper/switch.txt @@ -47,8 +47,8 @@ consume far too much memory. Using this device-mapper switch target we can now build a two-layer device hierarchy: - Upper Tier – Determine which array member the I/O should be sent to. - Lower Tier – Load balance amongst paths to a particular member. + Upper Tier - Determine which array member the I/O should be sent to. + Lower Tier - Load balance amongst paths to a particular member. The lower tier consists of a single dm multipath device for each member. Each of these multipath devices contains the set of paths directly to diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt index 2f5173500bd9..4f67578b2954 100644 --- a/Documentation/device-mapper/thin-provisioning.txt +++ b/Documentation/device-mapper/thin-provisioning.txt @@ -380,9 +380,6 @@ then you'll have no access to blocks mapped beyond the end. If you load a target that is bigger than before, then extra blocks will be provisioned as and when needed. -If you wish to reduce the size of your thin device and potentially -regain some space then send the 'trim' message to the pool. - ii) Status <nr mapped sectors> <highest mapped sector> diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt index 9884681535ee..e15bc1a0fb98 100644 --- a/Documentation/device-mapper/verity.txt +++ b/Documentation/device-mapper/verity.txt @@ -11,6 +11,7 @@ Construction Parameters <data_block_size> <hash_block_size> <num_data_blocks> <hash_start_block> <algorithm> <digest> <salt> + [<#opt_params> <opt_params>] <version> This is the type of the on-disk hash format. @@ -62,6 +63,22 @@ Construction Parameters <salt> The hexadecimal encoding of the salt value. +<#opt_params> + Number of optional parameters. If there are no optional parameters, + the optional paramaters section can be skipped or #opt_params can be zero. + Otherwise #opt_params is the number of following arguments. + + Example of optional parameters section: + 1 ignore_corruption + +ignore_corruption + Log corrupted blocks, but allow read operations to proceed normally. + +restart_on_corruption + Restart the system when a corrupted block is discovered. This option is + not compatible with ignore_corruption and requires user space support to + avoid restart loops. + Theory of operation =================== @@ -125,7 +142,7 @@ block boundary) are the hash blocks which are stored a depth at a time The full specification of kernel parameters and on-disk metadata format is available at the cryptsetup project's wiki page - http://code.google.com/p/cryptsetup/wiki/DMVerity + https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity Status ====== @@ -142,7 +159,7 @@ Set up a device: A command line tool veritysetup is available to compute or verify the hash tree or activate the kernel device. This is available from -the cryptsetup upstream repository http://code.google.com/p/cryptsetup/ +the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/ (as a libcryptsetup extension). Create hash on the device: |