diff options
Diffstat (limited to 'Documentation/filesystems/iomap/operations.rst')
-rw-r--r-- | Documentation/filesystems/iomap/operations.rst | 713 |
1 files changed, 713 insertions, 0 deletions
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst new file mode 100644 index 000000000000..8e6c721d2330 --- /dev/null +++ b/Documentation/filesystems/iomap/operations.rst @@ -0,0 +1,713 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _iomap_operations: + +.. + Dumb style notes to maintain the author's sanity: + Please try to start sentences on separate lines so that + sentence changes don't bleed colors in diff. + Heading decorations are documented in sphinx.rst. + +========================= +Supported File Operations +========================= + +.. contents:: Table of Contents + :local: + +Below are a discussion of the high level file operations that iomap +implements. + +Buffered I/O +============ + +Buffered I/O is the default file I/O path in Linux. +File contents are cached in memory ("pagecache") to satisfy reads and +writes. +Dirty cache will be written back to disk at some point that can be +forced via ``fsync`` and variants. + +iomap implements nearly all the folio and pagecache management that +filesystems have to implement themselves under the legacy I/O model. +This means that the filesystem need not know the details of allocating, +mapping, managing uptodate and dirty state, or writeback of pagecache +folios. +Under the legacy I/O model, this was managed very inefficiently with +linked lists of buffer heads instead of the per-folio bitmaps that iomap +uses. +Unless the filesystem explicitly opts in to buffer heads, they will not +be used, which makes buffered I/O much more efficient, and the pagecache +maintainer much happier. + +``struct address_space_operations`` +----------------------------------- + +The following iomap functions can be referenced directly from the +address space operations structure: + + * ``iomap_dirty_folio`` + * ``iomap_release_folio`` + * ``iomap_invalidate_folio`` + * ``iomap_is_partially_uptodate`` + +The following address space operations can be wrapped easily: + + * ``read_folio`` + * ``readahead`` + * ``writepages`` + * ``bmap`` + * ``swap_activate`` + +``struct iomap_folio_ops`` +-------------------------- + +The ``->iomap_begin`` function for pagecache operations may set the +``struct iomap::folio_ops`` field to an ops structure to override +default behaviors of iomap: + +.. code-block:: c + + struct iomap_folio_ops { + struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos, + unsigned len); + void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied, + struct folio *folio); + bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap); + }; + +iomap calls these functions: + + - ``get_folio``: Called to allocate and return an active reference to + a locked folio prior to starting a write. + If this function is not provided, iomap will call + ``iomap_get_folio``. + This could be used to `set up per-folio filesystem state + <https://lore.kernel.org/all/20190429220934.10415-5-agruenba@redhat.com/>`_ + for a write. + + - ``put_folio``: Called to unlock and put a folio after a pagecache + operation completes. + If this function is not provided, iomap will ``folio_unlock`` and + ``folio_put`` on its own. + This could be used to `commit per-folio filesystem state + <https://lore.kernel.org/all/20180619164137.13720-6-hch@lst.de/>`_ + that was set up by ``->get_folio``. + + - ``iomap_valid``: The filesystem may not hold locks between + ``->iomap_begin`` and ``->iomap_end`` because pagecache operations + can take folio locks, fault on userspace pages, initiate writeback + for memory reclamation, or engage in other time-consuming actions. + If a file's space mapping data are mutable, it is possible that the + mapping for a particular pagecache folio can `change in the time it + takes + <https://lore.kernel.org/all/20221123055812.747923-8-david@fromorbit.com/>`_ + to allocate, install, and lock that folio. + + For the pagecache, races can happen if writeback doesn't take + ``i_rwsem`` or ``invalidate_lock`` and updates mapping information. + Races can also happen if the filesytem allows concurrent writes. + For such files, the mapping *must* be revalidated after the folio + lock has been taken so that iomap can manage the folio correctly. + + fsdax does not need this revalidation because there's no writeback + and no support for unwritten extents. + + Filesystems subject to this kind of race must provide a + ``->iomap_valid`` function to decide if the mapping is still valid. + If the mapping is not valid, the mapping will be sampled again. + + To support making the validity decision, the filesystem's + ``->iomap_begin`` function may set ``struct iomap::validity_cookie`` + at the same time that it populates the other iomap fields. + A simple validation cookie implementation is a sequence counter. + If the filesystem bumps the sequence counter every time it modifies + the inode's extent map, it can be placed in the ``struct + iomap::validity_cookie`` during ``->iomap_begin``. + If the value in the cookie is found to be different to the value + the filesystem holds when the mapping is passed back to + ``->iomap_valid``, then the iomap should considered stale and the + validation failed. + +These ``struct kiocb`` flags are significant for buffered I/O with iomap: + + * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``. + +Internal per-Folio State +------------------------ + +If the fsblock size matches the size of a pagecache folio, it is assumed +that all disk I/O operations will operate on the entire folio. +The uptodate (memory contents are at least as new as what's on disk) and +dirty (memory contents are newer than what's on disk) status of the +folio are all that's needed for this case. + +If the fsblock size is less than the size of a pagecache folio, iomap +tracks the per-fsblock uptodate and dirty state itself. +This enables iomap to handle both "bs < ps" `filesystems +<https://lore.kernel.org/all/20230725122932.144426-1-ritesh.list@gmail.com/>`_ +and large folios in the pagecache. + +iomap internally tracks two state bits per fsblock: + + * ``uptodate``: iomap will try to keep folios fully up to date. + If there are read(ahead) errors, those fsblocks will not be marked + uptodate. + The folio itself will be marked uptodate when all fsblocks within the + folio are uptodate. + + * ``dirty``: iomap will set the per-block dirty state when programs + write to the file. + The folio itself will be marked dirty when any fsblock within the + folio is dirty. + +iomap also tracks the amount of read and write disk IOs that are in +flight. +This structure is much lighter weight than ``struct buffer_head`` +because there is only one per folio, and the per-fsblock overhead is two +bits vs. 104 bytes. + +Filesystems wishing to turn on large folios in the pagecache should call +``mapping_set_large_folios`` when initializing the incore inode. + +Buffered Readahead and Reads +---------------------------- + +The ``iomap_readahead`` function initiates readahead to the pagecache. +The ``iomap_read_folio`` function reads one folio's worth of data into +the pagecache. +The ``flags`` argument to ``->iomap_begin`` will be set to zero. +The pagecache takes whatever locks it needs before calling the +filesystem. + +Buffered Writes +--------------- + +The ``iomap_file_buffered_write`` function writes an ``iocb`` to the +pagecache. +``IOMAP_WRITE`` or ``IOMAP_WRITE`` | ``IOMAP_NOWAIT`` will be passed as +the ``flags`` argument to ``->iomap_begin``. +Callers commonly take ``i_rwsem`` in either shared or exclusive mode +before calling this function. + +mmap Write Faults +~~~~~~~~~~~~~~~~~ + +The ``iomap_page_mkwrite`` function handles a write fault to a folio in +the pagecache. +``IOMAP_WRITE | IOMAP_FAULT`` will be passed as the ``flags`` argument +to ``->iomap_begin``. +Callers commonly take the mmap ``invalidate_lock`` in shared or +exclusive mode before calling this function. + +Buffered Write Failures +~~~~~~~~~~~~~~~~~~~~~~~ + +After a short write to the pagecache, the areas not written will not +become marked dirty. +The filesystem must arrange to `cancel +<https://lore.kernel.org/all/20221123055812.747923-6-david@fromorbit.com/>`_ +such `reservations +<https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/>`_ +because writeback will not consume the reservation. +The ``iomap_file_buffered_write_punch_delalloc`` can be called from a +``->iomap_end`` function to find all the clean areas of the folios +caching a fresh (``IOMAP_F_NEW``) delalloc mapping. +It takes the ``invalidate_lock``. + +The filesystem must supply a function ``punch`` to be called for +each file range in this state. +This function must *only* remove delayed allocation reservations, in +case another thread racing with the current thread writes successfully +to the same region and triggers writeback to flush the dirty data out to +disk. + +Zeroing for File Operations +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Filesystems can call ``iomap_zero_range`` to perform zeroing of the +pagecache for non-truncation file operations that are not aligned to +the fsblock size. +``IOMAP_ZERO`` will be passed as the ``flags`` argument to +``->iomap_begin``. +Callers typically hold ``i_rwsem`` and ``invalidate_lock`` in exclusive +mode before calling this function. + +Unsharing Reflinked File Data +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Filesystems can call ``iomap_file_unshare`` to force a file sharing +storage with another file to preemptively copy the shared data to newly +allocate storage. +``IOMAP_WRITE | IOMAP_UNSHARE`` will be passed as the ``flags`` argument +to ``->iomap_begin``. +Callers typically hold ``i_rwsem`` and ``invalidate_lock`` in exclusive +mode before calling this function. + +Truncation +---------- + +Filesystems can call ``iomap_truncate_page`` to zero the bytes in the +pagecache from EOF to the end of the fsblock during a file truncation +operation. +``truncate_setsize`` or ``truncate_pagecache`` will take care of +everything after the EOF block. +``IOMAP_ZERO`` will be passed as the ``flags`` argument to +``->iomap_begin``. +Callers typically hold ``i_rwsem`` and ``invalidate_lock`` in exclusive +mode before calling this function. + +Pagecache Writeback +------------------- + +Filesystems can call ``iomap_writepages`` to respond to a request to +write dirty pagecache folios to disk. +The ``mapping`` and ``wbc`` parameters should be passed unchanged. +The ``wpc`` pointer should be allocated by the filesystem and must +be initialized to zero. + +The pagecache will lock each folio before trying to schedule it for +writeback. +It does not lock ``i_rwsem`` or ``invalidate_lock``. + +The dirty bit will be cleared for all folios run through the +``->map_blocks`` machinery described below even if the writeback fails. +This is to prevent dirty folio clots when storage devices fail; an +``-EIO`` is recorded for userspace to collect via ``fsync``. + +The ``ops`` structure must be specified and is as follows: + +``struct iomap_writeback_ops`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: c + + struct iomap_writeback_ops { + int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, + loff_t offset, unsigned len); + int (*prepare_ioend)(struct iomap_ioend *ioend, int status); + void (*discard_folio)(struct folio *folio, loff_t pos); + }; + +The fields are as follows: + + - ``map_blocks``: Sets ``wpc->iomap`` to the space mapping of the file + range (in bytes) given by ``offset`` and ``len``. + iomap calls this function for each dirty fs block in each dirty folio, + though it will `reuse mappings + <https://lore.kernel.org/all/20231207072710.176093-15-hch@lst.de/>`_ + for runs of contiguous dirty fsblocks within a folio. + Do not return ``IOMAP_INLINE`` mappings here; the ``->iomap_end`` + function must deal with persisting written data. + Do not return ``IOMAP_DELALLOC`` mappings here; iomap currently + requires mapping to allocated space. + Filesystems can skip a potentially expensive mapping lookup if the + mappings have not changed. + This revalidation must be open-coded by the filesystem; it is + unclear if ``iomap::validity_cookie`` can be reused for this + purpose. + This function must be supplied by the filesystem. + + - ``prepare_ioend``: Enables filesystems to transform the writeback + ioend or perform any other preparatory work before the writeback I/O + is submitted. + This might include pre-write space accounting updates, or installing + a custom ``->bi_end_io`` function for internal purposes, such as + deferring the ioend completion to a workqueue to run metadata update + transactions from process context. + This function is optional. + + - ``discard_folio``: iomap calls this function after ``->map_blocks`` + fails to schedule I/O for any part of a dirty folio. + The function should throw away any reservations that may have been + made for the write. + The folio will be marked clean and an ``-EIO`` recorded in the + pagecache. + Filesystems can use this callback to `remove + <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_ + delalloc reservations to avoid having delalloc reservations for + clean pagecache. + This function is optional. + +Pagecache Writeback Completion +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To handle the bookkeeping that must happen after disk I/O for writeback +completes, iomap creates chains of ``struct iomap_ioend`` objects that +wrap the ``bio`` that is used to write pagecache data to disk. +By default, iomap finishes writeback ioends by clearing the writeback +bit on the folios attached to the ``ioend``. +If the write failed, it will also set the error bits on the folios and +the address space. +This can happen in interrupt or process context, depending on the +storage device. + +Filesystems that need to update internal bookkeeping (e.g. unwritten +extent conversions) should provide a ``->prepare_ioend`` function to +set ``struct iomap_end::bio::bi_end_io`` to its own function. +This function should call ``iomap_finish_ioends`` after finishing its +own work (e.g. unwritten extent conversion). + +Some filesystems may wish to `amortize the cost of running metadata +transactions +<https://lore.kernel.org/all/20220120034733.221737-1-david@fromorbit.com/>`_ +for post-writeback updates by batching them. +They may also require transactions to run from process context, which +implies punting batches to a workqueue. +iomap ioends contain a ``list_head`` to enable batching. + +Given a batch of ioends, iomap has a few helpers to assist with +amortization: + + * ``iomap_sort_ioends``: Sort all the ioends in the list by file + offset. + + * ``iomap_ioend_try_merge``: Given an ioend that is not in any list and + a separate list of sorted ioends, merge as many of the ioends from + the head of the list into the given ioend. + ioends can only be merged if the file range and storage addresses are + contiguous; the unwritten and shared status are the same; and the + write I/O outcome is the same. + The merged ioends become their own list. + + * ``iomap_finish_ioends``: Finish an ioend that possibly has other + ioends linked to it. + +Direct I/O +========== + +In Linux, direct I/O is defined as file I/O that is issued directly to +storage, bypassing the pagecache. +The ``iomap_dio_rw`` function implements O_DIRECT (direct I/O) reads and +writes for files. + +.. code-block:: c + + ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops, + const struct iomap_dio_ops *dops, + unsigned int dio_flags, void *private, + size_t done_before); + +The filesystem can provide the ``dops`` parameter if it needs to perform +extra work before or after the I/O is issued to storage. +The ``done_before`` parameter tells the how much of the request has +already been transferred. +It is used to continue a request asynchronously when `part of the +request +<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c03098d4b9ad76bca2966a8769dcfe59f7f85103>`_ +has already been completed synchronously. + +The ``done_before`` parameter should be set if writes for the ``iocb`` +have been initiated prior to the call. +The direction of the I/O is determined from the ``iocb`` passed in. + +The ``dio_flags`` argument can be set to any combination of the +following values: + + * ``IOMAP_DIO_FORCE_WAIT``: Wait for the I/O to complete even if the + kiocb is not synchronous. + + * ``IOMAP_DIO_OVERWRITE_ONLY``: Perform a pure overwrite for this range + or fail with ``-EAGAIN``. + This can be used by filesystems with complex unaligned I/O + write paths to provide an optimised fast path for unaligned writes. + If a pure overwrite can be performed, then serialisation against + other I/Os to the same filesystem block(s) is unnecessary as there is + no risk of stale data exposure or data loss. + If a pure overwrite cannot be performed, then the filesystem can + perform the serialisation steps needed to provide exclusive access + to the unaligned I/O range so that it can perform allocation and + sub-block zeroing safely. + Filesystems can use this flag to try to reduce locking contention, + but a lot of `detailed checking + <https://lore.kernel.org/linux-ext4/20230314130759.642710-1-bfoster@redhat.com/>`_ + is required to do it `correctly + <https://lore.kernel.org/linux-ext4/20230810165559.946222-1-bfoster@redhat.com/>`_. + + * ``IOMAP_DIO_PARTIAL``: If a page fault occurs, return whatever + progress has already been made. + The caller may deal with the page fault and retry the operation. + If the caller decides to retry the operation, it should pass the + accumulated return values of all previous calls as the + ``done_before`` parameter to the next call. + +These ``struct kiocb`` flags are significant for direct I/O with iomap: + + * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``. + + * ``IOCB_SYNC``: Ensure that the device has persisted data to disk + before completing the call. + In the case of pure overwrites, the I/O may be issued with FUA + enabled. + + * ``IOCB_HIPRI``: Poll for I/O completion instead of waiting for an + interrupt. + Only meaningful for asynchronous I/O, and only if the entire I/O can + be issued as a single ``struct bio``. + + * ``IOCB_DIO_CALLER_COMP``: Try to run I/O completion from the caller's + process context. + See ``linux/fs.h`` for more details. + +Filesystems should call ``iomap_dio_rw`` from ``->read_iter`` and +``->write_iter``, and set ``FMODE_CAN_ODIRECT`` in the ``->open`` +function for the file. +They should not set ``->direct_IO``, which is deprecated. + +If a filesystem wishes to perform its own work before direct I/O +completion, it should call ``__iomap_dio_rw``. +If its return value is not an error pointer or a NULL pointer, the +filesystem should pass the return value to ``iomap_dio_complete`` after +finishing its internal work. + +Return Values +------------- + +``iomap_dio_rw`` can return one of the following: + + * A non-negative number of bytes transferred. + + * ``-ENOTBLK``: Fall back to buffered I/O. + iomap itself will return this value if it cannot invalidate the page + cache before issuing the I/O to storage. + The ``->iomap_begin`` or ``->iomap_end`` functions may also return + this value. + + * ``-EIOCBQUEUED``: The asynchronous direct I/O request has been + queued and will be completed separately. + + * Any of the other negative error codes. + +Direct Reads +------------ + +A direct I/O read initiates a read I/O from the storage device to the +caller's buffer. +Dirty parts of the pagecache are flushed to storage before initiating +the read io. +The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DIRECT`` with +any combination of the following enhancements: + + * ``IOMAP_NOWAIT``, as defined previously. + +Callers commonly hold ``i_rwsem`` in shared mode before calling this +function. + +Direct Writes +------------- + +A direct I/O write initiates a write I/O to the storage device from the +caller's buffer. +Dirty parts of the pagecache are flushed to storage before initiating +the write io. +The pagecache is invalidated both before and after the write io. +The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DIRECT | +IOMAP_WRITE`` with any combination of the following enhancements: + + * ``IOMAP_NOWAIT``, as defined previously. + + * ``IOMAP_OVERWRITE_ONLY``: Allocating blocks and zeroing partial + blocks is not allowed. + The entire file range must map to a single written or unwritten + extent. + The file I/O range must be aligned to the filesystem block size + if the mapping is unwritten and the filesystem cannot handle zeroing + the unaligned regions without exposing stale contents. + +Callers commonly hold ``i_rwsem`` in shared or exclusive mode before +calling this function. + +``struct iomap_dio_ops:`` +------------------------- +.. code-block:: c + + struct iomap_dio_ops { + void (*submit_io)(const struct iomap_iter *iter, struct bio *bio, + loff_t file_offset); + int (*end_io)(struct kiocb *iocb, ssize_t size, int error, + unsigned flags); + struct bio_set *bio_set; + }; + +The fields of this structure are as follows: + + - ``submit_io``: iomap calls this function when it has constructed a + ``struct bio`` object for the I/O requested, and wishes to submit it + to the block device. + If no function is provided, ``submit_bio`` will be called directly. + Filesystems that would like to perform additional work before (e.g. + data replication for btrfs) should implement this function. + + - ``end_io``: This is called after the ``struct bio`` completes. + This function should perform post-write conversions of unwritten + extent mappings, handle write failures, etc. + The ``flags`` argument may be set to a combination of the following: + + * ``IOMAP_DIO_UNWRITTEN``: The mapping was unwritten, so the ioend + should mark the extent as written. + + * ``IOMAP_DIO_COW``: Writing to the space in the mapping required a + copy on write operation, so the ioend should switch mappings. + + - ``bio_set``: This allows the filesystem to provide a custom bio_set + for allocating direct I/O bios. + This enables filesystems to `stash additional per-bio information + <https://lore.kernel.org/all/20220505201115.937837-3-hch@lst.de/>`_ + for private use. + If this field is NULL, generic ``struct bio`` objects will be used. + +Filesystems that want to perform extra work after an I/O completion +should set a custom ``->bi_end_io`` function via ``->submit_io``. +Afterwards, the custom endio function must call +``iomap_dio_bio_end_io`` to finish the direct I/O. + +DAX I/O +======= + +Some storage devices can be directly mapped as memory. +These devices support a new access mode known as "fsdax" that allows +loads and stores through the CPU and memory controller. + +fsdax Reads +----------- + +A fsdax read performs a memcpy from storage device to the caller's +buffer. +The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DAX`` with any +combination of the following enhancements: + + * ``IOMAP_NOWAIT``, as defined previously. + +Callers commonly hold ``i_rwsem`` in shared mode before calling this +function. + +fsdax Writes +------------ + +A fsdax write initiates a memcpy to the storage device from the caller's +buffer. +The ``flags`` value for ``->iomap_begin`` will be ``IOMAP_DAX | +IOMAP_WRITE`` with any combination of the following enhancements: + + * ``IOMAP_NOWAIT``, as defined previously. + + * ``IOMAP_OVERWRITE_ONLY``: The caller requires a pure overwrite to be + performed from this mapping. + This requires the filesystem extent mapping to already exist as an + ``IOMAP_MAPPED`` type and span the entire range of the write I/O + request. + If the filesystem cannot map this request in a way that allows the + iomap infrastructure to perform a pure overwrite, it must fail the + mapping operation with ``-EAGAIN``. + +Callers commonly hold ``i_rwsem`` in exclusive mode before calling this +function. + +fsdax mmap Faults +~~~~~~~~~~~~~~~~~ + +The ``dax_iomap_fault`` function handles read and write faults to fsdax +storage. +For a read fault, ``IOMAP_DAX | IOMAP_FAULT`` will be passed as the +``flags`` argument to ``->iomap_begin``. +For a write fault, ``IOMAP_DAX | IOMAP_FAULT | IOMAP_WRITE`` will be +passed as the ``flags`` argument to ``->iomap_begin``. + +Callers commonly hold the same locks as they do to call their iomap +pagecache counterparts. + +fsdax Truncation, fallocate, and Unsharing +------------------------------------------ + +For fsdax files, the following functions are provided to replace their +iomap pagecache I/O counterparts. +The ``flags`` argument to ``->iomap_begin`` are the same as the +pagecache counterparts, with ``IOMAP_DAX`` added. + + * ``dax_file_unshare`` + * ``dax_zero_range`` + * ``dax_truncate_page`` + +Callers commonly hold the same locks as they do to call their iomap +pagecache counterparts. + +fsdax Deduplication +------------------- + +Filesystems implementing the ``FIDEDUPERANGE`` ioctl must call the +``dax_remap_file_range_prep`` function with their own iomap read ops. + +Seeking Files +============= + +iomap implements the two iterating whence modes of the ``llseek`` system +call. + +SEEK_DATA +--------- + +The ``iomap_seek_data`` function implements the SEEK_DATA "whence" value +for llseek. +``IOMAP_REPORT`` will be passed as the ``flags`` argument to +``->iomap_begin``. + +For unwritten mappings, the pagecache will be searched. +Regions of the pagecache with a folio mapped and uptodate fsblocks +within those folios will be reported as data areas. + +Callers commonly hold ``i_rwsem`` in shared mode before calling this +function. + +SEEK_HOLE +--------- + +The ``iomap_seek_hole`` function implements the SEEK_HOLE "whence" value +for llseek. +``IOMAP_REPORT`` will be passed as the ``flags`` argument to +``->iomap_begin``. + +For unwritten mappings, the pagecache will be searched. +Regions of the pagecache with no folio mapped, or a !uptodate fsblock +within a folio will be reported as sparse hole areas. + +Callers commonly hold ``i_rwsem`` in shared mode before calling this +function. + +Swap File Activation +==================== + +The ``iomap_swapfile_activate`` function finds all the base-page aligned +regions in a file and sets them up as swap space. +The file will be ``fsync()``'d before activation. +``IOMAP_REPORT`` will be passed as the ``flags`` argument to +``->iomap_begin``. +All mappings must be mapped or unwritten; cannot be dirty or shared, and +cannot span multiple block devices. +Callers must hold ``i_rwsem`` in exclusive mode; this is already +provided by ``swapon``. + +File Space Mapping Reporting +============================ + +iomap implements two of the file space mapping system calls. + +FS_IOC_FIEMAP +------------- + +The ``iomap_fiemap`` function exports file extent mappings to userspace +in the format specified by the ``FS_IOC_FIEMAP`` ioctl. +``IOMAP_REPORT`` will be passed as the ``flags`` argument to +``->iomap_begin``. +Callers commonly hold ``i_rwsem`` in shared mode before calling this +function. + +FIBMAP (deprecated) +------------------- + +``iomap_bmap`` implements FIBMAP. +The calling conventions are the same as for FIEMAP. +This function is only provided to maintain compatibility for filesystems +that implemented FIBMAP prior to conversion. +This ioctl is deprecated; do **not** add a FIBMAP implementation to +filesystems that do not have it. +Callers should probably hold ``i_rwsem`` in shared mode before calling +this function, but this is unclear. |