summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-06-01mm: return an unsigned int from __do_page_cache_readaheadChristoph Hellwig
We never return an error, so switch to returning an unsigned int. Most callers already did implicit casts to an unsigned type, and the one that didn't can be simplified now. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01mm: give the 'ret' variable a better name __do_page_cache_readaheadChristoph Hellwig
It counts the number of pages acted on, so name it nr_pages to make that obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01block: add a lower-level bio_add_page interfaceChristoph Hellwig
For the upcoming removal of buffer heads in XFS we need to keep track of the number of outstanding writeback requests per page. For this we need to know if bio_add_page merged a region with the previous bvec or not. Instead of adding additional arguments this refactors bio_add_page to be implemented using three lower level helpers which users like XFS can use directly if they care about the merge decisions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01xfs: fix error handling in xfs_refcount_insert()Dave Chinner
generic/475 fired an assert failure just after the filesystem was shut down: XFS: Assertion failed: fs_is_ok, file: fs/xfs/libxfs/xfs_refcount.c, line: 182 ..... Call Trace: xfs_refcount_insert+0x151/0x190 xfs_refcount_adjust_extents.constprop.11+0x9c/0x470 xfs_refcount_adjust.constprop.10+0xb0/0x270 xfs_refcount_finish_one+0x25a/0x420 xfs_trans_log_finish_refcount_update+0x2a/0x40 xfs_refcount_update_finish_item+0x35/0xa0 xfs_defer_finish+0x15e/0x4d0 xfs_reflink_remap_extent+0x1bc/0x610 xfs_reflink_remap_blocks+0x6e/0x280 xfs_reflink_remap_range+0x311/0x530 vfs_clone_file_range+0x119/0x200 .... If xfs_btree_insert() returns an error, the corruption check fires instead of passing the error back the caller. The corruption check should be after we've checked for an error, not before, thereby avoiding assert failures if the filesystem shuts down during a refcount btree record insert. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01xfs: fix xfs_rtalloc_rec unitsDarrick J. Wong
All the realtime allocation functions deal with space on the rtdev in units of realtime extents. However, struct xfs_rtalloc_rec confusingly uses the word 'block' in the name, even though they're really extents. Fix the naming problem and fix all the unit handling problems in the two existing users. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-06-01xfs: strengthen rtalloc query range checksDarrick J. Wong
Strengthen the rtalloc range query checks to make sure that the keys do not run off the end of the realtime device inappropriately. Note that the query range functions require units of rt extents, not blocks, despite the type name. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-06-01xfs: xfs_rtbuf_get should check the bmapi_read resultsDarrick J. Wong
The xfs_rtbuf_get function should check the block mapping it gets back from bmapi_read. If there are no mappings or the mapping isn't a real extent, we should return -EFSCORRUPTED rather than trying to read a garbage value. We also require realtime bitmap blocks to be real, written allocations. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-06-01xfs: xfs_rtword_t should be unsigned, not signedDarrick J. Wong
xfs_rtword_t is used for bit manipulations in the realtime bitmap file. Since we're performing bit shifts with this type, we don't want sign extension and we don't want to be left shifting negative quantities because that's undefined behavior. This also shuts up these UBSAN warnings: UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_rtbitmap.c:833:48 signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-05-31dax: change bdev_dax_supported() to support boolean returnsDave Jiang
The function return values are confusing with the way the function is named. We expect a true or false return value but it actually returns 0/-errno. This makes the code very confusing. Changing the return values to return a bool where if DAX is supported then return true and no DAX support returns false. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-31fs: allow per-device dax status checking for filesystemsDarrick J. Wong
Change bdev_dax_supported so it takes a bdev parameter. This enables multi-device filesystems like xfs to check that a dax device can work for the particular filesystem. Once that's in place, actually fix all the parts of XFS where we need to be able to distinguish between datadev and rtdev. This patch fixes the problem where we screw up the dax support checking in xfs if the datadev and rtdev have different dax capabilities. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> [rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases] Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-05-30xfs: repair superblocksDarrick J. Wong
If one of the backup superblocks is found to differ seriously from superblock 0, write out a fresh copy from the in-core sb. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30xfs: add helpers to attach quotas to inodesDarrick J. Wong
Add a helper routine to attach quota information to inodes that are about to undergo repair. If that fails, we need to schedule a quotacheck for the next mount but allow the corrupted metadata repair to continue. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30xfs: recover AG btree roots from rmap dataDarrick J. Wong
Add a helper function to help us recover btree roots from the rmap data. Callers pass in a list of rmap owner codes, buffer ops, and magic numbers. We iterate the rmap records looking for owner matches, and then read the matching blocks to see if the magic number & uuid match. If so, we then read-verify the block, and if that passes then we retain a pointer to the block with the highest level, assuming that by the end of the call we will have found the root. This will be used to reset the AGF/AGI btree root fields during their rebuild procedures. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30xfs: add helpers to dispose of old btree blocks after a repairDarrick J. Wong
Now that we've plumbed in the ability to construct a list of dead btree blocks following a repair, add more helpers to dispose of them. This is done by examining the rmapbt -- if the btree was the only owner we can free the block, otherwise it's crosslinked and we can only remove the rmapbt record. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30xfs: add helpers to collect and sift btree block pointers during repairDarrick J. Wong
Add some helpers to assemble a list of fs block extents. Generally, repair functions will iterate the rmapbt to make a list (1) of all extents owned by the nominal owner of the metadata structure; then they will iterate all other structures with the same rmap owner to make a list (2) of active blocks; and finally we have a subtraction function to subtract all the blocks in (2) from (1), with the result that (1) is now a list of blocks that were owned by the old btree and must be disposed. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-30xfs: add helpers to allocate and initialize fresh btree rootsDarrick J. Wong
Add a pair of helper functions to allocate and initialize fresh btree roots. The repair functions will use these as part of recreating corrupted metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-05-30xfs: add helpers to deal with transaction allocation and rollingDarrick J. Wong
For repairs, we need to reserve at least as many blocks as we think we're going to need to rebuild the data structure, and we're going to need some helpers to roll transactions while maintaining locks on the AG headers so that other threads cannot wander into the middle of a repair. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2018-05-30xfs: grab the per-ag structure whenever relevantDarrick J. Wong
Grab and hold the per-AG data across a scrub run whenever relevant. This helps us avoid repeated trips through rcu and the radix tree in the repair code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-29fs: xfs: Change return type to vm_fault_tSouptick Joarder
Use new return type vm_fault_t for fault handlers. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-29xfs: fix inobt magic number checkDarrick J. Wong
In commit a6a781a58befcbd467c ("xfs: have buffer verifier functions report failing address") the bad magic number return was ported incorrectly. Fixes: a6a781a58befcbd467ce843af4eaca3906aa1f08 Reported-by: syzbot+08ab33be0178b76851c8@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-05-29fs: clear writeback errors in inode_init_alwaysDarrick J. Wong
In inode_init_always(), we clear the inode mapping flags, which clears any retained error (AS_EIO, AS_ENOSPC) bits. Unfortunately, we do not also clear wb_err, which means that old mapping errors can leak through to new inodes. This is crucial for the XFS inode allocation path because we recycle old in-core inodes and we do not want error state from an old file to leak into the new file. This bug was discovered by running generic/036 and generic/047 in a loop and noticing that the EIOs generated by the collision of direct and buffered writes in generic/036 would survive the remount between 036 and 047, and get reported to the fsyncs (on different files!) in generic/047. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-17iomap: don't allow holes in swapfilesOmar Sandoval
generic_swapfile_activate() doesn't allow holes, so we should be consistent here. This is also a bit safer: if the user creates a swapfile with, say, truncate -s $SIZE followed by mkswap, they should really get an error and not much less swap space than they expected. swapon(8) will error out before calling swapon(2) if the file has holes, anyways. Fixes: 9d93388b0afe ("iomap: add a swapfile activation function") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-16iomap: provide more useful errors for invalid swap filesOmar Sandoval
Currently, for an invalid swap file, we print the same error message regardless of the reason. This isn't very useful for an admin, who will likely want to know why exactly they can't use their swap file. So, let's add specific error messages for each reason, and also move the bdev check after the flags checks, since the latter are more fundamental. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-16xfs: implement online get/set fs labelEric Sandeen
The GET ioctl is trivial, just return the current label. The SET ioctl is more involved: It transactionally modifies the superblock to write a new filesystem label to the primary super. A new variant of xfs_sync_sb then writes the superblock buffer immediately to disk so that the change is visible from userspace. It then invalidates any page cache that userspace might have previously read on the block device so that i.e. blkid can see the change immediately, and updates all secondary superblocks as userspace relable does. Signed-off-by: Eric Sandeen <sandeen@redhat.com> [darrick: use dchinner's new xfs_update_secondary_sbs function] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-16fs: copy BTRFS_IOC_[SG]ET_FSLABEL to vfsEric Sandeen
This retains 256 chars as the maximum size through the interface, which is the btrfs limit and AFAIK exceeds any other filesystem's maximum label size. This just copies the ioctl for now and leaves it in place for btrfs for the time being. A later patch will allow btrfs to use the new common ioctl definition, but it may be sent after this is merged. (Note, Reviewed-by's were originally given for the combined vfs+btrfs patch, some license taken here.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: factor the ag length extension code into libxfsDave Chinner
Growfs currently manually codes the extension of the last AG in a filesytem during the growfs process. Factor that out of the growfs code and move it into libxfs along with teh rest of the AG header modification code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: move growfs core to libxfsDave Chinner
So it can be shared with userspace (e.g. mkfs) easily. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: rework secondary superblock updates in growfsDave Chinner
Right now we wait until we've committed changes to the primary superblock before we initialise any of the new secondary superblocks. This means that if we have any write errors for new secondary superblocks we end up with garbage in place rather than zeros or even an "in progress" superblock to indicate a grow operation is being done. To ensure we can write the secondary superblocks, initialise them earlier in the same loop that initialises the AG headers. We stamp the new secondary superblocks here with the old geometry, but set the "sb_inprogress" field to indicate that updates are being done to the superblock so they cannot be used. This will result in the secondary superblock fields being updated or triggering errors that will abort the grow before we commit any permanent changes. This also means we can change the update mechanism of the secondary superblocks. We know that we are going to wholly overwrite the information in the struct xfs_sb in the buffer, so there's no point reading it from disk. Just allocate an uncached buffer, zero it in memory, stamp the new superblock structure in it and write it out. If we fail to write it out, then we'll leave the existing sb (old or new w/ inprogress) on disk for repair to deal with later. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: separate secondary sb update in growfsDave Chinner
This happens after all the transactions to update the superblock occur, and errors need to be handled slightly differently. Seperate out the code into it's own function, and clean up the error goto stack in the core growfs code as it is now much simpler. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: make imaxpct changes in growfs separateDave Chinner
When growfs changes the imaxpct value of the filesystem, it runs through all the "change size" growfs code, whether it needs to or not. Separate out changing imaxpct into it's own function and transaction to simplify the rest of the growfs code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: turn ag header initialisation into a table driven operationDave Chinner
There's still more cookie cutter code in setting up each AG header. Separate all the variables into a simple structure and iterate a table of header definitions to initialise everything. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: factor ag btree root block initialisationDave Chinner
Cookie cutter code, easily factored. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: convert growfs AG header init to use buffer listsDave Chinner
We currently write all new AG headers synchronously, which can be slow for large grow operations. All we really need to do is ensure all the headers are on disk before we run the growfs transaction, so convert this to a buffer list and a delayed write operation. We block waiting for the delayed write buffer submission to complete, so this will fulfill the requirement to have all the buffers written correctly before proceeding. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: factor out AG header initialisation from growfs coreDave Chinner
The intialisation of new AG headers is mostly common with the userspace mkfs code and growfs in the kernel, so start factoring it out so we can move it to libxfs and use it in both places. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: one-shot cached buffersDave Chinner
For the new growfs work, we want to ensure that we serialise secondary superblock updates with other operations (e.g. scrub) correctly, but we don't want to cache the buffers for long term reuse. We need cached buffers for serialisation, however. To solve this, introduce a "oneshot" buffer which will be marshalled through the cache but then released once the last current reference goes away. If the buffer is already cached, then we ignore the "one-shot" behaviour and leave the buffer in the state it was prior to the one-shot command being run. This means we don't perturb either the working set or existing cached buffer state by a one-shot operation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: implement the metadata repair ioctl flagDarrick J. Wong
Plumb in the pieces necessary to make the "scrub" subfunction of the scrub ioctl actually work. This means that we make the IFLAG_REPAIR flag to the scrub ioctl actually do something, and we add an errortag knob so that xfstests can force the kernel to rebuild a metadata structure even if there's nothing wrong with it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: create tracepoints for online repairDarrick J. Wong
These tracepoints will be used to debug the online repair routines. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: teach xfs_bmapi_remap to accept some bmapi flagsDarrick J. Wong
Teach xfs_bmapi_remap how to map in unwritten extent and to skip rmap updates. This enables us to rebuild real and unwritten extents from the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: make xfs_bmapi_remapi work with attribute forksDarrick J. Wong
Add a new flags argument to xfs_bmapi_remapi so that we can pass BMAPI flags into the function. This enables us to pass in BMAPI_ATTRFORK so that we can remap things into the attribute fork. Eventually the online repair code will use this to rebuild attribute forks, so make it non-static. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: hoist xfs_scrub_agfl_walk to libxfs as xfs_agfl_walkDarrick J. Wong
This function is basically a generic AGFL block iterator, so promote it to libxfs ahead of online repair wanting to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: avoid ABBA deadlock when scrubbing parent pointersDarrick J. Wong
In normal operation, the XFS convention is to take an inode's iolock and then allocate a transaction. However, when scrubbing parent inodes this is inverted -- we allocated the transaction to do the scrub, and now we're trying to grab the parent's iolock. This can lead to ABBA deadlocks: some thread grabbed the parent's iolock and is waiting for space for a transaction while our parent scrubber is sitting on a transaction trying to get the parent's iolock. Therefore, convert all iolock attempts to use trylock; if that fails, they can use the existing mechanisms to back off and try again. The ABBA deadlock didn't happen with a non-repair scrub because the transactions don't reserve any space, but repair scrubs require reservation in order to update metadata. However, any other concurrent metadata update (e.g. directory create in the parent) could also induce this deadlock with the parent scrubber. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: scrub the data fork of the realtime inodesDarrick J. Wong
The realtime bitmap and summary inodes live on the metadata device, so we can scrub their data forks with the regular scrubbers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: quota scrub should use bmapbtd scrubberDarrick J. Wong
Replace the quota scrubber's open-coded data fork scrubber with a redirected call to the bmapbtd scrubber. This strengthens the quota scrub to include all the cross-referencing that it does. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: don't continue scrub if already corruptDarrick J. Wong
If we've already decided that something is corrupt, we might as well abort all the loops and exit as quickly as possible. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: refactor quota limits initializationDarrick J. Wong
Replace all the if (!error) weirdness with helper functions that follow our regular coding practices, and factor out the ternary expression soup. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: superblock scrub should use short-lived buffersDarrick J. Wong
Secondary superblocks are rarely used, so create a helper to read a given non-primary AG's superblock and ensure that it won't stick around hogging memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: skip scrub xref if corruption already notedDarrick J. Wong
Don't bother looking for cross-referencing problems if the metadata is already corrupt or we've already found a cross-referencing problem. Since we added a helper function for flags testing, convert existing users to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: clear sb->s_fs_info on mount failureDave Chinner
We recently had an oops reported on a 4.14 kernel in xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage and so the m_perag_tree lookup walked into lala land. Essentially, the machine was under memory pressure when the mount was being run, xfs_fs_fill_super() failed after allocating the xfs_mount and attaching it to sb->s_fs_info. It then cleaned up and freed the xfs_mount, but the sb->s_fs_info field still pointed to the freed memory. Hence when the superblock shrinker then ran it fell off the bad pointer. With the superblock shrinker problem fixed at teh VFS level, this stale s_fs_info pointer is still a problem - we use it unconditionally in ->put_super when the superblock is being torn down, and hence we can still trip over it after a ->fill_super call failure. Hence we need to clear s_fs_info if xfs-fs_fill_super() fails, and we need to check if it's valid in the places it can potentially be dereferenced after a ->fill_super failure. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: add mount delay debug optionDave Chinner
Similar to log_recovery_delay, this delay occurs between the VFS superblock being initialised and the xfs_mount being fully initialised. It also poisons the per-ag radix tree node so that it can be used for triggering shrinker races during mount such as the following: <run memory pressure workload in background> $ cat dirty-mount.sh #! /bin/bash umount -f /dev/pmem0 mkfs.xfs -f /dev/pmem0 mount /dev/pmem0 /mnt/test rm -f /mnt/test/foo xfs_io -fxc "pwrite 0 4k" -c fsync -c "shutdown" /mnt/test/foo umount /dev/pmem0 # let's crash it now! echo 30 > /sys/fs/xfs/debug/mount_delay mount /dev/pmem0 /mnt/test echo 0 > /sys/fs/xfs/debug/mount_delay umount /dev/pmem0 $ sudo ./dirty-mount.sh ..... [ 60.378118] CPU: 3 PID: 3577 Comm: fs_mark Tainted: G D W 4.16.0-rc5-dgc #440 [ 60.378120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 60.378124] RIP: 0010:radix_tree_next_chunk+0x76/0x320 [ 60.378127] RSP: 0018:ffffc9000276f4f8 EFLAGS: 00010282 [ 60.383670] RAX: a5a5a5a5a5a5a5a4 RBX: 0000000000000010 RCX: 000000000000001a [ 60.385277] RDX: 0000000000000000 RSI: ffffc9000276f540 RDI: 0000000000000000 [ 60.386554] RBP: 0000000000000000 R08: 0000000000000000 R09: a5a5a5a5a5a5a5a5 [ 60.388194] R10: 0000000000000006 R11: 0000000000000001 R12: ffffc9000276f598 [ 60.389288] R13: 0000000000000040 R14: 0000000000000228 R15: ffff880816cd6458 [ 60.390827] FS: 00007f5c124b9740(0000) GS:ffff88083fc00000(0000) knlGS:0000000000000000 [ 60.392253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 60.393423] CR2: 00007f5c11bba0b8 CR3: 000000035580e001 CR4: 00000000000606e0 [ 60.394519] Call Trace: [ 60.395252] radix_tree_gang_lookup_tag+0xc4/0x130 [ 60.395948] xfs_perag_get_tag+0x37/0xf0 [ 60.396522] xfs_reclaim_inodes_count+0x32/0x40 [ 60.397178] xfs_fs_nr_cached_objects+0x11/0x20 [ 60.397837] super_cache_count+0x35/0xc0 [ 60.399159] shrink_slab.part.66+0xb1/0x370 [ 60.400194] shrink_node+0x7e/0x1a0 [ 60.401058] try_to_free_pages+0x199/0x470 [ 60.402081] __alloc_pages_slowpath+0x3a1/0xd20 [ 60.403729] __alloc_pages_nodemask+0x1c3/0x200 [ 60.404941] cache_grow_begin+0x20b/0x2e0 [ 60.406164] fallback_alloc+0x160/0x200 [ 60.407088] kmem_cache_alloc+0x111/0x4e0 [ 60.408038] ? xfs_buf_rele+0x61/0x430 [ 60.408925] kmem_zone_alloc+0x61/0xe0 [ 60.409965] xfs_inode_alloc+0x24/0x1d0 ..... Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: factor out nodiscard helpersBrian Foster
The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>