summaryrefslogtreecommitdiff
path: root/fs/gfs2
AgeCommit message (Collapse)Author
2019-07-31gfs2: Inode dirtying fixAndreas Gruenbacher
With the recent iomap write page reclaim deadlock fix, it turns out that the GLF_DIRTY flag isn't always set when it needs to be anymore: previously, this happened as a side effect of always adding the inode buffer head to the current transaction with gfs2_trans_add_meta, but this isn't happening consistently anymore. Fix by removing an additional unnecessary gfs2_trans_add_meta call and by setting the GLF_DIRTY flag in gfs2_iomap_end. (The GLF_DIRTY flag causes inode_go_sync to flush the transaction log when syncing out the glock of that inode. When the flag isn't set, inode_go_sync will skip inodes, including ones with an i_state of I_DIRTY_PAGES, which will lead to cluster incoherency.) In addition, in gfs2_iomap_page_done, if the metadata has changed, mark the inode as I_DIRTY_DATASYNC to have the inode added to the current transaction: we don't expect metadata to change here, but let's err on the safe side. Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock"); Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-12Merge tag 'vfs-fix-ioctl-checking-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong: "Here's a patch series that sets up common parameter checking functions for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations. The goal here is to reduce the amount of behaviorial variance between the filesystems where those ioctls originated (ext2 and XFS, respectively) and everybody else. - Standardize parameter checking for the SETFLAGS and FSSETXATTR ioctls (which were the file attribute setters for ext4 and xfs and have now been hoisted to the vfs) - Only allow the DAX flag to be set on files and directories" * tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: vfs: only allow FSSETXATTR to set DAX flag on files and dirs vfs: teach vfs_ioc_fssetxattr_check to check extent size hints vfs: teach vfs_ioc_fssetxattr_check to check project id info vfs: create a generic checking function for FS_IOC_FSSETXATTR vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
2019-07-12Merge tag 'driver-core-5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-10Merge tag 'gfs2-for-5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: "Some relatively minor changes for gfs2: - An initial batch of obvious cleanups and fixes from Bob's recovery patch queue. - Two iomap conversion patches and some cleanups from Christoph Hellwig. - A cosmetic cleanup from Kefeng Wang (Huawei). - Another minor fix and cleanup by me" * tag 'gfs2-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Remove unused gfs2_iomap_alloc argument gfs2: don't use buffer_heads in gfs2_allocate_page_backing gfs2: use iomap_bmap instead of generic_block_bmap gfs2: mark stuffed_readpage static gfs2: merge gfs2_writepage_common into gfs2_writepage gfs2: merge gfs2_writeback_aops and gfs2_ordered_aops gfs2: remove the unused gfs2_stuffed_write_end function gfs2: use page_offset in gfs2_page_mkwrite gfs2: replace more printk with calls to fs_info and friends gfs2: dump fsid when dumping glock problems gfs2: simplify gfs2_freeze by removing case gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN gfs2: Warn when a journal replay overwrites a rgrp with buffers gfs2: log which portion of the journal is replayed gfs2: eliminate tr_num_revoke_rm gfs2: kthread and remount improvements gfs2: Use IS_ERR_OR_NULL gfs2: Clean up freeing struct gfs2_sbd
2019-07-10Merge tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "There are a few fixes for gfs2 but otherwise it's pretty quiet so far. - Only mark inode dirty at the end of writing to a file (instead of once for every page written). - Fix for an accounting error in the page_done callback" * tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: fix page_done callback for short writes fs: fold __generic_write_end back into generic_write_end iomap: don't mark the inode dirty in iomap_write_end
2019-07-04gfs2: Remove unused gfs2_iomap_alloc argumentAndreas Gruenbacher
Remove the unused flags argument of gfs2_iomap_alloc. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: don't use buffer_heads in gfs2_allocate_page_backingChristoph Hellwig
Rewrite gfs2_allocate_page_backing to call gfs2_iomap_get_alloc and operate on struct iomap directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: use iomap_bmap instead of generic_block_bmapChristoph Hellwig
No need to indirect through get_blocks and buffer_heads when we can just use the iomap version. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: mark stuffed_readpage staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: merge gfs2_writepage_common into gfs2_writepageChristoph Hellwig
There is no need to keep these two functions separate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: merge gfs2_writeback_aops and gfs2_ordered_aopsChristoph Hellwig
The only difference between the two is that gfs2_ordered_aops sets the set_page_dirty method to __set_page_dirty_buffers, but given that __set_page_dirty_buffers is the default, if no method is set, there is no need to to do that. Merge the two sets of operations into one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: remove the unused gfs2_stuffed_write_end functionChristoph Hellwig
This function was overlooked when the write_begin and write_end address space operations were removed as part of gfs2's iomap conversion. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03gfs2: use page_offset in gfs2_page_mkwriteChristoph Hellwig
Without casting page->index to a guaranteed 64-bit type, the value might be treated as 32-bit on 32-bit platforms and thus get truncated. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-01vfs: create a generic checking and prep function for FS_IOC_SETFLAGSDarrick J. Wong
Create a generic function to check incoming FS_IOC_SETFLAGS flag values and later prepare the inode for updates so that we can standardize the implementations that follow ext4's flag values. Note that the efivarfs implementation no longer fails a no-op SETFLAGS without CAP_LINUX_IMMUTABLE since that's the behavior in ext*. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2019-06-27iomap: don't mark the inode dirty in iomap_write_endAndreas Gruenbacher
Marking the inode dirty for each page copied into the page cache can be very inefficient for file systems that use the VFS dirty inode tracking, and is completely pointless for those that don't use the VFS dirty inode tracking. So instead, only set an iomap flag when changing the in-core inode size, and open code the rest of __generic_write_end. Partially based on code from Christoph Hellwig. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-27gfs2: replace more printk with calls to fs_info and friendsBob Peterson
This patch replaces a few leftover printk errors with calls to fs_info and similar, so that the file system having the error is properly logged. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: dump fsid when dumping glock problemsBob Peterson
Before this patch, if a glock error was encountered, the glock with the problem was dumped. But sometimes you may have lots of file systems mounted, and that doesn't tell you which file system it was for. This patch adds a new boolean parameter fsid to the dump_glock family of functions. For non-error cases, such as dumping the glocks debugfs file, the fsid is not dumped in order to keep lock dumps and glocktop as clean as possible. For all error cases, such as GLOCK_BUG_ON, the file system id is now printed. This will make it easier to debug. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: simplify gfs2_freeze by removing caseBob Peterson
Function gfs2_freeze had a case statement that simply checked the error code, but the break statements just made the logic hard to read. This patch simplifies the logic in favor of a simple if. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWNBob Peterson
Before this patch, the superblock flag indicating when a file system is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to the more obvious SDF_WITHDRAWN. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Warn when a journal replay overwrites a rgrp with buffersBob Peterson
This patch adds some instrumentation in gfs2's journal replay that indicates when we're about to overwrite a rgrp for which we already have a valid buffer_head. When this problem occurs, it's a situation in which this node has been granted a rgrp glock and subsequently read in buffer_heads for it, and possibly even made changes to the rgrp bits and/or allocation values. But now another node has failed and forced us to replay its journal, but its journal contains a copy of the same rgrp, without a revoke, which means we're about to overwrite a rgrp that we now rightfully own, with an obsolete copy. That is always a problem. It means the other node (which failed and left its journal to be replayed) failed to flush out its rgrp buffers, write out the revoke, and invalidate its copy before it released the glock to our possession. No node should ever release a glock until its metadata has been written to the journal and revoked and invalidated.. We also kludge around the problem and refuse to replace our good copy with the journals bad copy by not marking the buffer dirty, but never do it silently. That's wallpapering over a larger problem that still exists. IOW, if this situation can happen to this node, it can also happen to a different node and we wouldn't even know it or be able to circumvent it: Suppose we have a 3-node cluster: Node 1 fails, leaving an obsolete rgrp block in its journal without a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is released and starts making changes, allocating and freeing blocks from the rgrp, etc. Node 3 replays the journal from node 1, oblivious and unaware that it's about to overwrite node 2's changes. So we still need to be vocal and log the error to make it apparent that a corruption path still exists in gfs2. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: log which portion of the journal is replayedBob Peterson
When a journal is replayed, gfs2 logs a message similar to: jid=X: Replaying journal... This patch adds the tail and block number so that the range of the replayed block is also printed. These values will match the values shown if the journal is dumped with gfs2_edit -p journalX. The resulting output looks something like this: jid=1: Replaying journal...0x28b7 to 0x2beb This will allow us to better debug file system corruption problems. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: eliminate tr_num_revoke_rmBob Peterson
For its journal processing, gfs2 kept track of the number of buffers added and removed on a per-transaction basis. These values are used to calculate space needed in the journal. But while these calculations make sense for the number of buffers, they make no sense for revokes. Revokes are managed in their own list, linked from the superblock. So it's entirely unnecessary to keep separate per-transaction counts for revokes added and removed. A single count will do the same job. Therefore, this patch combines the transaction revokes into a single count. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: kthread and remount improvementsBob Peterson
Before this patch, gfs2 saved the pointers to the two daemon threads (logd and quotad) in the superblock, but they were never cleared, even if the threads were stopped (e.g. on remount -o ro). That meant that certain error conditions (like a withdrawn file system) could race. For example, xfstests generic/361 caused an IO error during remount -o ro, which caused the kthreads to be stopped, then the error flagged. Later, when the test unmounted the file system, it would try to stop the threads a second time with kthread_stop. This patch does two things: First, every time it stops the threads it zeroes out the thread pointer, and also checks whether it's NULL before trying to stop it. Second, in function gfs2_remount_fs, it was returning if an error was logged by either of the two functions for gfs2_make_fs_ro and _rw, which caused it to bypass the online uevent at the bottom of the function. This removes that bypass in favor of just running the whole function, then returning the error. That way, unmounts and remounts won't hang forever. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Use IS_ERR_OR_NULLKefeng Wang
Use IS_ERR_OR_NULL where appropriate. (Several more places converted by Andreas.) Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27gfs2: Clean up freeing struct gfs2_sbdAndreas Gruenbacher
Add a free_sbd function for freeing a struct gfs2_sbd. Use that for freeing a super-block descriptor, either directly or via kobject_put. Free sd_lkstats inside the kobject release function: that way, gfs2_put_super will no longer leak sd_lkstats. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-14Merge tag 'gfs2-v5.2.fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix rounding error in gfs2_iomap_page_prepare" * tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix rounding error in gfs2_iomap_page_prepare
2019-06-14gfs2: Fix rounding error in gfs2_iomap_page_prepareAndreas Gruenbacher
The pos and len arguments to the iomap page_prepare callback are not block aligned, so we need to take that into account when computing the number of blocks. Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-13gfs2: replace ktype default_attrs with default_groupsKimberly Brown
The kobj_type default_attrs field is being replaced by the default_groups field. Replace the default_attrs field in gfs2_ktype with default_groups. Use the ATTRIBUTE_GROUPS macro to create gfs2_groups. Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-08Merge tag 'spdx-5.2-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
2019-06-06Revert "gfs2: Replace gl_revokes with a GLF flag"Bob Peterson
Commit 73118ca8baf7 introduced a glock reference counting bug in gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is no longer useful, so revert that commit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398Thomas Gleixner
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-22Merge tag 'gfs2-5.1.fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Andreas Gruenbacher: "Fix a gfs2 sign extension bug introduced in v4.3" * tag 'gfs2-5.1.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix sign extension bug in gfs2_update_stats
2019-05-22gfs2: Fix sign extension bug in gfs2_update_statsAndreas Gruenbacher
Commit 4d207133e9c3 changed the types of the statistic values in struct gfs2_lkstats from s64 to u64. Because of that, what should be a signed value in gfs2_update_stats turned into an unsigned value. When shifted right, we end up with a large positive value instead of a small negative value, which results in an incorrect variance estimate. Fixes: 4d207133e9c3 ("gfs2: Make statistics unsigned, suitable for use with do_div()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: stable@vger.kernel.org # v4.4+
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-13gfs2: Fix error path kobject memory leakTobin C. Harding
If a call to kobject_init_and_add() fails we must call kobject_put() otherwise we leak memory. Function gfs2_sys_fs_add always calls kobject_init_and_add() which always calls kobject_init(). It is safe to leave object destruction up to the kobject release function and never free it manually. Remove call to kfree() and always call kobject_put() in the error path. Signed-off-by: Tobin C. Harding <tobin@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-08Merge tag 'gfs2-for-5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Andreas Gruenbacher: "We've got the following patches ready for this merge window: - "gfs2: Fix loop in gfs2_rbm_find (v2)" A rework of a fix we ended up reverting in 5.0 because of an iozone performance regression. - "gfs2: read journal in large chunks" "gfs2: fix race between gfs2_freeze_func and unmount" An improved version of a commit we also ended up reverting in 5.0 because of a regression in xfstest generic/311. It turns out that the journal changes were mostly innocent and that unfreeze didn't wait for the freeze to complete, which caused the filesystem to be unmounted before it was actually idle. - "gfs2: Fix occasional glock use-after-free" "gfs2: Fix iomap write page reclaim deadlock" "gfs2: Fix lru_count going negative" Fixes for various problems reported and partially fixed by Citrix engineers. Thank you very much. - "gfs2: clean_journal improperly set sd_log_flush_head" Another fix from Bob. - .. and a few other minor cleanups" * tag 'gfs2-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: read journal in large chunks gfs2: Fix iomap write page reclaim deadlock gfs2: fix race between gfs2_freeze_func and unmount gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke} gfs2: Rename sd_log_le_{revoke,ordered} gfs2: Remove unnecessary extern declarations gfs2: Remove misleading comments in gfs2_evict_inode gfs2: Replace gl_revokes with a GLF flag gfs2: Fix occasional glock use-after-free gfs2: clean_journal improperly set sd_log_flush_head gfs2: Fix lru_count going negative gfs2: Fix loop in gfs2_rbm_find (v2)
2019-05-07Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Nothing major in this series, just fixes and improvements all over the map. This contains: - Series of fixes for sed-opal (David, Jonas) - Fixes and performance tweaks for BFQ (via Paolo) - Set of fixes for bcache (via Coly) - Set of fixes for md (via Song) - Enabling multi-page for passthrough requests (Ming) - Queue release fix series (Ming) - Device notification improvements (Martin) - Propagate underlying device rotational status in loop (Holger) - Removal of mtip32xx trim support, which has been disabled for years (Christoph) - Improvement and cleanup of nvme command handling (Christoph) - Add block SPDX tags (Christoph) - Cleanup/hardening of bio/bvec iteration (Christoph) - A few NVMe pull requests (Christoph) - Removal of CONFIG_LBDAF (Christoph) - Various little fixes here and there" * tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits) block: fix mismerge in bvec_advance block: don't drain in-progress dispatch in blk_cleanup_queue() blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release blk-mq: always free hctx after request queue is freed blk-mq: split blk_mq_alloc_and_init_hctx into two parts blk-mq: free hw queue's resource in hctx's release handler blk-mq: move cancel of requeue_work into blk_mq_release blk-mq: grab .q_usage_counter when queuing request from plug code path block: fix function name in comment nvmet: protect discovery change log event list iteration nvme: mark nvme_core_init and nvme_core_exit static nvme: move command size checks to the core nvme-fabrics: check more command sizes nvme-pci: check more command sizes nvme-pci: remove an unneeded variable initialization nvme-pci: unquiesce admin queue on shutdown nvme-pci: shutdown on timeout during deletion nvme-pci: fix psdt field for single segment sgls nvme-multipath: don't print ANA group state by default nvme-multipath: split bios with the ns_head bio_set before submitting ...
2019-05-07gfs2: read journal in large chunksAbhi Das
Use bios to read in the journal into the address space of the journal inode (jd_inode), sequentially and in large chunks. This is faster for locating the journal head that the previous binary search approach. When performing recovery, we keep the journal in the address space until recovery is done, which further speeds up things. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix iomap write page reclaim deadlockAndreas Gruenbacher
Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing buffered writes by starting a transaction in iomap_begin, writing a range of pages, and ending that transaction in iomap_end. This approach suffers from two problems: (1) Any allocations necessary for the write are done in iomap_begin, so when the data aren't journaled, there is no need for keeping the transaction open until iomap_end. (2) Transactions keep the gfs2 log flush lock held. When iomap_file_buffered_write calls balance_dirty_pages, this can end up calling gfs2_write_inode, which will try to flush the log. This requires taking the log flush lock which is already held, resulting in a deadlock. Fix both of these issues by not keeping transactions open from iomap_begin to iomap_end. Instead, start a small transaction in page_prepare and end it in page_done when necessary. Reported-by: Edwin Török <edvin.torok@citrix.com> Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07gfs2: fix race between gfs2_freeze_func and unmountAbhi Das
As part of the freeze operation, gfs2_freeze_func() is left blocking on a request to hold the sd_freeze_gl in SH. This glock is held in EX by the gfs2_freeze() code. A subsequent call to gfs2_unfreeze() releases the EXclusively held sd_freeze_gl, which allows gfs2_freeze_func() to acquire it in SH and resume its operation. gfs2_unfreeze(), however, doesn't wait for gfs2_freeze_func() to complete. If a umount is issued right after unfreeze, it could result in an inconsistent filesystem because some journal data (statfs update) isn't written out. Refer to commit 24972557b12c for a more detailed explanation of how freeze/unfreeze work. This patch causes gfs2_unfreeze() to wait for gfs2_freeze_func() to complete before returning to the user. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}Andreas Gruenbacher
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no such thing as an "unrevoke" object; all this function does is remove existing revoke objects plus some bookkeeping. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Rename sd_log_le_{revoke,ordered}Andreas Gruenbacher
Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to sd_log_ordered: not sure what le stands for here, but it doesn't add clarity, and if it stands for list entry, it's actually confusing as those are both list heads but not list entries. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Remove unnecessary extern declarationsAndreas Gruenbacher
Make log operations statuc; they are only used locally. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Remove misleading comments in gfs2_evict_inodeAndreas Gruenbacher
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Replace gl_revokes with a GLF flagBob Peterson
The gl_revokes value determines how many outstanding revokes a glock has on the superblock revokes list; this is used to avoid unnecessary log flushes. However, gl_revokes is only ever tested for being zero, and it's only decremented in revoke_lo_after_commit, which removes all revokes from the list, so we know that the gl_revoke values of all the glocks on the list will reach zero. Therefore, we can replace gl_revokes with a bit flag. This saves an atomic counter in struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix occasional glock use-after-freeAndreas Gruenbacher
This patch has to do with the life cycle of glocks and buffers. When gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata object is assigned to track the buffer, and that is queued to various lists, including the glock's gl_ail_list to indicate it's on the active items list. Once the page associated with the buffer has been written, it is removed from the ail list, but its life isn't over until a revoke has been successfully written. So after the block is written, its bufdata object is moved from the glock's gl_ail_list to a file-system-wide list of pending revokes, sd_log_le_revoke. At that point the glock still needs to track how many revokes it contributed to that list (in gl_revokes) so that things like glock go_sync can ensure all the metadata has been not only written, but also revoked before the glock is granted to a different node. This is to guarantee journal replay doesn't replay the block once the glock has been granted to another node. Ross Lagerwall recently discovered a race in which an inode could be evicted, and its glock freed after its ail list had been synced, but while it still had unwritten revokes on the sd_log_le_revoke list. The evict decremented the glock reference count to zero, which allowed the glock to be freed. After the revoke was written, function revoke_lo_after_commit tried to adjust the glock's gl_revokes counter and clear its GLF_LFLUSH flag, at which time it referenced the freed glock. This patch fixes the problem by incrementing the glock reference count in gfs2_add_revoke when the glock's first bufdata object is moved from the glock to the global revokes list. Later, when the glock's last such bufdata object is freed, the reference count is decremented. This guarantees that whichever process finishes last (the revoke writing or the evict) will properly free the glock, and neither will reference the glock after it has been freed. Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07gfs2: clean_journal improperly set sd_log_flush_headBob Peterson
This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef. Due to that patch, function clean_journal was setting the value of sd_log_flush_head, but that's only valid if it is replaying the node's own journal. If it's replaying another node's journal, that's completely wrong and will lead to multiple problems. This patch tries to clean up the mess by passing the value of the logical journal block number into gfs2_write_log_header so the function can treat non-owned journals generically. For the local journal, the journal extent map is used for best performance. For other nodes from other journals, new function gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get. This patch also tries to establish more consistency when passing journal block parameters by changing several unsigned int types to a consistent u32. Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix lru_count going negativeRoss Lagerwall
Under certain conditions, lru_count may drop below zero resulting in a large amount of log spam like this: vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \ negative objects to delete nr=-1 This happens as follows: 1) A glock is moved from lru_list to the dispose list and lru_count is decremented. 2) The dispose function calls cond_resched() and drops the lru lock. 3) Another thread takes the lru lock and tries to add the same glock to lru_list, checking if the glock is on an lru list. 4) It is on a list (actually the dispose list) and so it avoids incrementing lru_count. 5) The glock is moved to lru_list. 5) The original thread doesn't dispose it because it has been re-added to the lru list but the lru_count has still decreased by one. Fix by checking if the LRU flag is set on the glock rather than checking if the glock is on some list and rearrange the code so that the LRU flag is added/removed precisely when the glock is added/removed from lru_list. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix loop in gfs2_rbm_find (v2)Andreas Gruenbacher
Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. This is an updated version of commit 2d29f6b96d ("gfs2: Fix loop in gfs2_rbm_find") which ended up being reverted because it introduced a performance regression in iozone (see commit e74c98ca2d). Changes since v1: - Simplify the wrap-around logic. - Handle the case where each resource group only has a single bitmap block (small filesystem). - Update rd_extfail_pt whenever we scan the entire bitmap, even when we don't start the scan at the very beginning of the bitmap. Fixes: e579ed4f446e ("GFS2: Introduce rbm field bii") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07Merge tag 'Wimplicit-fallthrough-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva: "Mark switch cases where we are expecting to fall through. This is part of the ongoing efforts to enable -Wimplicit-fallthrough. Most of them have been baking in linux-next for a whole development cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails going out for newly introduced code that triggers -Wimplicit-fallthrough to avoid gaining more of these cases while we work to remove the ones that are already present. We are getting close to completing this work. Currently, there are only 32 of 2311 of these cases left to be addressed in linux-next. I'm auditing every case; I take a look into the code and analyze it in order to determine if I'm dealing with an actual bug or a false positive, as explained here: https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/ While working on this, I've found and fixed the several missing break/return bugs, some of them introduced more than 5 years ago. Once this work is finished, we'll be able to universally enable "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from entering the kernel again" * tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits) memstick: mark expected switch fall-throughs drm/nouveau/nvkm: mark expected switch fall-throughs NFC: st21nfca: Fix fall-through warnings NFC: pn533: mark expected switch fall-throughs block: Mark expected switch fall-throughs ASN.1: mark expected switch fall-through lib/cmdline.c: mark expected switch fall-throughs lib: zstd: Mark expected switch fall-throughs scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs scsi: ppa: mark expected switch fall-through scsi: osst: mark expected switch fall-throughs scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs scsi: lpfc: lpfc_nvme: Mark expected switch fall-through scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs scsi: lpfc: lpfc_els: Mark expected switch fall-throughs scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs scsi: imm: mark expected switch fall-throughs scsi: csiostor: csio_wr: mark expected switch fall-through ...