summaryrefslogtreecommitdiff
path: root/fs/locks.c
AgeCommit message (Collapse)Author
2024-09-23Merge tag 'pull-stable-struct_fd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.
2024-08-28file: reclaim 24 bytes from f_ownerChristian Brauner
We do embedd struct fown_struct into struct file letting it take up 32 bytes in total. We could tweak struct fown_struct to be more compact but really it shouldn't even be embedded in struct file in the first place. Instead, actual users of struct fown_struct should allocate the struct on demand. This frees up 24 bytes in struct file. That will have some potentially user-visible changes for the ownership fcntl()s. Some of them can now fail due to allocation failures. Practically, that probably will almost never happen as the allocations are small and they only happen once per file. The fown_struct is used during kill_fasync() which is used by e.g., pipes to generate a SIGIO signal. Sending of such signals is conditional on userspace having set an owner for the file using one of the F_OWNER fcntl()s. Such users will be unaffected if struct fown_struct is allocated during the fcntl() call. There are a few subsystems that call __f_setown() expecting file->f_owner to be allocated: (1) tun devices file->f_op->fasync::tun_chr_fasync() -> __f_setown() There are no callers of tun_chr_fasync(). (2) tty devices file->f_op->fasync::tty_fasync() -> __tty_fasync() -> __f_setown() tty_fasync() has no additional callers but __tty_fasync() has. Note that __tty_fasync() only calls __f_setown() if the @on argument is true. It's called from: file->f_op->release::tty_release() -> tty_release() -> __tty_fasync() -> __f_setown() tty_release() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of tty_release() are safe as well. file->f_op->release::tty_open() -> tty_release() -> __tty_fasync() -> __f_setown() __tty_hangup() calls __tty_fasync() with @on false => __f_setown() is never called from tty_release(). => All callers of __tty_hangup() are safe as well. From the callchains it's obvious that (1) and (2) end up getting called via file->f_op->fasync(). That can happen either through the F_SETFL fcntl() with the FASYNC flag raised or via the FIOASYNC ioctl(). If FASYNC is requested and the file isn't already FASYNC then file->f_op->fasync() is called with @on true which ends up causing both (1) and (2) to call __f_setown(). (1) and (2) are the only subsystems that call __f_setown() from the file->f_op->fasync() handler. So both (1) and (2) have been updated to allocate a struct fown_struct prior to calling fasync_helper() to register with the fasync infrastructure. That's safe as they both call fasync_helper() which also does allocations if @on is true. The other interesting case are file leases: (3) file leases lease_manager_ops->lm_setup::lease_setup() -> __f_setown() Which in turn is called from: generic_add_lease() -> lease_manager_ops->lm_setup::lease_setup() -> __f_setown() So here again we can simply make generic_add_lease() allocate struct fown_struct prior to the lease_manager_ops->lm_setup::lease_setup() which happens under a spinlock. With that the two remaining subsystems that call __f_setown() are: (4) dnotify (5) sockets Both have their own custom ioctls to set struct fown_struct and both have been converted to allocate a struct fown_struct on demand from their respective ioctls. Interactions with O_PATH are fine as well e.g., when opening a /dev/tty as O_PATH then no file->f_op->open() happens thus no file->f_owner is allocated. That's fine as no file operation will be set for those and the device has never been opened. fcntl()s called on such things will just allocate a ->f_owner on demand. Although I have zero idea why'd you care about f_owner on an O_PATH fd. Link: https://lore.kernel.org/r/20240813-work-f_owner-v2-1-4e9343a79f9f@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12introduce fd_file(), convert all accessors to it.Al Viro
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-08-12filelock: fix name of file_lease slab cacheOmar Sandoval
When struct file_lease was split out from struct file_lock, the name of the file_lock slab cache was copied to the new slab cache for file_lease. This name conflict causes confusion in /proc/slabinfo and /sys/kernel/slab. In particular, it caused failures in drgn's test case for slab cache merging. Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59a/tests/linux_kernel/helpers/test_slab.py#L81 Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock") Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.1722293276.git.osandov@fb.com Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24filelock: Fix fcntl/close race recovery compat pathJann Horn
When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-11Merge tag 'vfs-6.10-rc8.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "cachefiles: - Export an existing and add a new cachefile helper to be used in filesystems to fix reference count bugs - Use the newly added fscache_ty_get_volume() helper to get a reference count on an fscache_volume to handle volumes that are about to be removed cleanly - After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN wait for all ongoing cookie lookups to complete and for the object count to reach zero - Propagate errors from vfs_getxattr() to avoid an infinite loop in cachefiles_check_volume_xattr() because it keeps seeing ESTALE - Don't send new requests when an object is dropped by raising CACHEFILES_ONDEMAND_OJBSTATE_DROPPING - Cancel all requests for an object that is about to be dropped - Wait for the ondemand_boject_worker to finish before dropping a cachefiles object to prevent use-after-free - Use cyclic allocation for message ids to better handle id recycling - Add missing lock protection when iterating through the xarray when polling netfs: - Use standard logging helpers for debug logging VFS: - Fix potential use-after-free in file locks during trace_posix_lock_inode(). The tracepoint could fire while another task raced it and freed the lock that was requested to be traced - Only increment the nr_dentry_negative counter for dentries that are present on the superblock LRU. Currently, DCACHE_LRU_LIST list is used to detect this case. However, the flag is also raised in combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru is used. So checking only DCACHE_LRU_LIST will lead to wrong nr_dentry_negative count. Fix the check to not count dentries that are on a shrink related list Misc: - hfsplus: fix an uninitialized value issue in copy_name - minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even though we switched it to kmap_local_page() a while ago" * tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: minixfs: Fix minixfs_rename with HIGHMEM hfsplus: fix uninit-value in copy_name vfs: don't mod negative dentry count when on shrinker list filelock: fix potential use-after-free in posix_lock_inode cachefiles: add missing lock protection when polling cachefiles: cyclic allocation of msg_id to avoid reuse cachefiles: wait for ondemand_object_worker to finish when dropping object cachefiles: cancel all requests for the object that is being dropped cachefiles: stop sending new request when dropping object cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie() cachefiles: fix slab-use-after-free in fscache_withdraw_volume() netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume() netfs: Switch debug logging to pr_debug()
2024-07-05filelock: fix potential use-after-free in posix_lock_inodeJeff Layton
Light Hsieh reported a KASAN UAF warning in trace_posix_lock_inode(). The request pointer had been changed earlier to point to a lock entry that was added to the inode's list. However, before the tracepoint could fire, another task raced in and freed that lock. Fix this by moving the tracepoint inside the spinlock, which should ensure that this doesn't happen. Fixes: 74f6f5912693 ("locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock") Link: https://lore.kernel.org/linux-fsdevel/724ffb0a2962e912ea62bb0515deadf39c325112.camel@kernel.org/ Reported-by: Light Hsieh (謝明燈) <Light.Hsieh@mediatek.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240702-filelock-6-10-v1-1-96e766aadc98@kernel.org Reviewed-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-02filelock: Remove locks reliably when fcntl/close race is detectedJann Horn
When fcntl_setlk() races with close(), it removes the created lock with do_lock_file_wait(). However, LSMs can allow the first do_lock_file_wait() that created the lock while denying the second do_lock_file_wait() that tries to remove the lock. In theory (but AFAIK not in practice), posix_lock_file() could also fail to remove a lock due to GFP_KERNEL allocation failure (when splitting a range in the middle). After the bug has been triggered, use-after-free reads will occur in lock_get_status() when userspace reads /proc/locks. This can likely be used to read arbitrary kernel memory, but can't corrupt kernel memory. This only affects systems with SELinux / Smack / AppArmor / BPF-LSM in enforcing mode and only works from some security contexts. Fix it by calling locks_remove_posix() instead, which is designed to reliably get rid of POSIX locks associated with the given file and files_struct and is also used by filp_flush(). Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240702-fs-lock-recover-2-v1-1-edd456f63789@google.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-20filelock: fix deadlock detection in POSIX lockingJeff Layton
The FL_POSIX check in __locks_insert_block was inadvertantly broken recently and is now inserting only OFD locks instead of only legacy POSIX locks. This breaks deadlock detection in POSIX locks, and may also be the root cause of a performance regression noted by the kernel test robot. Restore the proper sense of the test. Fixes: b6be3714005c ("filelock: convert __locks_insert_block, conflict and deadlock checks to use file_lock_core") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202402181229.f8147f40-oliver.sang@intel.com Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240218-flsplit4-v1-1-26454fc090f2@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: don't do security checks on nfsd setlease callsJeff Layton
Zdenek reported seeing some AVC denials due to nfsd trying to set delegations: type=AVC msg=audit(09.11.2023 09:03:46.411:496) : avc: denied { lease } for pid=5127 comm=rpc.nfsd capability=lease scontext=system_u:system_r:nfsd_t:s0 tcontext=system_u:system_r:nfsd_t:s0 tclass=capability permissive=0 When setting delegations on behalf of nfsd, we don't want to do all of the normal capabilty and LSM checks. nfsd is a kernel thread and runs with CAP_LEASE set, so the uid checks end up being a no-op in most cases anyway. Some nfsd functions can end up running in normal process context when tearing down the server. At that point, the CAP_LEASE check can fail and cause the client to not tear down delegations when expected. Also, the way the per-fs ->setlease handlers work today is a little convoluted. The non-trivial ones are wrappers around generic_setlease, so when they fail due to permission problems they usually they end up doing a little extra work only to determine that they can't set the lease anyway. It would be more efficient to do those checks earlier. Transplant the permission checking from generic_setlease to vfs_setlease, which will make the permission checking happen earlier on filesystems that have a ->setlease operation. Add a new kernel_setlease function that bypasses these checks, and switch nfsd to use that instead of vfs_setlease. There is one behavioral change here: prior this patch the setlease_notifier would fire even if the lease attempt was going to fail the security checks later. With this change, it doesn't fire until the caller has passed them. I think this is a desirable change overall. nfsd is the only user of the setlease_notifier and it doesn't benefit from being notified about failed attempts. Cc: Ondrej Mosnáček <omosnacek@gmail.com> Reported-by: Zdenek Pytela <zpytela@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2248830 Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240205-bz2248830-v1-1-d0ec0daecba1@kernel.org Acked-by: Tom Talpey <tom@talpey.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split leases out of struct file_lockJeff Layton
Add a new struct file_lease and move the lease-specific fields from struct file_lock to it. Convert the appropriate API calls to take struct file_lease instead, and convert the callers to use them. There is zero overlap between the lock manager operations for file locks and the ones for file leases, so split the lease-related operations off into a new lease_manager_operations struct. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-47-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert seqfile handling to use file_lock_coreJeff Layton
Reduce some pointer manipulation by just using file_lock_core where we can and only translate to a file_lock when needed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-33-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert locks_translate_pid to take file_lock_coreJeff Layton
locks_translate_pid is used on both locks and leases, so have that take struct file_lock_core. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-32-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert locks_insert_lock_ctx and locks_delete_lock_ctxJeff Layton
Have these functions take a file_lock_core pointer instead of a file_lock. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-31-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert locks_wake_up_blocks to take a file_lock_core pointerJeff Layton
Have locks_wake_up_blocks take a file_lock_core pointer, and fix up the callers to pass one in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-30-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: make assign_type helper take a file_lock_core pointerJeff Layton
Have assign_type take struct file_lock_core instead of file_lock. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-29-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: reorganize locks_delete_block and __locks_insert_blockJeff Layton
Rename the old __locks_delete_block to __locks_unlink_lock. Rename change old locks_delete_block function to __locks_delete_block and have it take a file_lock_core. Make locks_delete_block a simple wrapper around __locks_delete_block. Also, change __locks_insert_block to take struct file_lock_core, and fix up its callers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-28-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: clean up locks_delete_block internalsJeff Layton
Rework the internals of locks_delete_block to use struct file_lock_core (mostly just for clarity's sake). The prototype is not changed. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-27-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert fl_blocker to file_lock_coreJeff Layton
Both locks and leases deal with fl_blocker. Switch the fl_blocker pointer in struct file_lock_core to point to the file_lock_core of the blocker instead of a file_lock structure. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-26-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert __locks_insert_block, conflict and deadlock checks to use ↵Jeff Layton
file_lock_core Have both __locks_insert_block and the deadlock and conflict checking functions take a struct file_lock_core pointer instead of a struct file_lock one. Also, change posix_locks_deadlock to return bool. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-25-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: make __locks_delete_block and __locks_wake_up_blocks take ↵Jeff Layton
file_lock_core Convert __locks_delete_block and __locks_wake_up_blocks to take a struct file_lock_core pointer. While we could do this in another way, we're going to need to add a file_lock() helper function later anyway, so introduce and use it now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-24-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert locks_{insert,delete}_global_blockedJeff Layton
Have locks_insert_global_blocked and locks_delete_global_blocked take a struct file_lock_core pointer. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-23-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: make locks_{insert,delete}_global_locks take file_lock_core argJeff Layton
Convert these functions to take a file_lock_core instead of a file_lock. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-22-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert posix_owner_key to take file_lock_core argJeff Layton
Convert posix_owner_key to take struct file_lock_core pointer, and fix up the callers to pass one in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-21-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: make posix_same_owner take file_lock_core pointersJeff Layton
Change posix_same_owner to take struct file_lock_core pointers, and convert the callers to pass those in. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-20-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: convert more internal functions to use file_lock_coreJeff Layton
Convert more internal fs/locks.c functions to take and deal with struct file_lock_core instead of struct file_lock: - locks_dump_ctx_list - locks_check_ctx_file_list - locks_release_private - locks_owner_has_blockers Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-19-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: have fs/locks.c deal with file_lock_core directlyJeff Layton
Convert fs/locks.c to access fl_core fields direcly rather than using the backward-compatibility macros. Most of this was done with coccinelle, with a few by-hand fixups. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-18-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split common fields into struct file_lock_coreJeff Layton
In a future patch, we're going to split file leases into their own structure. Since a lot of the underlying machinery uses the same fields move those into a new file_lock_core, and embed that inside struct file_lock. For now, add some macros to ensure that we can continue to build while the conversion is in progress. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: drop the IS_* macrosJeff Layton
These don't add a lot of value over just open-coding the flag check. Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-16-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: add some new helper functionsJeff Layton
In later patches we're going to embed some common fields into a new structure inside struct file_lock. Smooth the transition by adding some new helper functions, and converting the core file locking code to use them. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-4-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-02filelock: rename fl_pid variable in lock_get_statusJeff Layton
In later patches we're going to introduce some macros that will clash with the variable name here. Rename it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-3-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-28fs: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-11-02Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
2023-10-18treewide: mark stuff as __ro_after_initAlexey Dobriyan
__read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1. Also, mark some stuff as "const" and "__init" while I'm at it. [akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning] [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-16lockd: add doc to enable EXPORT_OP_ASYNC_LOCKAlexander Aring
This patch adds a note to enable EXPORT_OP_ASYNC_LOCK for asynchronous lock request handling. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-31Merge tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "I'm thrilled to announce that the Linux in-kernel NFS server now offers NFSv4 write delegations. A write delegation enables a client to cache data and metadata for a single file more aggressively, reducing network round trips and server workload. Many thanks to Dai Ngo for contributing this facility, and to Jeff Layton and Neil Brown for reviewing and testing it. This release also sees the removal of all support for DES- and triple-DES-based Kerberos encryption types in the kernel's SunRPC implementation. These encryption types have been deprecated by the Internet community for years and are considered insecure. This change affects both the in-kernel NFS client and server. The server's UDP and TCP socket transports have now fully adopted David Howells' new bio_vec iterator so that no more than one sendmsg() call is needed to transmit each RPC message. In particular, this helps kTLS optimize record boundaries when sending RPC-with-TLS replies, and it takes the server a baby step closer to handling file I/O via folios. We've begun work on overhauling the SunRPC thread scheduler to remove a costly linked-list walk when looking for an idle RPC service thread to wake. The pre-requisites are included in this release. Thanks to Neil Brown for his ongoing work on this improvement" * tag 'nfsd-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (56 commits) Documentation: Add missing documentation for EXPORT_OP flags SUNRPC: Remove unused declaration rpc_modcount() SUNRPC: Remove unused declarations NFSD: da_addr_body field missing in some GETDEVICEINFO replies SUNRPC: Remove return value of svc_pool_wake_idle_thread() SUNRPC: make rqst_should_sleep() idempotent() SUNRPC: Clean up svc_set_num_threads SUNRPC: Count ingress RPC messages per svc_pool SUNRPC: Deduplicate thread wake-up code SUNRPC: Move trace_svc_xprt_enqueue SUNRPC: Add enum svc_auth_status SUNRPC: change svc_xprt::xpt_flags bits to enum SUNRPC: change svc_rqst::rq_flags bits to enum SUNRPC: change svc_pool::sp_flags bits to enum SUNRPC: change cache_head.flags bits to enum SUNRPC: remove timeout arg from svc_recv() SUNRPC: change svc_recv() to return void. SUNRPC: call svc_process() from svc_recv(). nfsd: separate nfsd_last_thread() from nfsd_put() nfsd: Simplify code around svc_exit_thread() call in nfsd() ...
2023-08-29locks: allow support for write delegationDai Ngo
Remove the check for F_WRLCK in generic_add_lease to allow file_lock to be used for write delegation. First consumer is NFSD. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-08-28Merge tag 'filelock-v6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: - new functionality for F_OFD_GETLK: requesting a type of F_UNLCK will find info about whatever lock happens to be first in the given range, regardless of type. - an OFD lock selftest - bugfix involving a UAF in a tracepoint - comment typo fix * tag 'filelock-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lock fs/locks: Fix typo selftests: add OFD lock tests fs/locks: F_UNLCK extension for F_OFD_GETLK
2023-08-24locks: fix KASAN: use-after-free in trace_event_raw_event_filelock_lockWill Shiu
As following backtrace, the struct file_lock request , in posix_lock_inode is free before ftrace function using. Replace the ftrace function ahead free flow could fix the use-after-free issue. [name:report&]=============================================== BUG:KASAN: use-after-free in trace_event_raw_event_filelock_lock+0x80/0x12c [name:report&]Read at addr f6ffff8025622620 by task NativeThread/16753 [name:report_hw_tags&]Pointer tag: [f6], memory tag: [fe] [name:report&] BT: Hardware name: MT6897 (DT) Call trace: dump_backtrace+0xf8/0x148 show_stack+0x18/0x24 dump_stack_lvl+0x60/0x7c print_report+0x2c8/0xa08 kasan_report+0xb0/0x120 __do_kernel_fault+0xc8/0x248 do_bad_area+0x30/0xdc do_tag_check_fault+0x1c/0x30 do_mem_abort+0x58/0xbc el1_abort+0x3c/0x5c el1h_64_sync_handler+0x54/0x90 el1h_64_sync+0x68/0x6c trace_event_raw_event_filelock_lock+0x80/0x12c posix_lock_inode+0xd0c/0xd60 do_lock_file_wait+0xb8/0x190 fcntl_setlk+0x2d8/0x440 ... [name:report&] [name:report&]Allocated by task 16752: ... slab_post_alloc_hook+0x74/0x340 kmem_cache_alloc+0x1b0/0x2f0 posix_lock_inode+0xb0/0xd60 ... [name:report&] [name:report&]Freed by task 16752: ... kmem_cache_free+0x274/0x5b0 locks_dispose_list+0x3c/0x148 posix_lock_inode+0xc40/0xd60 do_lock_file_wait+0xb8/0x190 fcntl_setlk+0x2d8/0x440 do_fcntl+0x150/0xc18 ... Signed-off-by: Will Shiu <Will.Shiu@mediatek.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-08-24fs/locks: Fix typoJakub Wilk
Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-07-10fs: Pass argument to fcntl_setlease as intLuca Vizzarro
The interface for fcntl expects the argument passed for the command F_SETLEASE to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Message-Id: <20230414152459.816046-3-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-06-27fs/locks: F_UNLCK extension for F_OFD_GETLKStas Sergeev
Currently F_UNLCK with F_OFD_GETLK returns -EINVAL. This patch changes it such that specifying F_UNLCK returns information only about OFD locks that are owned by the given file description. Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Shuah Khan <shuah@kernel.org> Cc: linux-kselftest@vger.kernel.org Cc: linux-api@vger.kernel.org Signed-off-by: Stas Sergeev <stsp2@yandex.ru> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-03-09filelocks: use mount idmapping for setlease permission checkSeth Forshee
A user should be allowed to take out a lease via an idmapped mount if the fsuid matches the mapped uid of the inode. generic_setlease() is checking the unmapped inode uid, causing these operations to be denied. Fix this by comparing against the mapped inode uid instead of the unmapped uid. Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Cc: stable@vger.kernel.org Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09fs/locks: Remove redundant assignment to cmdJiapeng Chong
Variable 'cmd' set but not used. fs/locks.c:2428:3: warning: Value stored to 'cmd' is never read. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4439 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-02-21Merge tag 'rcu.2023.02.10a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
2023-02-02fs: Remove CONFIG_SRCUPaul E. McKenney
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in conditional compilation based on CONFIG_SRCU. Therefore, remove the #ifdef and throw away the #else clause. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de>
2023-01-11fs: remove locks_inodeJeff Layton
locks_inode was turned into a wrapper around file_inode in de2a4a501e71 (Partially revert "locks: fix file locking on overlayfs"). Finish replacing locks_inode invocations everywhere with file_inode. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-11filelock: move file locking definitions to separate header fileJeff Layton
The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files that include it. Move the file locking definitions to a new header file, and add the appropriate #include directives to the source files that need them. By doing this we trim down fs.h a bit and limit the amount of rebuilding that has to be done when we make changes to the file locking APIs. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Steve French <stfrench@microsoft.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30Add process name and pid to locks warningAndi Kleen
It's fairly useless to complain about using an obsolete feature without telling the user which process used it. My Fedora desktop randomly drops this message, but I would really need this patch to figure out what triggers is. [ jlayton: print pid as well as process name ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30filelock: add a new locks_inode_context accessor functionJeff Layton
There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure. Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>