summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2019-08-04NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()Trond Myklebust
John Hubbard reports seeing the following stack trace: nfs4_do_reclaim rcu_read_lock /* we are now in_atomic() and must not sleep */ nfs4_purge_state_owners nfs4_free_state_owner nfs4_destroy_seqid_counter rpc_destroy_wait_queue cancel_delayed_work_sync __cancel_work_timer __flush_work start_flush_work might_sleep: (kernel/workqueue.c:2975: BUG) The solution is to separate out the freeing of the state owners from nfs4_purge_state_owners(), and perform that outside the atomic context. Reported-by: John Hubbard <jhubbard@nvidia.com> Fixes: 0aaaf5c424c7f ("NFS: Cache state owners after files are closed") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Check the return value of update_open_stateid()Trond Myklebust
Ensure that we always check the return value of update_open_stateid() so that we can retry if the update of local state failed. This fixes infinite looping on state recovery. Fixes: e23008ec81ef3 ("NFSv4 reduce attribute requests for open reclaim") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v3.7+
2019-08-04NFSv4.1: Only reap expired delegationsTrond Myklebust
Fix nfs_reap_expired_delegations() to ensure that we only reap delegations that are actually expired, rather than triggering on random errors. Fixes: 45870d6909d5a ("NFSv4.1: Test delegation stateids when server...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4.1: Fix open stateid recoveryTrond Myklebust
The logic for checking in nfs41_check_open_stateid() whether the state is supported by a delegation is inverted. In addition, it makes more sense to perform that check before we check for expired locks. Fixes: 8a64c4ef106d1 ("NFSv4.1: Even if the stateid is OK,...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Report the error from nfs4_select_rw_stateid()Trond Myklebust
In pnfs_update_layout() ensure that we do report any fatal errors from nfs4_select_rw_stateid(). Fixes: d9aba2b40de6 ("NFSv4: Don't use the zero stateid with layoutget") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: When recovering state fails with EAGAIN, retry the same recoveryTrond Myklebust
If the server returns with EAGAIN when we're trying to recover from a server reboot, we currently delay for 1 second, but then mark the stateid as needing recovery after the grace period has expired. Instead, we should just retry the same recovery process immediately after the 1 second delay. Break out of the loop after 10 retries. Fixes: 35a61606a612 ("NFS: Reduce indentation of the switch statement...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Print an error in the syslog when state is marked as irrecoverableTrond Myklebust
When error recovery fails due to a fatal error on the server, ensure we log it in the syslog. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-04NFSv4: Fix delegation state recoveryTrond Myklebust
Once we clear the NFS_DELEGATED_STATE flag, we're telling nfs_delegation_claim_opens() that we're done recovering all open state for that stateid, so we really need to ensure that we test for all open modes that are currently cached and recover them before exiting nfs4_open_delegation_recall(). Fixes: 24311f884189d ("NFSv4: Recovery of recalled read delegations...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.3+
2019-08-04NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateidTrond Myklebust
It is unsafe to dereference delegation outside the rcu lock, and in any case, the refcount is guaranteed held if cred is non-zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-20Merge branch 'work.dcache2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache and mountpoint updates from Al Viro: "Saner handling of refcounts to mountpoints. Transfer the counting reference from struct mount ->mnt_mountpoint over to struct mountpoint ->m_dentry. That allows us to get rid of the convoluted games with ordering of mount shutdowns. The cost is in teaching shrink_dcache_{parent,for_umount} to cope with mixed-filesystem shrink lists, which we'll also need for the Slab Movable Objects patchset" * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch the remnants of releasing the mountpoint away from fs_pin get rid of detach_mnt() make struct mountpoint bear the dentry reference to mountpoint, not struct mount Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt() __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore nfs: dget_parent() never returns NULL ceph: don't open-code the check for dead lockref
2019-07-18Merge tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - SUNRPC: Ensure bvecs are re-synced when we re-encode the RPC request - Fix an Oops in ff_layout_track_ds_error due to a PTR_ERR() dereference - Revert buggy NFS readdirplus optimisation - NFSv4: Handle the special Linux file open access mode - pnfs: Fix a problem where we gratuitously start doing I/O through the MDS Features: - Allow NFS client to set up multiple TCP connections to the server using a new 'nconnect=X' mount option. Queue length is used to balance load. - Enhance statistics reporting to report on all transports when using multiple connections. - Speed up SUNRPC by removing bh-safe spinlocks - Add a mechanism to allow NFSv4 to request that containers set a unique per-host identifier for when the hostname is not set. - Ensure NFSv4 updates the lease_time after a clientid update Bugfixes and cleanup: - Fix use-after-free in rpcrdma_post_recvs - Fix a memory leak when nfs_match_client() is interrupted - Fix buggy file access checking in NFSv4 open for execute - disable unsupported client side deduplication - Fix spurious client disconnections - Fix occasional RDMA transport deadlock - Various RDMA cleanups - Various tracepoint fixes - Fix the TCP callback channel to guarantee the server can actually send the number of callback requests that was negotiated at mount time" * tag 'nfs-for-5.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits) pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDS pnfs: Fix a problem where we gratuitously start doing I/O through the MDS SUNRPC: Optimise transport balancing code SUNRPC: Ensure the bvecs are reset when we re-encode the RPC request pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_error NFSv4: Don't use the zero stateid with layoutget SUNRPC: Fix up backchannel slot table accounting SUNRPC: Fix initialisation of struct rpc_xprt_switch SUNRPC: Skip zero-refcount transports SUNRPC: Replace division by multiplication in calculation of queue length NFSv4: Validate the stateid before applying it to state recovery nfs4.0: Refetch lease_time after clientid update nfs4: Rename nfs41_setup_state_renewal nfs4: Make nfs4_proc_get_lease_time available for nfs4.0 nfs: Fix copy-and-paste error in debug message NFS: Replace 16 seq_printf() calls by seq_puts() NFS: Use seq_putc() in nfs_show_stats() Revert "NFS: readdirplus optimization by cache mechanism" (memleak) SUNRPC: Fix transport accounting when caller specifies an rpc_xprt NFS: Record task, client ID, and XID in xdr_status trace points ...
2019-07-18pnfs/flexfiles: Add tracepoints for detecting pnfs fallback to MDSTrond Myklebust
Add tracepoints to allow debugging of the event chain leading to a pnfs fallback to doing I/O through the MDS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18pnfs: Fix a problem where we gratuitously start doing I/O through the MDSTrond Myklebust
If the client has to stop in pnfs_update_layout() to wait for another layoutget to complete, it currently exits and defaults to I/O through the MDS if the layoutget was successful. Fixes: d03360aaf5cc ("pNFS: Ensure we return the error if someone kills...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+
2019-07-18pnfs/flexfiles: Fix PTR_ERR() dereferences in ff_layout_track_ds_errorTrond Myklebust
mirror->mirror_ds can be NULL if uninitialised, but can contain a PTR_ERR() if call to GETDEVICEINFO failed. Fixes: 65990d1afbd2 ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # 4.10+
2019-07-18NFSv4: Don't use the zero stateid with layoutgetTrond Myklebust
The NFSv4.1 protocol explicitly forbids us from using the zero stateid together with layoutget, so when we see that nfs4_select_rw_stateid() is unable to return a valid delegation, lock or open stateid, then we should initiate recovery and retry. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-18SUNRPC: Fix up backchannel slot table accountingTrond Myklebust
Add a per-transport maximum limit in the socket case, and add helpers to allow the NFSv4 code to discover that limit. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-15NFSv4: Validate the stateid before applying it to state recoveryTrond Myklebust
If the stateid is the zero or invalid stateid, then it is pointless to attempt to use it for recovery. In that case, try to fall back to using the open state stateid, or just doing a general recovery of all state on a given inode. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13nfs4.0: Refetch lease_time after clientid updateDonald Buczek
RFC 7530 requires us to refetch the lease time attribute once a new clientID is established. This is already implemented for the nfs4.1(+) clients by nfs41_init_clientid, which calls nfs41_finish_session_reset, which calls nfs4_setup_state_renewal. To make nfs4_setup_state_renewal available for nfs4.0, move it further to the top of the source file to include it regardles of CONFIG_NFS_V4_1 and to save a forward declaration. Call nfs4_setup_state_renewal from nfs4_init_clientid. Signed-off-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13nfs4: Rename nfs41_setup_state_renewalDonald Buczek
The function nfs41_setup_state_renewal is useful to the nfs 4.0 client as well, so rename the function to nfs4_setup_state_renewal. Signed-off-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13nfs4: Make nfs4_proc_get_lease_time available for nfs4.0Donald Buczek
Compile nfs4_proc_get_lease_time, enc_get_lease_time and dec_get_lease_time for nfs4.0. Use nfs4_sequence_done instead of nfs41_sequence_done in nfs4_proc_get_lease_time, Signed-off-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13nfs: Fix copy-and-paste error in debug messageDonald Buczek
The debug message of decode_attr_lease_time incorrectly says "file size". Fix it to "lease time". Signed-off-by: Donald Buczek <buczek@molgen.mpg.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13NFS: Replace 16 seq_printf() calls by seq_puts()Markus Elfring
Some strings should be put into a sequence. Thus use the corresponding function “seq_puts”. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-13NFS: Use seq_putc() in nfs_show_stats()Markus Elfring
A single character (line break) should be put into a sequence. Thus use the corresponding function “seq_putc”. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-12Revert "NFS: readdirplus optimization by cache mechanism" (memleak)Max Kellermann
This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564. That commit caused a severe memory leak in nfs_readdir_make_qstr(). When listing a directory with more than 100 files (this is how many struct nfs_cache_array_entry elements fit in one 4kB page), all allocated file name strings past those 100 leak. The root of the leakage is that those string pointers are managed in pages which are never linked into the page cache. fs/nfs/dir.c puts pages into the page cache by calling read_cache_page(); the callback function nfs_readdir_filler() will then fill the given page struct which was passed to it, which is already linked in the page cache (by do_read_cache_page() calling add_to_page_cache_lru()). Commit be4c2d4723a4 added another (local) array of allocated pages, to be filled with more data, instead of discarding excess items received from the NFS server. Those additional pages can be used by the next nfs_readdir_filler() call (from within the same nfs_readdir() call). The leak happens when some of those additional pages are never used (copied to the page cache using copy_highpage()). The pages will be freed by nfs_readdir_free_pages(), but their contents will not. The commit did not invoke nfs_readdir_clear_array() (and doing so would have been dangerous, because it did not track which of those pages were already copied to the page cache, risking double free bugs). How to reproduce the leak: - Use a kernel with CONFIG_SLUB_DEBUG_ON. - Create a directory on a NFS mount with more than 100 files with names long enough to use the "kmalloc-32" slab (so we can easily look up the allocation counts): for i in `seq 110`; do touch ${i}_0123456789abcdef; done - Drop all caches: echo 3 >/proc/sys/vm/drop_caches - Check the allocation counter: grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1 - Request a directory listing and check the allocation counters again: ls [...] grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1 There are now 120 new allocations. - Drop all caches and check the counters again: echo 3 >/proc/sys/vm/drop_caches grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1 110 allocations are gone, but 10 have leaked and will never be freed. Unhelpfully, those allocations are explicitly excluded from KMEMLEAK, that's why my initial attempts with KMEMLEAK were not successful: /* * Avoid a kmemleak false positive. The pointer to the name is stored * in a page cache page which kmemleak does not scan. */ kmemleak_not_leak(string->name); It would be possible to solve this bug without reverting the whole commit: - keep track of which pages were not used, and call nfs_readdir_clear_array() on them, or - manually link those pages into the page cache But for now I have decided to just revert the commit, because the real fix would require complex considerations, risking more dangerous (crash) bugs, which may seem unsuitable for the stable branches. Signed-off-by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-12Merge tag 'nfs-rdma-for-5.3-1' of ↵Trond Myklebust
git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA client updates for 5.3 New features: - Add a way to place MRs back on the free list - Reduce context switching - Add new trace events Bugfixes and cleanups: - Fix a BUG when tracing is enabled with NFSv4.1 - Fix a use-after-free in rpcrdma_post_recvs - Replace use of xdr_stream_pos in rpcrdma_marshal_req - Fix occasional transport deadlock - Fix show_nfs_errors macros, other tracing improvements - Remove RPCRDMA_REQ_F_PENDING and fr_state - Various simplifications and refactors
2019-07-10Merge tag 'copy-file-range-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull copy_file_range updates from Darrick Wong: "This fixes numerous parameter checking problems and inconsistent behaviors in the new(ish) copy_file_range system call. Now the system call will actually check its range parameters correctly; refuse to copy into files for which the caller does not have sufficient privileges; update mtime and strip setuid like file writes are supposed to do; and allows copying up to the EOF of the source file instead of failing the call like we used to. Summary: - Create a generic copy_file_range handler and make individual filesystems responsible for calling it (i.e. no more assuming that do_splice_direct will work or is appropriate) - Refactor copy_file_range and remap_range parameter checking where they are the same - Install missing copy_file_range parameter checking(!) - Remove suid/sgid and update mtime like any other file write - Change the behavior so that a copy range crossing the source file's eof will result in a short copy to the source file's eof instead of EINVAL - Permit filesystems to decide if they want to handle cross-superblock copy_file_range in their local handlers" * tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fuse: copy_file_range needs to strip setuid bits and update timestamps vfs: allow copy_file_range to copy across devices xfs: use file_modified() helper vfs: introduce file_modified() helper vfs: add missing checks to copy_file_range vfs: remove redundant checks from generic_remap_checks() vfs: introduce generic_file_rw_checks() vfs: no fallback for ->copy_file_range vfs: introduce generic_copy_file_range()
2019-07-10Merge tag 'fsnotify_for_v5.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "This contains cleanups of the fsnotify name removal hook and also a patch to disable fanotify permission events for 'proc' filesystem" * tag 'fsnotify_for_v5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: get rid of fsnotify_nameremove() fsnotify: move fsnotify_nameremove() hook out of d_delete() configfs: call fsnotify_rmdir() hook debugfs: call fsnotify_{unlink,rmdir}() hooks debugfs: simplify __debugfs_remove_file() devpts: call fsnotify_unlink() hook tracefs: call fsnotify_{unlink,rmdir}() hooks rpc_pipefs: call fsnotify_{unlink,rmdir}() hooks btrfs: call fsnotify_rmdir() hook fsnotify: add empty fsnotify_{unlink,rmdir}() hooks fanotify: Disallow permission events for proc filesystem
2019-07-10Revert "Merge tag 'keys-acl-20190703' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs" This reverts merge 0f75ef6a9cff49ff612f7ce0578bced9d0b38325 (and thus effectively commits 7a1ade847596 ("keys: Provide KEYCTL_GRANT_PERMISSION") 2e12256b9a76 ("keys: Replace uid/gid/perm permissions checking with an ACL") that the merge brought in). It turns out that it breaks booting with an encrypted volume, and Eric biggers reports that it also breaks the fscrypt tests [1] and loading of in-kernel X.509 certificates [2]. The root cause of all the breakage is likely the same, but David Howells is off email so rather than try to work it out it's getting reverted in order to not impact the rest of the merge window. [1] https://lore.kernel.org/lkml/20190710011559.GA7973@sol.localdomain/ [2] https://lore.kernel.org/lkml/20190710013225.GB7973@sol.localdomain/ Link: https://lore.kernel.org/lkml/CAHk-=wjxoeMJfeBahnWH=9zShKp2bsVy527vo3_y8HfOdhwAAw@mail.gmail.com/ Reported-by: Eric Biggers <ebiggers@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-09NFS: Record task, client ID, and XID in xdr_status trace pointsChuck Lever
When triggering an nfs_xdr_status trace point, record the task ID and XID of the failing RPC to better pinpoint the problem. This feels like a bit of a layering violation. Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09NFS: Update symbolic flags displayed by trace eventsChuck Lever
Add missing symbolic flag names and display flags variables in hexadecimal to improve observability. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09NFS: Display symbolic status code names in trace logChuck Lever
For improved readability, add nfs_show_status() call-sites in the generic NFS trace points so that the symbolic status code name is displayed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09NFS: Fix show_nfs_errors macros againChuck Lever
I noticed that NFS status values stopped working again. trace_print_symbols_seq() takes an unsigned long. Passing a negative errno or negative NFSERR value just confuses it, and since we're using C macros here and not static inline functions, all bets are off due to implicit type conversion. Straight-line the calling conventions so that error codes are stored in the trace record as positive values in an unsigned long field, mapped to symbolic as an unsigned long, and displayed as a negative value, to continue to enable grepping on "error=-". It's often the case that an error value that is positive is a byte count but when it's negative, it's an error (e.g. nfs4_write). Fix those cases so that the value that is eventually stored in the error field is a positive NFS status or errno, or zero. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-09NFS4: Add a trace event to record invalid CB sequence IDsChuck Lever
Help debug NFSv4 callback failures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-07-08Merge tag 'keys-acl-20190703' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring ACL support from David Howells: "This changes the permissions model used by keys and keyrings to be based on an internal ACL by the following means: - Replace the permissions mask internally with an ACL that contains a list of ACEs, each with a specific subject with a permissions mask. Potted default ACLs are available for new keys and keyrings. ACE subjects can be macroised to indicate the UID and GID specified on the key (which remain). Future commits will be able to add additional subject types, such as specific UIDs or domain tags/namespaces. Also split a number of permissions to give finer control. Examples include splitting the revocation permit from the change-attributes permit, thereby allowing someone to be granted permission to revoke a key without allowing them to change the owner; also the ability to join a keyring is split from the ability to link to it, thereby stopping a process accessing a keyring by joining it and thus acquiring use of possessor permits. - Provide a keyctl to allow the granting or denial of one or more permits to a specific subject. Direct access to the ACL is not granted, and the ACL cannot be viewed" * tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Provide KEYCTL_GRANT_PERMISSION keys: Replace uid/gid/perm permissions checking with an ACL
2019-07-08Merge tag 'keys-namespace-20190627' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyring namespacing from David Howells: "These patches help make keys and keyrings more namespace aware. Firstly some miscellaneous patches to make the process easier: - Simplify key index_key handling so that the word-sized chunks assoc_array requires don't have to be shifted about, making it easier to add more bits into the key. - Cache the hash value in the key so that we don't have to calculate on every key we examine during a search (it involves a bunch of multiplications). - Allow keying_search() to search non-recursively. Then the main patches: - Make it so that keyring names are per-user_namespace from the point of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not accessible cross-user_namespace. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this. - Move the user and user-session keyrings to the user_namespace rather than the user_struct. This prevents them propagating directly across user_namespaces boundaries (ie. the KEY_SPEC_* flags will only pick from the current user_namespace). - Make it possible to include the target namespace in which the key shall operate in the index_key. This will allow the possibility of multiple keys with the same description, but different target domains to be held in the same keyring. keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this. - Make it so that keys are implicitly invalidated by removal of a domain tag, causing them to be garbage collected. - Institute a network namespace domain tag that allows keys to be differentiated by the network namespace in which they operate. New keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned the network domain in force when they are created. - Make it so that the desired network namespace can be handed down into the request_key() mechanism. This allows AFS, NFS, etc. to request keys specific to the network namespace of the superblock. This also means that the keys in the DNS record cache are thenceforth namespaced, provided network filesystems pass the appropriate network namespace down into dns_query(). For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other cache keyrings, such as idmapper keyrings, also need to set the domain tag - for which they need access to the network namespace of the superblock" * tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: keys: Pass the network namespace into request_key mechanism keys: Network namespace domain tag keys: Garbage collect keys for which the domain has been removed keys: Include target namespace in match criteria keys: Move the user and user-session keyrings to the user_namespace keys: Namespace keyring names keys: Add a 'recurse' flag for keyring searches keys: Cache the hash value to avoid lots of recalculation keys: Simplify key description management
2019-07-06NFS: Cleanup if nfs_match_client is interruptedBenjamin Coddington
Don't bail out before cleaning up a new allocation if the wait for searching for a matching nfs client is interrupted. Memory leaks. Reported-by: syzbot+7fe11b49c1cc30e3fce2@syzkaller.appspotmail.com Fixes: 950a578c6128 ("NFS: make nfs_match_client killable") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06nfs: disable client side deduplicationDarrick J. Wong
The NFS protocol doesn't support deduplication, so turn it off again. Fixes: ce96e888fe48e ("Fix nfs4.2 return -EINVAL when do dedupe operation") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06NFSv4: Add lease_time and lease_expired to 'nfs4:' line of mountstatsDave Wysochanski
On the NFS client there is no low-impact way to determine the nfs4 lease time or whether the lease is expired, so add these to mountstats with times displayed in seconds. If the lease is not expired, display lease_expired=0. Otherwise, display lease_expired=seconds_since_expired, similar to 'age:' line in mountstats. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06NFS: Clean up writeback codeTrond Myklebust
Now that the VM promises never to recurse back into the filesystem layer on writeback, remove all the GFP_NOFS references etc from the generic writeback code. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06Merge branch 'multipath_tcp'Trond Myklebust
2019-07-06Merge branch 'containers'Trond Myklebust
2019-07-06NFS: send state management on a single connection.NeilBrown
With NFSv4.1, different network connections need to be explicitly bound to a session. During session startup, this is not possible so only a single connection must be used for session startup. So add a task flag to disable the default round-robin choice of connections (when nconnect > 1) and force the use of a single connection. Then use that flag on all requests for session management - for consistence, include NFSv4.0 management (SETCLIENTID) and session destruction Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06NFS: Allow multiple connections to a NFSv2 or NFSv3 serverTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06NFS: Display the "nconnect" mount option if it is set.Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2019-07-06pNFS: Allow multiple connections to the DSTrond Myklebust
If the user specifies -onconnect=<number> mount option, and the transport protocol is TCP, then set up <number> connections to the pNFS data server as well. The connections will all go to the same IP address. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2019-07-06NFSv4: Allow multiple connections to NFSv4.x (x>0) serversTrond Myklebust
If the user specifies the -onconn=<number> mount option, and the transport protocol is TCP, then set up <number> connections to the server. The connections will all go to the same IP address. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2019-07-06NFS: Add a mount option to specify number of TCP connections to useTrond Myklebust
Allow the user to specify that the client should use multiple connections to the server. For the moment, this functionality will be limited to TCP and to NFSv4.x (x>0). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2019-07-06NFS: Add sysfs support for per-container identifierTrond Myklebust
In order to identify containers to the NFS client, we add a per-net sysfs attribute that udev can fill with the appropriate identifier. The identifier could be a unique hostname, but in most cases it will probably be a persisted uuid. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06NFS: Add deferred cache invalidation for close-to-open consistency violationsTrond Myklebust
If the client detects that close-to-open cache consistency has been violated, and that the file or directory has been changed on the server, then do a cache invalidation when we're done working with the file. The reason we don't do an immediate cache invalidation is that we want to avoid performance problems due to false positives. Also, note that we cannot guarantee cache consistency in this situation even if we do invalidate the cache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06NFS: Cleanup - add nfs_clients_exit to mirror nfs_clients_initTrond Myklebust
Add a helper to clean up the struct nfs_net when it is being destroyed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>