summaryrefslogtreecommitdiff
path: root/fs/afs
AgeCommit message (Collapse)Author
2019-04-25afs: Add file locking tracepointsDavid Howells
Add two tracepoints for monitoring AFS file locking. Firstly, add one that follows the operational part: echo 1 >/sys/kernel/debug/tracing/events/afs/afs_flock_op/enable And add a second that more follows the event-driven part: echo 1 >/sys/kernel/debug/tracing/events/afs/afs_flock_ev/enable Individual file_lock structs seen by afs are tagged with debugging IDs that are displayed in the trace log to make it easier to see what's going on, especially as setting the first lock always seems to involve copying the file_lock twice. Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25afs: Further fix file lockingDavid Howells
Further fix the file locking in the afs filesystem client in a number of ways, including: (1) Don't submit the operation to obtain a lock from the server in a work queue context, but rather do it in the process context of whoever issued the requesting system call. (2) The owner of the file_lock struct at the front of the pending_locks queue now owns right to talk to the server. (3) Write locks can be instantly granted if they don't overlap with any other locks *and* we have a write lock on the server. (4) In the event of an authentication/permission error, all other matching pending locks requests are also immediately aborted. (5) Properly use VFS core locks_lock_file_wait() to distribute the server lock amongst local client locks, including waiting for the lock to become available. Test with: sqlite3 /afs/.../scratch/billings.sqlite <<EOF CREATE TABLE hosts ( hostname varchar(80), shorthost varchar(80), room varchar(30), building varchar(30), PRIMARY KEY(shorthost) ); EOF With the version of sqlite3 that I have, this should fail consistently with EAGAIN, whether or not the program is straced (which introduces some delays between lock syscalls). Fixes: 0fafdc9f888b ("afs: Fix file locking") Reported-by: Jonathan Billings <jsbillin@umich.edu> Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25afs: Fix AFS file locking to allow fine grained locksDavid Howells
Fix AFS file locking to allow fine grained locks as some applications, such as firefox, won't work if they can't take such locks on certain state files - thereby preventing the use of kAFS to distribute a home directory. Note that this cannot be made completely functional as the protocol only has provision for whole-file locks, so there exists the possibility of a process deadlocking itself by getting a partial read-lock on a file first and then trying to get a non-overlapping write-lock - but we got the server's read lock with the first lock, so we're now stuck. OpenAFS solves this by just granting any partial-range lock directly without consulting the server - and hoping there's no remote collision. I want to implement that in a separate patch and it requires a bit more thought. Fixes: 8d6c554126b8 ("AFS: implement file locking") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25afs: Calculate lock extend timer from set/extend reply receptionDavid Howells
Record the timestamp on the first reply DATA packet received in response to a set- or extend-lock operation, then use this to calculate the time remaining till the lock expires rather than using whatever time the requesting process wakes up and finishes processing the operation as a base. Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25afs: Split wait from afs_make_call()David Howells
Split the call to afs_wait_for_call_to_complete() from afs_make_call() to make it easier to handle asynchronous calls and to make it easier to convert a synchronous call to an asynchronous one in future, for instance when someone tries to interrupt an operation by pressing Ctrl-C. Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13afs: Fix in-progess ops to ignore server-level callback invalidationDavid Howells
The in-kernel afs filesystem client counts the number of server-level callback invalidation events (CB.InitCallBackState* RPC operations) that it receives from the server. This is stored in cb_s_break in various structures, including afs_server and afs_vnode. If an inode is examined by afs_validate(), say, the afs_server copy is compared, along with other break counters, to those in afs_vnode, and if one or more of the counters do not match, it is considered that the server's callback promise is broken. At points where this happens, AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be refetched from the server. afs_validate() issues an FS.FetchStatus operation to get updated metadata - and based on the updated data_version may invalidate the pagecache too. However, the break counters are also used to determine whether to note a new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag) and whether to cache the permit data included in the YFSFetchStatus record by the server. The problem comes when the server sends us a CB.InitCallBackState op. The first such instance doesn't cause cb_s_break to be incremented, but rather causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours after last use and all the volumes have been automatically unmounted and the server has forgotten about the client[*], this *will* likely cause an increment. [*] There are other circumstances too, such as the server restarting or needing to make space in its callback table. Note that the server won't send us a CB.InitCallBackState op until we talk to it again. So what happens is: (1) A mount for a new volume is attempted, a inode is created for the root vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set immediately, as we don't have a nominated server to talk to yet - and we may iterate through a few to find one. (2) Before the operation happens, afs_fetch_status(), say, notes in the cursor (fc.cb_break) the break counter sum from the vnode, volume and server counters, but the server->cb_s_break is currently 0. (3) We send FS.FetchStatus to the server. The server sends us back CB.InitCallBackState. We increment server->cb_s_break. (4) Our FS.FetchStatus completes. The reply includes a callback record. (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether the callback promise was broken by checking the break counter sum from step (2) against the current sum. This fails because of step (3), so we don't set the callback record and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode. This does not preclude the syscall from progressing, and we don't loop here rechecking the status, but rather assume it's good enough for one round only and will need to be rechecked next time. (6) afs_validate() it triggered on the vnode, probably called from d_revalidate() checking the parent directory. (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't update vnode->cb_s_break and assumes the vnode to be invalid. (8) afs_validate() needs to calls afs_fetch_status(). Go back to step (2) and repeat, every time the vnode is validated. This primarily affects volume root dir vnodes. Everything subsequent to those inherit an already incremented cb_s_break upon mounting. The issue is that we assume that the callback record and the cached permit information in a reply from the server can't be trusted after getting a server break - but this is wrong since the server makes sure things are done in the right order, holding up our ops if necessary[*]. [*] There is an extremely unlikely scenario where a reply from before the CB.InitCallBackState could get its delivery deferred till after - at which point we think we have a promise when we don't. This, however, requires unlucky mass packet loss to one call. AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from a server we've never contacted before, but this should be unnecessary. It's also further insulated from the problem on an initial mount by querying the server first with FS.GetCapabilities, which triggers the CB.InitCallBackState. Fix this by (1) Remove AFS_SERVER_FL_NEW. (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the calculation. (3) In afs_cb_is_broken(), don't include cb_s_break in the check. Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13afs: Unlock pages for __pagevec_release()Marc Dionne
__pagevec_release() complains loudly if any page in the vector is still locked. The pages need to be locked for generic_error_remove_page(), but that function doesn't actually unlock them. Unlock the pages afterwards. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jonathan Billings <jsbillin@umich.edu>
2019-04-13afs: Differentiate abort due to unmarshalling from other errorsDavid Howells
Differentiate an abort due to an unmarshalling error from an abort due to other errors, such as ENETUNREACH. It doesn't make sense to set abort code RXGEN_*_UNMARSHAL in such a case, so use RX_USER_ABORT instead. Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13afs: Avoid section confusion in CM_NAMEAndi Kleen
__tracepoint_str cannot be const because the tracepoint_str section is not read-only. Remove the stray const. Cc: dhowells@redhat.com Cc: viro@zeniv.linux.org.uk Signed-off-by: Andi Kleen <ak@linux.intel.com>
2019-04-13afs: avoid deprecated get_seconds()Arnd Bergmann
get_seconds() has a limited range on 32-bit architectures and is deprecated because of that. While AFS uses the same limits for its inode timestamps on the wire protocol, let's just use the simpler current_time() as we do for other file systems. This will still zero out the 'tv_nsec' field of the timestamps internally. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-12afs: Check for rxrpc call completion in wait loopMarc Dionne
Check the state of the rxrpc call backing an afs call in each iteration of the call wait loop in case the rxrpc call has already been terminated at the rxrpc layer. Interrupt the wait loop and mark the afs call as complete if the rxrpc layer call is complete. There were cases where rxrpc errors were not passed up to afs, which could result in this loop waiting forever for an afs call to transition to AFS_CALL_COMPLETE while the rx call was already complete. Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12rxrpc: Make rxrpc_kernel_check_life() indicate if call completedMarc Dionne
Make rxrpc_kernel_check_life() pass back the life counter through the argument list and return true if the call has not yet completed. Suggested-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08afs: Mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that in many cases I placed a /* Fall through */ comment at the bottom of the case, which what GCC is expecting to find. In other cases I had to tweak a bit the format of the comments. This patch suppresses ALL missing-break-in-switch false positives in fs/afs Addresses-Coverity-ID: 115042 ("Missing break in switch") Addresses-Coverity-ID: 115043 ("Missing break in switch") Addresses-Coverity-ID: 115045 ("Missing break in switch") Addresses-Coverity-ID: 1357430 ("Missing break in switch") Addresses-Coverity-ID: 115047 ("Missing break in switch") Addresses-Coverity-ID: 115050 ("Missing break in switch") Addresses-Coverity-ID: 115051 ("Missing break in switch") Addresses-Coverity-ID: 1467806 ("Missing break in switch") Addresses-Coverity-ID: 1467807 ("Missing break in switch") Addresses-Coverity-ID: 1467811 ("Missing break in switch") Addresses-Coverity-ID: 115041 ("Missing break in switch") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-03-28afs: Fix StoreData op marshallingDavid Howells
The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls generated by ->setattr() ops for the purpose of expanding a file is incorrect due to older documentation incorrectly describing the way the RPC 'FileLength' parameter is meant to work. The older documentation says that this is the length the file is meant to end up at the end of the operation; however, it was never implemented this way in any of the servers, but rather the file is truncated down to this before the write operation is effected, and never expanded to it (and, indeed, it was renamed to 'TruncPos' in 2014). Fix this by setting the position parameter to the new file length and doing a zero-lengh write there. The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file it then mmaps. This can be tested by giving the following test program a filename in an AFS directory: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <sys/mman.h> int main(int argc, char *argv[]) { char *p; int fd; if (argc != 2) { fprintf(stderr, "Format: test-trunc-mmap <file>\n"); exit(2); } fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC); if (fd < 0) { perror(argv[1]); exit(1); } if (ftruncate(fd, 0x140008) == -1) { perror("ftruncate"); exit(1); } p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (p == MAP_FAILED) { perror("mmap"); exit(1); } p[0] = 'a'; if (munmap(p, 4096) < 0) { perror("munmap"); exit(1); } if (close(fd) < 0) { perror("close"); exit(1); } exit(0); } Fixes: 31143d5d515e ("AFS: implement basic file write support") Reported-by: Jonathan Billings <jsbillin@umich.edu> Tested-by: Jonathan Billings <jsbillin@umich.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12Merge branch 'work.mount' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs mount infrastructure updates from Al Viro: "The rest of core infrastructure; no new syscalls in that pile, but the old parts are switched to new infrastructure. At that point conversions of individual filesystems can happen independently; some are done here (afs, cgroup, procfs, etc.), there's also a large series outside of that pile dealing with NFS (quite a bit of option-parsing stuff is getting used there - it's one of the most convoluted filesystems in terms of mount-related logics), but NFS bits are the next cycle fodder. It got seriously simplified since the last cycle; documentation is probably the weakest bit at the moment - I considered dropping the commit introducing Documentation/filesystems/mount_api.txt (cutting the size increase by quarter ;-), but decided that it would be better to fix it up after -rc1 instead. That pile allows to do followup work in independent branches, which should make life much easier for the next cycle. fs/super.c size increase is unpleasant; there's a followup series that allows to shrink it considerably, but I decided to leave that until the next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits) afs: Use fs_context to pass parameters over automount afs: Add fs_context support vfs: Add some logging to the core users of the fs_context log vfs: Implement logging through fs_context vfs: Provide documentation for new mount API vfs: Remove kern_mount_data() hugetlbfs: Convert to fs_context cpuset: Use fs_context kernfs, sysfs, cgroup, intel_rdt: Support fs_context cgroup: store a reference to cgroup_ns into cgroup_fs_context cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper cgroup_do_mount(): massage calling conventions cgroup: stash cgroup_root reference into cgroup_fs_context cgroup2: switch to option-by-option parsing cgroup1: switch to option-by-option parsing cgroup: take options parsing into ->parse_monolithic() cgroup: fold cgroup1_mount() into cgroup1_get_tree() cgroup: start switching to fs_context ipc: Convert mqueue fs to fs_context proc: Add fs_context support to procfs ...
2019-03-12mm: refactor readahead defines in mm.hNikolay Borisov
All users of VM_MAX_READAHEAD actually convert it to kbytes and then to pages. Define the macro explicitly as (SZ_128K / PAGE_SIZE). This simplifies the expression in every filesystem. Also rename the macro to VM_READAHEAD_PAGES to properly convey its meaning. Finally remove unused VM_MIN_READAHEAD [akpm@linux-foundation.org: fix fs/io_uring.c, per Stephen] Link: http://lkml.kernel.org/r/20181221144053.24318-1-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: David Howells <dhowells@redhat.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: David Sterba <dsterba@suse.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-28afs: Use fs_context to pass parameters over automountDavid Howells
Alter the AFS automounting code to create and modify an fs_context struct when parameterising a new mount triggered by an AFS mountpoint rather than constructing device name and option strings. Also remove the cell=, vol= and rwpath options as they are then redundant. The reason they existed is because the 'device name' may be derived literally from a mountpoint object in the filesystem, so default cell and parent-type information needed to be passed in by some other method from the automount routines. The vol= option didn't end up being used. Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric W. Biederman <ebiederm@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28afs: Add fs_context supportDavid Howells
Add fs_context support to the AFS filesystem, converting the parameter parsing to store options there. This will form the basis for namespace propagation over mountpoints within the AFS model, thereby allowing AFS to be used in containers more easily. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-25afs: Fix manually set volume location server listDavid Howells
When a cell with a volume location server list is added manually by echoing the details into /proc/net/afs/cells, a record is added but the flag saying it has been looked up isn't set. This causes the VL server rotation code to wait forever, with the top of /proc/pid/stack looking like: afs_select_vlserver+0x3a6/0x6f3 afs_vl_lookup_vldb+0x4b/0x92 afs_create_volume+0x25/0x1b9 ... with the thread stuck in afs_start_vl_iteration() waiting for AFS_CELL_FL_NO_LOOKUP_YET to be cleared. Fix this by clearing AFS_CELL_FL_NO_LOOKUP_YET when setting up a record if that record's details were supplied manually. Fixes: 0a5143f2f89c ("afs: Implement VL server rotation") Reported-by: Dave Botsch <dwb7@cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-17afs: Fix race in async call refcountingDavid Howells
There's a race between afs_make_call() and afs_wake_up_async_call() in the case that an error is returned from rxrpc_kernel_send_data() after it has queued the final packet. afs_make_call() will try and clean up the mess, but the call state may have been moved on thereby causing afs_process_async_call() to also try and to delete the call. Fix this by: (1) Getting an extra ref for an asynchronous call for the call itself to hold. This makes sure the call doesn't evaporate on us accidentally and will allow the call to be retained by the caller in a future patch. The ref is released on leaving afs_make_call() or afs_wait_for_call_to_complete(). (2) In the event of an error from rxrpc_kernel_send_data(): (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the call has been aborted and ended. This prevents afs_deliver_to_call() from doing anything with any notifications it gets. (b) Explicitly end the call immediately to prevent further callbacks. (c) Cancel any queued async_work and wait for the work if it's executing. This allows us to be sure the race won't recur when we change the state. We put the work queue's ref on the call if we managed to cancel it. (d) Put the call's ref that we got in (1). This belongs to us as long as the call is in state AFS_CALL_CL_REQUESTING. Fixes: 341f741f04be ("afs: Refcount the afs_call struct") Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17afs: Provide a function to get a ref on a callDavid Howells
Provide a function to get a reference on an afs_call struct. Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17afs: Fix key refcounting in file locking codeDavid Howells
Fix the refcounting of the authentication keys in the file locking code. The vnode->lock_key member points to a key on which it expects to be holding a ref, but it isn't always given an extra ref, however. Fixes: 0fafdc9f888b ("afs: Fix file locking") Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17afs: Don't set vnode->cb_s_break in afs_validate()Marc Dionne
A cb_interest record is not necessarily attached to the vnode on entry to afs_validate(), which can cause an oops when we try to bring the vnode's cb_s_break up to date in the default case (ie. no current callback promise and the vnode has not been deleted). Fix this by simply removing the line, as vnode->cb_s_break will be set when needed by afs_register_server_cb_interest() when we next get a callback promise from RPC call. The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 ... RIP: 0010:afs_validate+0x66/0x250 [kafs] ... Call Trace: afs_d_revalidate+0x8d/0x340 [kafs] ? __d_lookup+0x61/0x150 lookup_dcache+0x44/0x70 ? lookup_dcache+0x44/0x70 __lookup_hash+0x24/0xa0 do_unlinkat+0x11d/0x2c0 __x64_sys_unlink+0x23/0x30 do_syscall_64+0x4d/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: ae3b7361dc0e ("afs: Fix validation/callback interaction") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-10afs: Set correct lock type for the yfs CreateFileMarc Dionne
A lock type of 0 is "LockRead", which makes the fileserver record an unintentional read lock on the new file. This will cause problems later on if the file is the subject of locking operations. The correct default value should be -1 ("LockNone"). Fix the operation marshalling code to set the value and provide an enum to symbolise the values whilst we're at it. Fixes: 30062bd13e36 ("afs: Implement YFS support in the fs client") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-10afs: Use struct_size() in kzalloc()Gustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-04fs: don't open code lru_to_page()Nikolay Borisov
Multiple filesystems open code lru_to_page(). Rectify this by moving the macro from mm_inline (which is specific to lru stuff) to the more generic mm.h header and start using the macro where appropriate. No functional changes. Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Pankaj gupta <pagupta@redhat.com> Acked-by: "Yan, Zheng" <zyan@redhat.com> [ceph] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04fs/: remove caller signal_pending branch predictionsDavidlohr Bueso
This is already done for us internally by the signal machinery. [akpm@linux-foundation.org: fix fs/buffer.c] Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-30Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs fixes from Al Viro: "Assorted fixes all over the place. The iov_iter one is this cycle regression (splice from UDP triggering WARN_ON()), the rest is older" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Use d_instantiate() rather than d_add() and don't d_drop() afs: Fix missing net error handling afs: Fix validation/callback interaction iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed ones exportfs: do not read dentry after free exportfs: fix 'passing zero to ERR_PTR()' warning aio: fix failure to put the file pointer sysv: return 'err' instead of 0 in __sysv_write_inode
2018-11-29afs: Use d_instantiate() rather than d_add() and don't d_drop()David Howells
Use d_instantiate() rather than d_add() and don't d_drop() in afs_vnode_new_inode(). The dentry shouldn't be removed as it's not changing its name. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29afs: Fix missing net error handlingDavid Howells
kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and ERFKILL) that it doesn't handle in its server/address rotation algorithms. They cause the probing and rotation to abort immediately rather than rotating. Fix this by: (1) Abstracting out the error prioritisation from the VL and FS rotation algorithms into a common function and expand usage into the server probing code. When multiple errors are available, this code selects the one we'd prefer to return. (2) Add handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL. Fixes: 0fafdc9f888b ("afs: Fix file locking") Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29afs: Fix validation/callback interactionDavid Howells
When afs_validate() is called to validate a vnode (inode), there are two unhandled cases in the fastpath at the top of the function: (1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break counters match and the data has expired, then there's an implicit case in which the vnode needs revalidating. This has no consequences since the default "valid = false" set at the top of the function happens to do the right thing. (2) If the vnode is not promised and it hasn't been deleted (AFS_VNODE_DELETED is not set) then there's a default case we're not handling in which the vnode is invalid. If the vnode is invalid, we need to bring cb_s_break and cb_v_break up to date before we refetch the status. As a consequence, once the server loses track of the client (ie. sufficient time has passed since we last sent it an operation), it will send us a CB.InitCallBackState* operation when we next try to talk to it. This calls afs_init_callback_state() which increments afs_server::cb_s_break, but this then doesn't propagate to the afs_vnode record. The result being that every afs_validate() call thereafter sends a status fetch operation to the server. Clarify and fix this by: (A) Setting valid in all the branches rather than initialising it at the top so that the compiler catches where we've missed. (B) Restructuring the logic in the 'promised' branch so that we set valid to false if the callback is due to expire (or has expired) and so that the final case is that the vnode is still valid. (C) Adding an else-statement that ups cb_s_break and cb_v_break if the promised and deleted cases don't match. Fixes: c435ee34551e ("afs: Overhaul the callback handling") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-15rxrpc: Fix life checkDavid Howells
The life-checking function, which is used by kAFS to make sure that a call is still live in the event of a pending signal, only samples the received packet serial number counter; it doesn't actually provoke a change in the counter, rather relying on the server to happen to give us a packet in the time window. Fix this by adding a function to force a ping to be transmitted. kAFS then keeps track of whether there's been a stall, and if so, uses the new function to ping the server, resetting the timeout to allow the reply to come back. If there's a stall, a ping and the call is *still* stalled in the same place after another period, then the call will be aborted. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Fixes: f4d15fb6f99a ("rxrpc: Provide functions for allowing cleaner handling of signals") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24afs: Probe multiple fileservers simultaneouslyDavid Howells
Send probes to all the unprobed fileservers in a fileserver list on all addresses simultaneously in an attempt to find out the fastest route whilst not getting stuck for 20s on any server or address that we don't get a reply from. This alleviates the problem whereby attempting to access a new server can take a long time because the rotation algorithm ends up rotating through all servers and addresses until it finds one that responds. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Fix callback handlingDavid Howells
In some circumstances, the callback interest pointer is NULL, so in such a case we can't dereference it when checking to see if the callback is broken. This causes an oops in some circumstances. Fix this by replacing the function that worked out the aggregate break counter with one that actually does the comparison, and then make that return true (ie. broken) if there is no callback interest as yet (ie. the pointer is NULL). Fixes: 68251f0a6818 ("afs: Fix whole-volume callback handling") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Eliminate the address pointer from the address list cursorDavid Howells
Eliminate the address pointer from the address list cursor as it's redundant (ac->addrs[ac->index] can be used to find the same address) and address lists must be replaced rather than being rearranged, so is of limited value. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Allow dumping of server cursor on operation failureDavid Howells
Provide an option to allow the file or volume location server cursor to be dumped if the rotation routine falls off the end without managing to contact a server. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Implement YFS support in the fs clientDavid Howells
Implement support for talking to YFS-variant fileservers in the cache manager and the filesystem client. These implement upgraded services on the same port as their AFS services. YFS fileservers provide expanded capabilities over AFS. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Expand data structure fields to support YFSDavid Howells
Expand fields in various data structures to support the expanded information that YFS is capable of returning. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Get the target vnode in afs_rmdir() and get a callback on itDavid Howells
Get the target vnode in afs_rmdir() and validate it before we attempt the deletion, The vnode pointer will be passed through to the delivery function in a later patch so that the delivery function can mark it deleted. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Calc callback expiry in op reply deliveryDavid Howells
Calculate the callback expiration time at the point of operation reply delivery, using the reply time queried from AF_RXRPC on that call as a base. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Fix FS.FetchStatus delivery from updating wrong vnodeDavid Howells
The FS.FetchStatus reply delivery function was updating inode of the directory in which a lookup had been done with the status of the looked up file. This corrupts some of the directory state. Fixes: 5cf9dd55a0ec ("afs: Prospectively look up extra files when doing a single lookup") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Implement the YFS cache manager serviceDavid Howells
Implement the YFS cache manager service which gives extra capabilities on top of AFS. This is done by listening for an additional service on the same port and indicating that anyone requesting an upgrade should be upgraded to the YFS port. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Remove callback details from afs_callback_break structDavid Howells
Remove unnecessary details of a broken callback, such as version, expiry and type, from the afs_callback_break struct as they're not actually used and make the list take more memory. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Commit the status on a new file/dir/symlinkDavid Howells
Call the function to commit the status on a new file, dir or symlink so that the access rights for the caller's key are cached for that object. Without this, the next access to the file will cause a FetchStatus operation to be emitted to retrieve the access rights. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFSDavid Howells
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode number equivalent) to 96 bits to allow the support of YFS. This requires the iget comparator to check the vnode->fid rather than i_ino and i_generation as i_ino is not sufficiently capacious. It also requires this data to be placed into the vnode cache key for fscache. For the moment, just discard the top 32 bits of the vnode ID when returning it though stat. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Don't invoke the server to read data beyond EOFDavid Howells
When writing a new page, clear space in the page rather than attempting to load it from the server if the space is beyond the EOF. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Add a couple of tracepoints to log I/O errorsDavid Howells
Add a couple of tracepoints to log the production of I/O errors within the AFS filesystem. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Handle EIO from delivery functionDavid Howells
Fix afs_deliver_to_call() to handle -EIO being returned by the operation delivery function, indicating that the call found itself in the wrong state, by printing an error and aborting the call. Currently, an assertion failure will occur. This can happen, say, if the delivery function falls off the end without calling afs_extract_data() with the want_more parameter set to false to collect the end of the Rx phase of a call. The assertion failure looks like: AFS: Assertion failed 4 == 7 is false 0x4 == 0x7 is false ------------[ cut here ]------------ kernel BUG at fs/afs/rxrpc.c:462! and is matched in the trace buffer by a line like: kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY Fixes: 98bf40cd99fc ("afs: Protect call->state changes against signals") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Fix TTL on VL server and address listsDavid Howells
Currently the TTL on VL server and address lists isn't set in all circumstances and may be set to poor choices in others, since the TTL is derived from the SRV/AFSDB DNS record if and when available. Fix the TTL by limiting the range to a minimum and maximum from the current time. At some point these can be made into sysctl knobs. Further, use the TTL we obtained from the upcall to set the expiry on negative results too; in future a mechanism can be added to force reloading of such data. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Implement VL server rotationDavid Howells
Track VL servers as independent entities rather than lumping all their addresses together into one set and implement server-level rotation by: (1) Add the concept of a VL server list, where each server has its own separate address list. This code is similar to the FS server list. (2) Use the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. (3) In the case of a legacy DNS resolver or an address list given directly through /proc/net/afs/cells, create a list containing just a dummy server record and attach all the addresses to that. (4) Implement a simple rotation policy, for the moment ignoring the priorities and weights assigned to the servers. (5) Show the address list through /proc/net/afs/<cell>/vlservers. This also displays the source and status of the data as indicated by the upcall. Signed-off-by: David Howells <dhowells@redhat.com>