|
Also some clean ups.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
By requesting more attributes during a readdir, we can mimic the readdir plus
operation that was in NFSv3.
To test, I ran the command `ls -lU --color=none` on directories with various
numbers of files. Without readdir plus, I see this:
n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
access  |         3 |         1 |         1 |         4 |        31
getattr |         2 |         1 |         1 |         1 |         1
lookup  |       104 |     1,003 |    10,003 |   100,003 | 1,000,003
readdir |         2 |        16 |       158 |     1,575 |    15,749
total   |       111 |     1,021 |    10,163 |   101,583 | 1,015,784
With readdir plus enabled, I see this:
n files |       100 |     1,000 |    10,000 |   100,000 | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
access  |         3 |         1 |         1 |         1 |         7
getattr |         2 |         1 |         1 |         1 |         1
lookup  |         4 |         3 |         3 |         3 |         3
readdir |         6 |        62 |       630 |     6,300 |    62,993
total   |        15 |        67 |       635 |     6,305 |    63,004
With readdir plus disabled, the client issues about 16x more RPC calls and
`ls` is 4 - 5 times slower on large directories.
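The two ratios quoted above can be re-derived from the tables. The snippet below is only a quick arithmetic check using the totals and "real" times for the two largest directories; the names MEASUREMENTS and ratios are made up for this illustration.

```python
# Figures copied from the two tables above: total RPC calls and "real" time
# for the two largest directories, as (without, with) readdir plus.
MEASUREMENTS = {
    100_000: {
        "rpcs": (101_583, 6_305),
        "real_seconds": (56.691, 12.521),
    },
    1_000_000: {
        "rpcs": (1_015_784, 63_004),
        "real_seconds": (9 * 60 + 59.128, 2 * 60 + 7.528),
    },
}


def ratios(n_files):
    """Return (rpc_ratio, time_ratio) of 'without' over 'with' readdir plus."""
    rpcs_without, rpcs_with = MEASUREMENTS[n_files]["rpcs"]
    real_without, real_with = MEASUREMENTS[n_files]["real_seconds"]
    return rpcs_without / rpcs_with, real_without / real_with


for n_files in MEASUREMENTS:
    rpc_ratio, time_ratio = ratios(n_files)
    print(f"{n_files:>9,} files: {rpc_ratio:.1f}x RPC calls, {time_ratio:.1f}x slower")
```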
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Getattr should be able to decode errors and the readdir file handle.
decode_getfattr_attrs does the actual attribute decoding, while
decode_getfattr_generic will check the opcode before decoding. This will
let other functions call decode_getfattr_attrs to decode their attributes.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Check if the decoded entry has the eof bit set when returning from xdr_decode
with an error. If it does, we should set the eof bits in the array before
returning. This should keep us from looping when we expect more data but the
server doesn't give us anything new.
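As a rough userspace model of the idea (not the kernel code), the sketch below fills an array from a decoder and copies the eof flag onto the array when the decoder fails. Entry, CacheArray, fill_array and make_decoder are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str = ""
    eof: bool = False      # set by the decoder when the server signals the end


@dataclass
class CacheArray:
    names: list = field(default_factory=list)
    eof: bool = False      # once set, the caller stops asking for more data


def make_decoder(results):
    """Hypothetical decoder: hands out prepared (ok, entry) results in order."""
    return iter(results).__next__


def fill_array(array, decode_next):
    """Append decoded names to `array` until the decoder reports an error."""
    while True:
        ok, entry = decode_next()
        if not ok:
            # The decode failed. If the entry carries the eof bit, record it
            # on the array so the caller does not loop waiting for new data.
            if entry.eof:
                array.eof = True
            return
        array.names.append(entry.name)
```

With a decoder that yields two good entries and then an error with eof set, the array ends up holding two names and has its eof flag set.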
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Check for all errors, not a specific one.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We can use vmapped pages to read more information from the network at once.
This will reduce the number of calls needed to complete a readdir.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[trondmy: Added #include for <linux/vmalloc.h> in fs/nfs/dir.c]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Remove the page size checking code for a readdir decode. This is now done
by decode_dirent with xdr_streams.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Convert nfs*xdr.c to use an xdr stream in decode_dirent. This will prevent a
kernel oops that has been occurring when reading a vmapped page.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We sometimes need to be able to read ahead in an xdr_stream without
incrementing the current pointer position.
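To illustrate what reading ahead without moving the position means, here is a small userspace sketch. The XdrStream class and its peek/decode methods are invented for this example and are not the SUNRPC interface; the only XDR fact assumed is that integers are 4-byte big-endian words.

```python
import struct


class XdrStream:
    """Toy stream over a bytes buffer with a current position."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def peek(self, nbytes):
        """Return the next nbytes without advancing, or None if too short."""
        end = self.pos + nbytes
        if end > len(self.data):
            return None
        return self.data[self.pos:end]

    def decode(self, nbytes):
        """Return the next nbytes and advance past them, or None if too short."""
        chunk = self.peek(nbytes)
        if chunk is not None:
            self.pos += nbytes
        return chunk


def peek_u32(stream):
    """Look at the next 32-bit big-endian word without consuming it."""
    raw = stream.peek(4)
    return None if raw is None else struct.unpack(">I", raw)[0]
```

A caller can use peek_u32() to decide whether another entry follows and only then call decode() to consume it.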
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We will now use readdir plus even on directories that are very large.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This patch adds readdir plus support to the cache array.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If we're going through the loop in nfs_readdir() more than once, we usually
do not want to restart searching from the beginning of the pages cache.
We only want to do that if the previous search failed...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This patch adds the readdir cache array and functions to retrieve the array
stored on a cache page, clear the array by freeing allocated memory, add an
entry to the array, and search the array for a given cookie.
It then modifies readdir to make use of the new cache array.
With the new cache array method, we no longer need some of this code.
Finally, nfs_llseek_dir() will set file->f_pos to a value greater than 0 and
desc->dir_cookie to zero. When we see this, readdir needs to find the file
at position file->f_pos from the start of the directory.
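The operations listed above (add an entry, clear, search by cookie, and the position-based lookup used after a seek) can be modelled roughly as follows. This is a simplified sketch: the class and function names are hypothetical, and it assumes each entry stores the cookie that identifies it and that a position is just an index from the start of the directory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CacheEntry:
    cookie: int
    name: str


@dataclass
class CacheArray:
    entries: List[CacheEntry] = field(default_factory=list)

    def add(self, cookie, name):
        self.entries.append(CacheEntry(cookie, name))

    def clear(self):
        self.entries.clear()

    def search_by_cookie(self, cookie) -> Optional[int]:
        """Return the index of the entry with this cookie, or None."""
        for index, entry in enumerate(self.entries):
            if entry.cookie == cookie:
                return index
        return None

    def search_by_pos(self, pos) -> Optional[int]:
        """Return the index for a position counted from the directory start."""
        return pos if 0 <= pos < len(self.entries) else None


def find_entry(array, dir_cookie, f_pos):
    # A zero cookie means we have no cookie to search for (start of the
    # directory, or after a seek), so fall back to the file position.
    if dir_cookie == 0:
        return array.search_by_pos(f_pos)
    return array.search_by_cookie(dir_cookie)
```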
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
nfs4state.c uses interfaces from ratelimit.h. It needs to include
that header file to fix build errors:
fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE'
fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration
fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE'
fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit'
fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Otherwise, we cannot recover state correctly.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If nfs_intent_set_file() returns an error, we usually want to pass that
back up the stack.
Also ensure that nfs_open_revalidate() returns '1' on success.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
thread is busy reclaiming state, we do want to treat all state that wasn't
reclaimed before the STALE_CLIENTID as if a network partition occurred (see
the edge conditions described in RFC3530 and RFC5661).
What we do not want to do is to send an nfs4_reclaim_complete(), since we
haven't yet even started reclaiming state after the server rebooted.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
|
|
In the case of a server reboot, the state recovery thread starts by calling
nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
the server reboots while the client is in the middle of recovery.
However, if the client has already marked the nfs4_state as requiring
reboot recovery, then the above behaviour will cause the recovery thread to
treat the open as if it was part of such an edge condition: the open will
be recovered as if it was part of a lease expiration (and all the locks
will be lost).
The fix is to remove the call to nfs4_state_mark_reclaim_reboot() from
nfs4_async_handle_error() and nfs4_handle_exception(). Instead we leave it
to the recovery thread to do this for us.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
|
|
NFSv4 open recovery is currently broken: since we do not clear the
state->flags states before attempting recovery, we end up with the
'can_open_cached()' function triggering. This again leads to no OPEN call
being put on the wire.
Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
|
|
In the case where we lock the page, and then find out that the page has
been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
This is what block_page_mkwrite() does in these situations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
|
|
This patch creates a new idmapper system that uses the request-key function to
place a call into userspace to map user and group ids to names. The old
idmapper was single threaded, which prevented more than one request from running
at a single time. This means that a user would have to wait for an upcall to
finish before accessing a cached result.
The upcall result is stored on a keyring of type id_resolver. See the file
Documentation/filesystems/nfs/idmapper.txt for instructions.
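The caching behaviour described above can be pictured with a small userspace model: a lookup only triggers an upcall when no stored result exists. This is a sketch, not the keyring implementation; IdResolverCache, the upcall callable and the "kind:name" key format are assumptions made for the example.

```python
class IdResolverCache:
    """Toy stand-in for results stored on a keyring."""

    def __init__(self, upcall):
        self._upcall = upcall      # hypothetical callable: (kind, name) -> id
        self._results = {}

    def lookup(self, kind, name):
        key = f"{kind}:{name}"     # assumed description format, illustration only
        if key not in self._results:
            # Cache miss: ask userspace once and remember the answer.
            self._results[key] = self._upcall(kind, name)
        return self._results[key]
```

A second lookup for the same name is answered from the stored result and does not call the upcall again.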
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We may end up removing the current entry from nfs_access_lru_list.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
WB_SYNC_NONE is supposed to mean "don't wait on anything". That should
also include not waiting for COMMIT calls to complete.
WB_SYNC_NONE is also implied when wbc->nonblocking and
wbc->for_background are set, so we can replace those checks in
nfs_commit_unstable_pages with a check for WB_SYNC_NONE.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
In nfs_open_revalidate(), if the open_context() call returns an inode that
is not the same as dentry->d_inode, then we will call
put_nfs_open_context() with a valid dentry->d_inode, but without the
context being part of the nfsi->open_files list.
In this case too, we want to just skip the list removal, but we do want to
call the ->close_context() callback in order to close the NFSv4 state.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
|
|
Having to explicitly initialize sr_slotid to NFS4_MAX_SLOT_TABLE
resulted in numerous bugs. Keeping the current slot as a pointer
to the slot table is more straightforward and robust as it's
implicitly set up to NULL wherever the seq_res member is initialized
to zeroes.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Display the status of 'local_lock' mount option in /proc/mounts.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The inode may be NULL when put_nfs_open_context() is called from
nfs_atomic_lookup() before d_add_unique(dentry, inode).
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
locks. Due to this, some Windows applications which seem to use both flock
(share mode lock mapped as flock by Samba) and fcntl locks sequentially on
the same file, can't lock as they falsely assume the file is already locked.
The problem was reported on a setup with Windows clients accessing Excel files
on a Samba exported share which is originally an NFS mount from a NetApp filer.
Older NFS clients (< 2.6.12) did not see this problem as flock locks were
considered local. To support legacy flock behavior, this patch adds a mount
option "-olocal_lock=" which can take the following values:
'none' - Neither flock locks nor POSIX locks are local
'flock' - flock locks are local
'posix' - fcntl/POSIX locks are local
'all' - Both flock locks and POSIX locks are local
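The four values listed above can be read as a pair of flags, one per lock type. The sketch below is a userspace illustration of that mapping only; the table and function names are hypothetical and do not correspond to kernel identifiers.

```python
# For each local_lock value: which lock types are handled locally.
LOCAL_LOCK_MODES = {
    "none":  {"flock": False, "posix": False},
    "flock": {"flock": True,  "posix": False},
    "posix": {"flock": False, "posix": True},
    "all":   {"flock": True,  "posix": True},
}


def parse_local_lock(value):
    """Return the per-lock-type flags for a local_lock value."""
    try:
        return LOCAL_LOCK_MODES[value]
    except KeyError:
        raise ValueError(f"unknown local_lock value: {value!r}")


def sends_lock_request_to_server(value, lock_kind):
    """True if a lock of this kind ('flock' or 'posix') goes over the wire."""
    return not parse_local_lock(value)[lock_kind]
```

For example, sends_lock_request_to_server("flock", "posix") is True, while sends_lock_request_to_server("flock", "flock") is False.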
Testing:
- This patch was tested by using -olocal_lock option with different values
  and the NLM calls were noted from the network packet captured.
  'none'  - NLM calls were seen during both flock() and fcntl(), flock lock
            was granted, fcntl was denied
  'flock' - no NLM calls for flock(), NLM call was seen for fcntl(),
            granted
  'posix' - NLM call was seen for flock() - granted, no NLM call for fcntl()
  'all'   - no NLM calls were seen during both flock() and fcntl()
- No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
  reboot recovery.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This patch removes all calls to lock_kernel() from the client. This patch
should be applied after the "fs/lock.c prepare for BKL removal" patch submitted
by Arnd Bergmann on September 18.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up: Introduce a helper to '\0'-terminate XDR strings
that are placed in a page in the page cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
create_singlethread_workqueue() is deprecated.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
create_workqueue() is a deprecated function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This fixes an Oopsable condition that was introduced by commit
d3d4152a5d59af9e13a73efa9e9c24383fbe307f (nfs: make sillyrename an async
operation).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The call to nfs_async_rename_release() after rpc_run_task() is incorrect.
rpc_run_task() is always guaranteed to call the ->rpc_release() method.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
A synchronous rename can be interrupted by a SIGKILL. If that happens
during a sillyrename operation, it's possible for the rename call to
be sent to the server, but the task exits before processing the
reply. If this happens, the sillyrenamed file won't get cleaned up
during nfs_dentry_iput and the server is left with a dangling .nfs* file
hanging around.
Fix this problem by turning sillyrename into an asynchronous operation
and have the task doing the sillyrename just wait on the reply. If the
task is killed before the sillyrename completes, it'll still proceed
to completion.
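The pattern described above, where the work runs on its own and the caller merely waits for it, can be sketched in userspace with a thread and an event. This only models the idea and is not the RPC task machinery; start_async_rename, send_rename and process_reply are hypothetical names.

```python
import threading


def start_async_rename(send_rename, process_reply):
    """Run the rename in its own thread and return (done_event, thread).

    Both callables are hypothetical stand-ins: send_rename() blocks until the
    server replies, process_reply() does the follow-up work on that reply.
    """
    done = threading.Event()

    def worker():
        reply = send_rename()
        process_reply(reply)   # runs whether or not anyone is still waiting
        done.set()

    thread = threading.Thread(target=worker)
    thread.start()
    return done, thread
```

The caller waits on the returned event. If it stops waiting early, the worker still processes the reply once it arrives, which is the property the fix relies on.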
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
...since that's where most of the sillyrenaming code lives. A comment
block is added to the beginning as well to clarify how sillyrenaming
works. Also, make nfs_async_unlink static as nfs_sillyrename is the only
caller.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Right now, v3 and v4 have their own variants. Create a standard struct
that will work for v3 and v4. v2 doesn't get anything but a simple error
and so isn't affected by this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Each NFS version has its own version of the rename args container.
Standardize them on a common one that's identical to the one NFSv4
uses.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Remove all remaining references to the struct nameidata from the low level
NFS layers. Again pass down a partially initialised struct nfs_open_context
when we want to do atomic open+create.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Remove references to 'struct nameidata' from the low-level open_revalidate
code, and replace them with a struct nfs_open_context which will be
correctly initialised upon success.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Start moving the 'struct nameidata' dependent code out of the lower level
NFS code in preparation for the removal of open intents.
Instead of the struct nameidata, we pass down a partially initialised
struct nfs_open_context that will be fully initialised by the atomic open
upon success.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up: rpcb_getport_sync() has no more users, so remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
As a convenience, introduce a kernel command line option to enable
NFSROOT debugging messages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Clean up: now that mount option parsing for nfsroot is handled
in fs/nfs/super.c, remove code in fs/nfs/nfsroot.c that is no
longer used. This includes code that constructs the legacy
nfs_mount_data structure, and code that does a MNT call to the
server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Replace duplicate code in NFSROOT for mounting an NFS server on '/'
with logic that uses the existing mainline text-based logic in the NFS
client.
Add documenting comments where appropriate.
Note that this means NFSROOT mounts now use the same default settings
as v2/v3 mounts done via mount(2) from user space.
vers=3,tcp,rsize=<negotiated default>,wsize=<negotiated default>
As before, however, no version/protocol negotiation with the server is
done.
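As a loose illustration of text-based option handling, the sketch below parses a comma-separated option string and layers user-supplied options over the defaults named above. It is not the parser in fs/nfs/super.c; the function names are hypothetical, and it deliberately ignores details such as treating tcp and udp as mutually exclusive or negotiating rsize/wsize.

```python
NFSROOT_DEFAULTS = "vers=3,tcp"   # defaults named in the text above


def parse_mount_options(text):
    """Split 'key=value,flag,...' into a dict; bare flags map to True."""
    options = {}
    for item in text.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def effective_options(user_text=""):
    """Start from the defaults and let user-supplied options override them."""
    merged = parse_mount_options(NFSROOT_DEFAULTS)
    merged.update(parse_mount_options(user_text))
    return merged
```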
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|