58 files changed, 2519 insertions, 464 deletions
diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst index 8536134f31fd..95c2c009874c 100644 --- a/Documentation/filesystems/nfs/index.rst +++ b/Documentation/filesystems/nfs/index.rst @@ -8,6 +8,7 @@ NFS client-identifier exporting + localio pnfs rpc-cache rpc-server-gss diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst new file mode 100644 index 000000000000..bd1967e2eab3 --- /dev/null +++ b/Documentation/filesystems/nfs/localio.rst @@ -0,0 +1,357 @@ +=========== +NFS LOCALIO +=========== + +Overview +======== + +The LOCALIO auxiliary RPC protocol allows the Linux NFS client and +server to reliably handshake to determine if they are on the same +host. Select "NFS client and server support for LOCALIO auxiliary +protocol" in menuconfig to enable CONFIG_NFS_LOCALIO in the kernel +config (both CONFIG_NFS_FS and CONFIG_NFSD must also be enabled). + +Once an NFS client and server handshake as "local", the client will +bypass the network RPC protocol for read, write and commit operations. +Due to this XDR and RPC bypass, these operations will complete faster. + +The LOCALIO auxiliary protocol's implementation, which uses the same +connection as NFS traffic, follows the pattern established by the NFS +ACL protocol extension. + +The LOCALIO auxiliary protocol is needed to allow robust discovery of +clients local to their servers. In a private implementation that +preceded use of this LOCALIO protocol, a fragile sockaddr network +address based match against all local network interfaces was attempted. +But unlike the LOCALIO protocol, the sockaddr-based matching didn't +handle use of iptables or containers. + +The robust handshake between local client and server is just the +beginning; the ultimate use case this locality makes possible is that the +client is able to open files and issue reads, writes and commits +directly to the server without having to go over the network. The +requirement is to perform these loopback NFS operations as efficiently +as possible; this is particularly useful for container use cases +(e.g. kubernetes) where it is possible to run an IO job local to the +server.
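As a rough illustration of what this bypass means on the client side, the submission logic added later in this series amounts to "try the local path, else fall back to ordinary RPC". The sketch below is editorial, not code from this series: nfs_local_open_fh() and nfs_local_doio() are real functions added by these patches, while submit_pgio() and do_rpc_pgio() are hypothetical stand-ins for the surrounding pgio code::

    /* Illustrative sketch only, not actual kernel code from this series. */
    static int submit_pgio(struct nfs_client *clp,
                           struct nfs_pgio_header *hdr,
                           const struct rpc_call_ops *call_ops)
    {
            struct nfsd_file *localio;

            /* NULL means the server is not local (or LOCALIO is disabled) */
            localio = nfs_local_open_fh(clp, hdr->cred, hdr->args.fh,
                                        hdr->rw_mode);
            if (localio)
                    return nfs_local_doio(clp, localio, hdr, call_ops);

            return do_rpc_pgio(clp, hdr, call_ops); /* hypothetical RPC fallback */
    }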
+ +The performance advantage realized from LOCALIO's ability to bypass +using XDR and RPC for reads, writes and commits can be extreme, e.g.: + +fio for 20 secs with directio, qd of 8, 16 libaio threads: + - With LOCALIO: + 4K read: IOPS=979k, BW=3825MiB/s (4011MB/s)(74.7GiB/20002msec) + 4K write: IOPS=165k, BW=646MiB/s (678MB/s)(12.6GiB/20002msec) + 128K read: IOPS=402k, BW=49.1GiB/s (52.7GB/s)(982GiB/20002msec) + 128K write: IOPS=11.5k, BW=1433MiB/s (1503MB/s)(28.0GiB/20004msec) + + - Without LOCALIO: + 4K read: IOPS=79.2k, BW=309MiB/s (324MB/s)(6188MiB/20003msec) + 4K write: IOPS=59.8k, BW=234MiB/s (245MB/s)(4671MiB/20002msec) + 128K read: IOPS=33.9k, BW=4234MiB/s (4440MB/s)(82.7GiB/20004msec) + 128K write: IOPS=11.5k, BW=1434MiB/s (1504MB/s)(28.0GiB/20011msec) + +fio for 20 secs with directio, qd of 8, 1 libaio thread: + - With LOCALIO: + 4K read: IOPS=230k, BW=898MiB/s (941MB/s)(17.5GiB/20001msec) + 4K write: IOPS=22.6k, BW=88.3MiB/s (92.6MB/s)(1766MiB/20001msec) + 128K read: IOPS=38.8k, BW=4855MiB/s (5091MB/s)(94.8GiB/20001msec) + 128K write: IOPS=11.4k, BW=1428MiB/s (1497MB/s)(27.9GiB/20001msec) + + - Without LOCALIO: + 4K read: IOPS=77.1k, BW=301MiB/s (316MB/s)(6022MiB/20001msec) + 4K write: IOPS=32.8k, BW=128MiB/s (135MB/s)(2566MiB/20001msec) + 128K read: IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec) + 128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec) + +FAQ +=== + +1. What are the use cases for LOCALIO? + + a. Workloads where the NFS client and server are on the same host + realize improved IO performance. In particular, it is common when + running containerised workloads for jobs to find themselves + running on the same host as the knfsd server being used for + storage. + +2. What are the requirements for LOCALIO? + + a. Bypass use of the network RPC protocol as much as possible. This + includes bypassing XDR and RPC for open, read, write and commit + operations. + b. Allow client and server to autonomously discover if they are + running local to each other without making any assumptions about + the local network topology. + c. Support the use of containers by being compatible with relevant + namespaces (e.g. network, user, mount). + d. Support all versions of NFS. NFSv3 is of particular importance + because it has wide enterprise usage and pNFS flexfiles makes use + of it for the data path. + +3. Why doesn’t LOCALIO just compare IP addresses or hostnames when + deciding if the NFS client and server are co-located on the same + host? + + Since one of the main use cases is containerised workloads, we cannot + assume that IP addresses will be shared between the client and + server. This sets up a requirement for a handshake protocol that + needs to go over the same connection as the NFS traffic in order to + identify that the client and the server really are running on the + same host. The handshake uses a secret that is sent over the wire, + and can be verified by both parties by comparing with a value stored + in shared kernel memory if they are truly co-located. + +4. Does LOCALIO improve pNFS flexfiles? + + Yes, LOCALIO complements pNFS flexfiles by allowing it to take + advantage of NFS client and server locality. Policy that initiates + client IO as closely to the server where the data is stored naturally + benefits from the data path optimization LOCALIO provides. + +5. Why not develop a new pNFS layout to enable LOCALIO? 
+ + A new pNFS layout could be developed, but doing so would put the + onus on the server to somehow discover that the client is co-located + when deciding to hand out the layout. + There is value in a simpler approach (as provided by LOCALIO) that + allows the NFS client to negotiate and leverage locality without + requiring more elaborate modeling and discovery of such locality in a + more centralized manner. + +6. Why is having the client perform a server-side file OPEN, without + using RPC, beneficial? Is the benefit pNFS specific? + + Avoiding the use of XDR and RPC for file opens is beneficial to + performance regardless of whether pNFS is used. Especially when + dealing with small files it's best to avoid going over the wire + whenever possible; otherwise it could reduce or even negate the + benefits of avoiding the wire for doing the small file I/O itself. + Given LOCALIO's requirements, the current approach of having the + client perform a server-side file open, without using RPC, is ideal. + If in the future requirements change then we can adapt accordingly. + +7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)? + + Strong authentication is usually tied to the connection itself. It + works by establishing a context that is cached by the server, and + that acts as the key for discovering the authorisation token, which + can then be passed to rpc.mountd to complete the authentication + process. On the other hand, in the case of AUTH_UNIX, the credential + that was passed over the wire is used directly as the key in the + upcall to rpc.mountd. This simplifies the authentication process, and + so makes AUTH_UNIX easier to support. + +8. How do export options that translate RPC user IDs behave for LOCALIO + operations (e.g. root_squash, all_squash)? + + Export options that translate user IDs are managed by nfsd_setuser(), + which is called by nfsd_setuser_and_check_port(), which is called by + __fh_verify(). So they get handled exactly the same way for LOCALIO + as they do for non-LOCALIO. + +9. How does LOCALIO make certain that object lifetimes are managed + properly given NFSD and NFS operate in different contexts? + + See the detailed "NFS Client and Server Interlock" section below. + +RPC +=== + +The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL" +RPC method that allows the Linux NFS client to verify the local Linux +NFS server can see the nonce (single-use UUID) the client generated and +made available in nfs_common. This protocol isn't part of an IETF +standard, nor does it need to be, considering it is a Linux-to-Linux +auxiliary RPC protocol that amounts to an implementation detail. + +The UUID_IS_LOCAL method encodes the client-generated uuid_t in terms of +the fixed UUID_SIZE (16 bytes). The fixed-size opaque encode and decode +XDR methods are used instead of the less efficient variable-sized +methods. + +The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned +by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ): +Linux Kernel Organization 400122 nfslocalio + +The LOCALIO protocol spec in rpcgen syntax is:: + + /* raw RFC 9562 UUID */ + #define UUID_SIZE 16 + typedef u8 uuid_t<UUID_SIZE>; + + program NFS_LOCALIO_PROGRAM { + version LOCALIO_V1 { + void + NULL(void) = 0; + + void + UUID_IS_LOCAL(uuid_t) = 1; + } = 1; + } = 400122; + +LOCALIO uses the same transport connection as NFS traffic. As such, +LOCALIO is not registered with rpcbind.
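To make the fixed-versus-variable encoding point concrete, here is a minimal comparison using the SUNRPC xdr_stream helpers. This is an editorial sketch, not code from this series (the series' actual encoder is localio_xdr_enc_uuidargs() in fs/nfs/localio.c further down): the fixed-size form writes only the 16 opaque bytes, while the variable form would also emit a 4-byte length word on the wire::

    #include <linux/sunrpc/xdr.h>
    #include <linux/uuid.h>   /* UUID_SIZE */

    /* Fixed-size encode, as LOCALIO uses: 16 bytes, no length prefix. */
    static void uuid_encode_fixed(struct xdr_stream *xdr, const u8 *uuid)
    {
            WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, uuid, UUID_SIZE) < 0);
    }

    /* Variable-size encode, shown only for contrast: length word + bytes. */
    static void uuid_encode_variable(struct xdr_stream *xdr, const u8 *uuid)
    {
            WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, uuid, UUID_SIZE) < 0);
    }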
+ +NFS Common and Client/Server Handshake +====================================== + +fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client +to generate a nonce (single-use UUID) and associated short-lived +nfs_uuid_t struct, register it with nfs_common for subsequent lookup and +verification by the NFS server and, if matched, the NFS server populates +members in the nfs_uuid_t struct. The NFS client then uses nfs_common to +transfer the nfs_uuid_t from its nfs_uuids to the nn->nfsd_serv +clients_list from the nfs_common's uuids_list. See: +fs/nfs/localio.c:nfs_local_probe() + +nfs_common's nfs_uuids list is the basis for LOCALIO enablement; as such, +it has members that point to nfsd memory for direct use by the client +(e.g. 'net' is the server's network namespace; through it the client can +access nn->nfsd_serv with proper rcu read access). It is this client +and server synchronization that enables advanced usage and lifetime of +objects to span from the host kernel's nfsd to per-container knfsd +instances that are connected to nfs clients running on the same local +host. + +NFS Client and Server Interlock +=============================== + +LOCALIO provides the nfs_uuid_t object and associated interfaces to +allow proper network namespace (net-ns) and NFSD object refcounting: + + We don't want to keep a long-term counted reference on each NFSD's + net-ns in the client because that prevents a server container from + completely shutting down. + + So we avoid taking a reference at all and rely on the per-cpu + reference to the server (detailed below) being sufficient to keep + the net-ns active. This involves allowing the NFSD's net-ns exit + code to iterate all active clients and clear their ->net pointers + (which are needed to find the per-cpu-refcount for the nfsd_serv). + + Details: + + - Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head + that can be used to find the client. It does add the 16-byte + uuid_t to nfs_client so it is bigger than needed (given that + uuid_t is only used during the initial NFS client and server + LOCALIO handshake to determine if they are local to each other). + If that is really a problem we can find a fix. + + - When the nfs server confirms that the uuid_t is local, it moves + the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net. + + - When each server's net-ns is shutting down (in a "pre_exit" + handler), all these nfs_uuid_t have their ->net cleared. There is + an rcu_synchronize() call between pre_exit() handlers and exit() + handlers so any caller that sees nfs_uuid_t ->net as not NULL can + safely manage the per-cpu-refcount for nfsd_serv. + + - The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it + can safely dereference ->net in a private rcu_read_lock() section + to allow safe access to the associated nfsd_net and nfsd_serv. + +So LOCALIO required the introduction and use of NFSD's percpu_ref to +interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each +nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(), and +warrants a more detailed explanation: + + nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its + nfsd_file handle, and then the caller (NFS client) must drop the + reference for the nfsd_file and associated nn->nfsd_serv using + nfs_file_put_local() once it has completed its IO.
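A condensed, editorial sketch of that open-side interlock follows (the real logic is fs/nfsd/localio.c:nfsd_open_local_fh() shown later in this diff; the exact nfsd_serv_try_get() signature and the open_backing_file() helper are assumptions used purely for illustration, and error unwinding is elided)::

    static struct nfsd_file *sketch_open_local_fh(nfs_uuid_t *uuid)
    {
            struct net *net;

            rcu_read_lock();
            net = uuid->net;   /* cleared by nfsd's net-ns "pre_exit" handler */
            if (!net || !nfsd_serv_try_get(net)) {
                    rcu_read_unlock();
                    /* -ENXIO prompts the client to re-probe LOCALIO */
                    return ERR_PTR(-ENXIO);
            }
            rcu_read_unlock();

            /*
             * nn->nfsd_serv is now pinned via its percpu_ref, so the backing
             * file can be opened without racing nfsd shutdown; the caller
             * later drops both the nfsd_file and nfsd_serv references with
             * nfs_file_put_local() once its IO completes.
             */
            return open_backing_file(uuid);   /* hypothetical helper */
    }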
+ + This interlock relies heavily on nfsd_open_local_fh() being + afforded the ability to safely deal with the possibility that the + NFSD's net-ns (and nfsd_net by association) may have been destroyed + by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only + possible given the nfs_uuid_t ->net pointer management detailed + above. + +All told, this elaborate interlock of the NFS client and server has been +verified to fix an easy-to-hit crash that would occur if an NFSD +instance running in a container, with a LOCALIO client mounted, is +shut down. Upon restart of the container and associated NFSD, the client +would go on to crash due to a NULL pointer dereference caused by +the LOCALIO client attempting to call nfsd_open_local_fh(), using +nn->nfsd_serv, without having a proper reference on nn->nfsd_serv. + +NFS Client issues IO instead of Server +====================================== + +Because LOCALIO is focused on protocol bypass to achieve improved IO +performance, alternatives to the traditional NFS wire protocol (SUNRPC +with XDR) must be provided to access the backing filesystem. + +See fs/nfs/localio.c:nfs_local_open_fh() and +fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes +focused use of select nfs server objects to allow a client local to a +server to open a file pointer without needing to go over the network. + +The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the +server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access +both the associated nfsd network namespace and nn->nfsd_serv in terms of +RCU. If nfsd_open_local_fh() finds that the client no longer sees valid +nfsd objects (be it struct net or nn->nfsd_serv) it returns -ENXIO +to nfs_local_open_fh() and the client will try to reestablish the +LOCALIO resources needed by calling nfs_local_probe() again. This +recovery is needed if/when an nfsd instance running in a container were +to reboot while a LOCALIO client is connected to it. + +Once the client has an open nfsd_file pointer it will issue reads, +writes and commits directly to the underlying local filesystem (normally +done by the nfs server). As such, for these operations, the NFS client +is issuing IO to the underlying local filesystem that it is sharing with +the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and +fs/nfs/localio.c:nfs_local_commit(). + +Security +======== + +Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka +AUTH_SYS) is used. + +Care is taken to ensure the same NFS security mechanisms are used +(authentication, etc) regardless of whether LOCALIO or regular NFS +access is used. The auth_domain established as part of the traditional +NFS client access to the NFS server is also used for LOCALIO. + +Relative to containers, LOCALIO gives the client access to the network +namespace the server has. This is required to allow the client to access +the server's per-namespace nfsd_net struct. With traditional NFS, the +client is afforded this same level of access (albeit in terms of the NFS +protocol via SUNRPC). No other namespaces (user, mount, etc) have been +altered or purposely extended from the server to the client. + +Testing +======= + +The LOCALIO auxiliary protocol and associated NFS LOCALIO read, write +and commit access have proven stable against various test scenarios: + +- Client and server both on the same host. + +- All permutations of client and server support enablement for both + local and remote client and server.
+ +- Testing against NFS storage products that don't support the LOCALIO + protocol was also performed. + +- Client on host, server within a container (for both v3 and v4.2). + The container testing was in terms of podman managed containers and + includes successful container stop/restart scenario. + +- Formalizing these test scenarios in terms of existing test + infrastructure is on-going. Initial regular coverage is provided in + terms of ktest running xfstests against a LOCALIO-enabled NFS loopback + mount configuration, and includes lockdep and KASAN coverage, see: + https://evilpiepirate.org/~testdashboard/ci?user=snitzer&branch=snitm-nfs-next + https://github.com/koverstreet/ktest + +- Various kdevops testing (in terms of "Chuck's BuildBot") has been + performed to regularly verify the LOCALIO changes haven't caused any + regressions to non-LOCALIO NFS use cases. + +- All of Hammerspace's various sanity tests pass with LOCALIO enabled + (this includes numerous pNFS and flexfiles tests). diff --git a/fs/Kconfig b/fs/Kconfig index 0e4efec1d92e..949895cff872 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -386,6 +386,29 @@ config NFS_COMMON depends on NFSD || NFS_FS || LOCKD default y +config NFS_COMMON_LOCALIO_SUPPORT + tristate + default n + default y if NFSD=y || NFS_FS=y + default m if NFSD=m && NFS_FS=m + select SUNRPC + +config NFS_LOCALIO + bool "NFS client and server support for LOCALIO auxiliary protocol" + depends on NFSD && NFS_FS + select NFS_COMMON_LOCALIO_SUPPORT + default n + help + Some NFS servers support an auxiliary NFS LOCALIO protocol + that is not an official part of the NFS protocol. + + This option enables support for the LOCALIO protocol in the + kernel's NFS server and client. Enable this to permit local + NFS clients to bypass the network when issuing reads and + writes to the local NFS server. + + If unsure, say N. + config NFS_V4_2_SSC_HELPER bool default y if NFS_V4_2 diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig index 57249f040dfc..0eb20012792f 100644 --- a/fs/nfs/Kconfig +++ b/fs/nfs/Kconfig @@ -4,6 +4,7 @@ config NFS_FS depends on INET && FILE_LOCKING && MULTIUSER select LOCKD select SUNRPC + select NFS_COMMON select NFS_ACL_SUPPORT if NFS_V3_ACL help Choose Y here if you want to access files residing on other diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile index 5f6db37f461e..9fb2f2cac87e 100644 --- a/fs/nfs/Makefile +++ b/fs/nfs/Makefile @@ -13,6 +13,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \ nfs-$(CONFIG_ROOT_NFS) += nfsroot.o nfs-$(CONFIG_SYSCTL) += sysctl.o nfs-$(CONFIG_NFS_FSCACHE) += fscache.o +nfs-$(CONFIG_NFS_LOCALIO) += localio.o obj-$(CONFIG_NFS_V2) += nfsv2.o nfsv2-y := nfs2super.o proc.o nfs2xdr.o diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 8286edd6062d..a1d21c4be0ac 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -178,6 +178,14 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init) clp->cl_max_connect = cl_init->max_connect ? 
cl_init->max_connect : 1; clp->cl_net = get_net(cl_init->net); +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + seqlock_init(&clp->cl_boot_lock); + ktime_get_real_ts64(&clp->cl_nfssvc_boot); + clp->cl_uuid.net = NULL; + clp->cl_uuid.dom = NULL; + spin_lock_init(&clp->cl_localio_lock); +#endif /* CONFIG_NFS_LOCALIO */ + clp->cl_principal = "*"; clp->cl_xprtsec = cl_init->xprtsec; return clp; @@ -233,6 +241,8 @@ static void pnfs_init_server(struct nfs_server *server) */ void nfs_free_client(struct nfs_client *clp) { + nfs_local_disable(clp); + /* -EIO all pending I/O */ if (!IS_ERR(clp->cl_rpcclient)) rpc_shutdown_client(clp->cl_rpcclient); @@ -424,7 +434,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init) list_add_tail(&new->cl_share_link, &nn->nfs_client_list); spin_unlock(&nn->nfs_client_lock); - return rpc_ops->init_client(new, cl_init); + new = rpc_ops->init_client(new, cl_init); + if (!IS_ERR(new)) + nfs_local_probe(new); + return new; } spin_unlock(&nn->nfs_client_lock); @@ -997,8 +1010,8 @@ struct nfs_server *nfs_alloc_server(void) init_waitqueue_head(&server->write_congestion_wait); atomic_long_set(&server->writeback, 0); - ida_init(&server->openowner_id); - ida_init(&server->lockowner_id); + atomic64_set(&server->owner_ctr, 0); + pnfs_init_server(server); rpc_init_wait_queue(&server->uoc_rpcwaitq, "NFS UOC"); @@ -1037,8 +1050,6 @@ void nfs_free_server(struct nfs_server *server) } ida_free(&s_sysfs_ids, server->s_sysfs_id); - ida_destroy(&server->lockowner_id); - ida_destroy(&server->openowner_id); put_cred(server->cred); nfs_release_automount_timer(); call_rcu(&server->rcu, delayed_free); diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 4cb97ef41350..492cffd9d3d8 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -151,7 +151,7 @@ struct nfs_cache_array { unsigned char folio_full : 1, folio_is_eof : 1, cookies_are_ordered : 1; - struct nfs_cache_array_entry array[]; + struct nfs_cache_array_entry array[] __counted_by(size); }; struct nfs_readdir_descriptor { @@ -328,7 +328,8 @@ static int nfs_readdir_folio_array_append(struct folio *folio, goto out; } - cache_entry = &array->array[array->size]; + array->size++; + cache_entry = &array->array[array->size - 1]; cache_entry->cookie = array->last_cookie; cache_entry->ino = entry->ino; cache_entry->d_type = entry->d_type; @@ -337,7 +338,6 @@ static int nfs_readdir_folio_array_append(struct folio *folio, array->last_cookie = entry->cookie; if (array->last_cookie <= cache_entry->cookie) array->cookies_are_ordered = 0; - array->size++; if (entry->eof != 0) nfs_readdir_array_set_eof(array); out: diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c index b6e9aeaf4ce2..d39a1f58e18d 100644 --- a/fs/nfs/filelayout/filelayout.c +++ b/fs/nfs/filelayout/filelayout.c @@ -488,7 +488,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr) /* Perform an asynchronous read to ds */ nfs_initiate_pgio(ds_clnt, hdr, hdr->cred, NFS_PROTO(hdr->inode), &filelayout_read_call_ops, - 0, RPC_TASK_SOFTCONN); + 0, RPC_TASK_SOFTCONN, NULL); return PNFS_ATTEMPTED; } @@ -530,7 +530,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync) /* Perform an asynchronous write */ nfs_initiate_pgio(ds_clnt, hdr, hdr->cred, NFS_PROTO(hdr->inode), &filelayout_write_call_ops, - sync, RPC_TASK_SOFTCONN); + sync, RPC_TASK_SOFTCONN, NULL); return PNFS_ATTEMPTED; } @@ -1011,7 +1011,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how) data->args.fh = fh; return nfs_initiate_commit(ds_clnt, data, 
NFS_PROTO(data->inode), &filelayout_commit_call_ops, how, - RPC_TASK_SOFTCONN); + RPC_TASK_SOFTCONN, NULL); out_err: pnfs_generic_prepare_to_resend_writes(data); pnfs_generic_commit_release(data); diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c index 39ba9f4208aa..f78115c6c2c1 100644 --- a/fs/nfs/flexfilelayout/flexfilelayout.c +++ b/fs/nfs/flexfilelayout/flexfilelayout.c @@ -11,6 +11,7 @@ #include <linux/nfs_mount.h> #include <linux/nfs_page.h> #include <linux/module.h> +#include <linux/file.h> #include <linux/sched/mm.h> #include <linux/sunrpc/metrics.h> @@ -162,6 +163,21 @@ decode_name(struct xdr_stream *xdr, u32 *id) return 0; } +static struct nfsd_file * +ff_local_open_fh(struct nfs_client *clp, const struct cred *cred, + struct nfs_fh *fh, fmode_t mode) +{ + if (mode & FMODE_WRITE) { + /* + * Always request read and write access since this corresponds + * to a rw layout. + */ + mode |= FMODE_READ; + } + + return nfs_local_open_fh(clp, cred, fh, mode); +} + static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1, const struct nfs4_ff_layout_mirror *m2) { @@ -237,7 +253,7 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags) static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror) { - const struct cred *cred; + const struct cred *cred; ff_layout_remove_mirror(mirror); kfree(mirror->fh_versions); @@ -1756,6 +1772,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr) struct pnfs_layout_segment *lseg = hdr->lseg; struct nfs4_pnfs_ds *ds; struct rpc_clnt *ds_clnt; + struct nfsd_file *localio; struct nfs4_ff_layout_mirror *mirror; const struct cred *ds_cred; loff_t offset = hdr->args.offset; @@ -1802,11 +1819,18 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr) hdr->args.offset = offset; hdr->mds_offset = offset; + /* Start IO accounting for local read */ + localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ); + if (localio) { + hdr->task.tk_start = ktime_get(); + ff_layout_read_record_layoutstats_start(&hdr->task, hdr); + } + /* Perform an asynchronous read to ds */ nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops, vers == 3 ? &ff_layout_read_call_ops_v3 : &ff_layout_read_call_ops_v4, - 0, RPC_TASK_SOFTCONN); + 0, RPC_TASK_SOFTCONN, localio); put_cred(ds_cred); return PNFS_ATTEMPTED; @@ -1826,6 +1850,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync) struct pnfs_layout_segment *lseg = hdr->lseg; struct nfs4_pnfs_ds *ds; struct rpc_clnt *ds_clnt; + struct nfsd_file *localio; struct nfs4_ff_layout_mirror *mirror; const struct cred *ds_cred; loff_t offset = hdr->args.offset; @@ -1870,11 +1895,19 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync) */ hdr->args.offset = offset; + /* Start IO accounting for local write */ + localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, + FMODE_READ|FMODE_WRITE); + if (localio) { + hdr->task.tk_start = ktime_get(); + ff_layout_write_record_layoutstats_start(&hdr->task, hdr); + } + /* Perform an asynchronous write */ nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops, vers == 3 ? 
&ff_layout_write_call_ops_v3 : &ff_layout_write_call_ops_v4, - sync, RPC_TASK_SOFTCONN); + sync, RPC_TASK_SOFTCONN, localio); put_cred(ds_cred); return PNFS_ATTEMPTED; @@ -1908,6 +1941,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how) struct pnfs_layout_segment *lseg = data->lseg; struct nfs4_pnfs_ds *ds; struct rpc_clnt *ds_clnt; + struct nfsd_file *localio; struct nfs4_ff_layout_mirror *mirror; const struct cred *ds_cred; u32 idx; @@ -1946,10 +1980,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how) if (fh) data->args.fh = fh; + /* Start IO accounting for local commit */ + localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, + FMODE_READ|FMODE_WRITE); + if (localio) { + data->task.tk_start = ktime_get(); + ff_layout_commit_record_layoutstats_start(&data->task, data); + } + ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops, vers == 3 ? &ff_layout_commit_call_ops_v3 : &ff_layout_commit_call_ops_v4, - how, RPC_TASK_SOFTCONN); + how, RPC_TASK_SOFTCONN, localio); put_cred(ds_cred); return ret; out_err: @@ -2087,12 +2129,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr, } static void -encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len) -{ - WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0); -} - -static void ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr, const nfs4_stateid *stateid, const struct nfs42_layoutstat_devinfo *devinfo) diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c index e028f5a0ef5f..e58bedfb1dcc 100644 --- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c @@ -395,6 +395,12 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg, /* connect success, check rsize/wsize limit */ if (!status) { + /* + * ds_clp is put in destroy_ds(). + * keep ds_clp even if DS is local, so that if local IO cannot + * proceed somehow, we can fall back to NFS whenever we want. 
+ */ + nfs_local_probe(ds->ds_clp); max_payload = nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient), NULL); diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c index 6c9f3f6645dd..7e000d782e28 100644 --- a/fs/nfs/fs_context.c +++ b/fs/nfs/fs_context.c @@ -49,6 +49,7 @@ enum nfs_param { Opt_bsize, Opt_clientaddr, Opt_cto, + Opt_alignwrite, Opt_fg, Opt_fscache, Opt_fscache_flag, @@ -149,6 +150,7 @@ static const struct fs_parameter_spec nfs_fs_parameters[] = { fsparam_u32 ("bsize", Opt_bsize), fsparam_string("clientaddr", Opt_clientaddr), fsparam_flag_no("cto", Opt_cto), + fsparam_flag_no("alignwrite", Opt_alignwrite), fsparam_flag ("fg", Opt_fg), fsparam_flag_no("fsc", Opt_fscache_flag), fsparam_string("fsc", Opt_fscache), @@ -592,6 +594,12 @@ static int nfs_fs_context_parse_param(struct fs_context *fc, else ctx->flags |= NFS_MOUNT_TRUNK_DISCOVERY; break; + case Opt_alignwrite: + if (result.negated) + ctx->flags |= NFS_MOUNT_NO_ALIGNWRITE; + else + ctx->flags &= ~NFS_MOUNT_NO_ALIGNWRITE; + break; case Opt_ac: if (result.negated) ctx->flags |= NFS_MOUNT_NOAC; diff --git a/fs/nfs/getroot.c b/fs/nfs/getroot.c index 11ff2b2e060f..f13d25d95b85 100644 --- a/fs/nfs/getroot.c +++ b/fs/nfs/getroot.c @@ -62,7 +62,7 @@ static int nfs_superblock_set_dummy_root(struct super_block *sb, struct inode *i } /* - * get an NFS2/NFS3 root dentry from the root filehandle + * get a root dentry from the root filehandle */ int nfs_get_root(struct super_block *s, struct fs_context *fc) { diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index b4914a11c3c2..542c7d97b235 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -2461,35 +2461,54 @@ static void nfs_destroy_inodecache(void) kmem_cache_destroy(nfs_inode_cachep); } +struct workqueue_struct *nfslocaliod_workqueue; struct workqueue_struct *nfsiod_workqueue; EXPORT_SYMBOL_GPL(nfsiod_workqueue); /* - * start up the nfsiod workqueue + * Destroy the nfsiod workqueues */ -static int nfsiod_start(void) +static void nfsiod_stop(void) { struct workqueue_struct *wq; - dprintk("RPC: creating workqueue nfsiod\n"); - wq = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0); - if (wq == NULL) - return -ENOMEM; - nfsiod_workqueue = wq; - return 0; + + wq = nfsiod_workqueue; + if (wq != NULL) { + nfsiod_workqueue = NULL; + destroy_workqueue(wq); + } +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + wq = nfslocaliod_workqueue; + if (wq != NULL) { + nfslocaliod_workqueue = NULL; + destroy_workqueue(wq); + } +#endif /* CONFIG_NFS_LOCALIO */ } /* - * Destroy the nfsiod workqueue + * Start the nfsiod workqueues */ -static void nfsiod_stop(void) +static int nfsiod_start(void) { - struct workqueue_struct *wq; - - wq = nfsiod_workqueue; - if (wq == NULL) - return; - nfsiod_workqueue = NULL; - destroy_workqueue(wq); + dprintk("RPC: creating workqueue nfsiod\n"); + nfsiod_workqueue = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0); + if (nfsiod_workqueue == NULL) + return -ENOMEM; +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + /* + * localio writes need to use a normal (non-memreclaim) workqueue. + * When we start getting low on space, XFS goes and calls flush_work() on + * a non-memreclaim work queue, which causes a priority inversion problem. 
+ */ + dprintk("RPC: creating workqueue nfslocaliod\n"); + nfslocaliod_workqueue = alloc_workqueue("nfslocaliod", WQ_UNBOUND, 0); + if (unlikely(nfslocaliod_workqueue == NULL)) { + nfsiod_stop(); + return -ENOMEM; + } +#endif /* CONFIG_NFS_LOCALIO */ + return 0; } unsigned int nfs_net_id; diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 5902a9beca1f..430733e3eff2 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -9,6 +9,7 @@ #include <linux/crc32.h> #include <linux/sunrpc/addr.h> #include <linux/nfs_page.h> +#include <linux/nfslocalio.h> #include <linux/wait_bit.h> #define NFS_SB_MASK (SB_RDONLY|SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS) @@ -308,7 +309,8 @@ void nfs_pgio_header_free(struct nfs_pgio_header *); int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *); int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr, const struct cred *cred, const struct nfs_rpc_ops *rpc_ops, - const struct rpc_call_ops *call_ops, int how, int flags); + const struct rpc_call_ops *call_ops, int how, int flags, + struct nfsd_file *localio); void nfs_free_request(struct nfs_page *req); struct nfs_pgio_mirror * nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc); @@ -438,6 +440,7 @@ int nfs_check_flags(int); /* inode.c */ extern struct workqueue_struct *nfsiod_workqueue; +extern struct workqueue_struct *nfslocaliod_workqueue; extern struct inode *nfs_alloc_inode(struct super_block *sb); extern void nfs_free_inode(struct inode *); extern int nfs_write_inode(struct inode *, struct writeback_control *); @@ -449,6 +452,51 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags); extern bool nfs_check_cache_invalid(struct inode *, unsigned long); extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode); +#if IS_ENABLED(CONFIG_NFS_LOCALIO) +/* localio.c */ +extern void nfs_local_disable(struct nfs_client *); +extern void nfs_local_probe(struct nfs_client *); +extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *, + const struct cred *, + struct nfs_fh *, + const fmode_t); +extern int nfs_local_doio(struct nfs_client *, + struct nfsd_file *, + struct nfs_pgio_header *, + const struct rpc_call_ops *); +extern int nfs_local_commit(struct nfsd_file *, + struct nfs_commit_data *, + const struct rpc_call_ops *, int); +extern bool nfs_server_is_local(const struct nfs_client *clp); + +#else /* CONFIG_NFS_LOCALIO */ +static inline void nfs_local_disable(struct nfs_client *clp) {} +static inline void nfs_local_probe(struct nfs_client *clp) {} +static inline struct nfsd_file * +nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred, + struct nfs_fh *fh, const fmode_t mode) +{ + return NULL; +} +static inline int nfs_local_doio(struct nfs_client *clp, + struct nfsd_file *localio, + struct nfs_pgio_header *hdr, + const struct rpc_call_ops *call_ops) +{ + return -EINVAL; +} +static inline int nfs_local_commit(struct nfsd_file *localio, + struct nfs_commit_data *data, + const struct rpc_call_ops *call_ops, int how) +{ + return -EINVAL; +} +static inline bool nfs_server_is_local(const struct nfs_client *clp) +{ + return false; +} +#endif /* CONFIG_NFS_LOCALIO */ + /* super.c */ extern const struct super_operations nfs_sops; bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t); @@ -505,7 +553,6 @@ extern int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio, struct nfs_open_context *ctx, struct folio *folio); extern void nfs_pageio_complete_read(struct nfs_pageio_descriptor *pgio); 
-extern void nfs_read_prepare(struct rpc_task *task, void *calldata); extern void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio); /* super.c */ @@ -528,7 +575,8 @@ extern int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data, const struct nfs_rpc_ops *nfs_ops, const struct rpc_call_ops *call_ops, - int how, int flags); + int how, int flags, + struct nfsd_file *localio); extern void nfs_init_commit(struct nfs_commit_data *data, struct list_head *head, struct pnfs_layout_segment *lseg, diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c new file mode 100644 index 000000000000..c29cdf51c458 --- /dev/null +++ b/fs/nfs/localio.c @@ -0,0 +1,757 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * NFS client support for local clients to bypass network stack + * + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> + * Copyright (C) 2024 NeilBrown <neilb@suse.de> + */ + +#include <linux/module.h> +#include <linux/errno.h> +#include <linux/vfs.h> +#include <linux/file.h> +#include <linux/inet.h> +#include <linux/sunrpc/addr.h> +#include <linux/inetdevice.h> +#include <net/addrconf.h> +#include <linux/nfs_common.h> +#include <linux/nfslocalio.h> +#include <linux/module.h> +#include <linux/bvec.h> + +#include <linux/nfs.h> +#include <linux/nfs_fs.h> +#include <linux/nfs_xdr.h> + +#include "internal.h" +#include "pnfs.h" +#include "nfstrace.h" + +#define NFSDBG_FACILITY NFSDBG_VFS + +struct nfs_local_kiocb { + struct kiocb kiocb; + struct bio_vec *bvec; + struct nfs_pgio_header *hdr; + struct work_struct work; + struct nfsd_file *localio; +}; + +struct nfs_local_fsync_ctx { + struct nfsd_file *localio; + struct nfs_commit_data *data; + struct work_struct work; + struct kref kref; + struct completion *done; +}; +static void nfs_local_fsync_work(struct work_struct *work); + +static bool localio_enabled __read_mostly = true; +module_param(localio_enabled, bool, 0644); + +static inline bool nfs_client_is_local(const struct nfs_client *clp) +{ + return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags); +} + +bool nfs_server_is_local(const struct nfs_client *clp) +{ + return nfs_client_is_local(clp) && localio_enabled; +} +EXPORT_SYMBOL_GPL(nfs_server_is_local); + +/* + * UUID_IS_LOCAL XDR functions + */ + +static void localio_xdr_enc_uuidargs(struct rpc_rqst *req, + struct xdr_stream *xdr, + const void *data) +{ + const u8 *uuid = data; + + encode_opaque_fixed(xdr, uuid, UUID_SIZE); +} + +static int localio_xdr_dec_uuidres(struct rpc_rqst *req, + struct xdr_stream *xdr, + void *result) +{ + /* void return */ + return 0; +} + +static const struct rpc_procinfo nfs_localio_procedures[] = { + [LOCALIOPROC_UUID_IS_LOCAL] = { + .p_proc = LOCALIOPROC_UUID_IS_LOCAL, + .p_encode = localio_xdr_enc_uuidargs, + .p_decode = localio_xdr_dec_uuidres, + .p_arglen = XDR_QUADLEN(UUID_SIZE), + .p_replen = 0, + .p_statidx = LOCALIOPROC_UUID_IS_LOCAL, + .p_name = "UUID_IS_LOCAL", + }, +}; + +static unsigned int nfs_localio_counts[ARRAY_SIZE(nfs_localio_procedures)]; +static const struct rpc_version nfslocalio_version1 = { + .number = 1, + .nrprocs = ARRAY_SIZE(nfs_localio_procedures), + .procs = nfs_localio_procedures, + .counts = nfs_localio_counts, +}; + +static const struct rpc_version *nfslocalio_version[] = { + [1] = &nfslocalio_version1, +}; + +extern const struct rpc_program nfslocalio_program; +static struct rpc_stat nfslocalio_rpcstat = { 
&nfslocalio_program }; + +const struct rpc_program nfslocalio_program = { + .name = "nfslocalio", + .number = NFS_LOCALIO_PROGRAM, + .nrvers = ARRAY_SIZE(nfslocalio_version), + .version = nfslocalio_version, + .stats = &nfslocalio_rpcstat, +}; + +/* + * nfs_local_enable - enable local i/o for an nfs_client + */ +static void nfs_local_enable(struct nfs_client *clp) +{ + spin_lock(&clp->cl_localio_lock); + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags); + trace_nfs_local_enable(clp); + spin_unlock(&clp->cl_localio_lock); +} + +/* + * nfs_local_disable - disable local i/o for an nfs_client + */ +void nfs_local_disable(struct nfs_client *clp) +{ + spin_lock(&clp->cl_localio_lock); + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) { + trace_nfs_local_disable(clp); + nfs_uuid_invalidate_one_client(&clp->cl_uuid); + } + spin_unlock(&clp->cl_localio_lock); +} + +/* + * nfs_init_localioclient - Initialise an NFS localio client connection + */ +static struct rpc_clnt *nfs_init_localioclient(struct nfs_client *clp) +{ + struct rpc_clnt *rpcclient_localio; + + rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient, + &nfslocalio_program, 1); + + dprintk_rcu("%s: server (%s) %s NFS LOCALIO.\n", + __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR), + (IS_ERR(rpcclient_localio) ? "does not support" : "supports")); + + return rpcclient_localio; +} + +static bool nfs_server_uuid_is_local(struct nfs_client *clp) +{ + u8 uuid[UUID_SIZE]; + struct rpc_message msg = { + .rpc_argp = &uuid, + }; + struct rpc_clnt *rpcclient_localio; + int status; + + rpcclient_localio = nfs_init_localioclient(clp); + if (IS_ERR(rpcclient_localio)) + return false; + + export_uuid(uuid, &clp->cl_uuid.uuid); + + msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_UUID_IS_LOCAL]; + status = rpc_call_sync(rpcclient_localio, &msg, 0); + dprintk("%s: NFS reply UUID_IS_LOCAL: status=%d\n", + __func__, status); + rpc_shutdown_client(rpcclient_localio); + + /* Server is only local if it initialized required struct members */ + if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom) + return false; + + return true; +} + +/* + * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client + * - called after alloc_client and init_client (so cl_rpcclient exists) + * - this function is idempotent, it can be called for old or new clients + */ +void nfs_local_probe(struct nfs_client *clp) +{ + /* Disallow localio if disabled via sysfs or AUTH_SYS isn't used */ + if (!localio_enabled || + clp->cl_rpcclient->cl_auth->au_flavor != RPC_AUTH_UNIX) { + nfs_local_disable(clp); + return; + } + + if (nfs_client_is_local(clp)) { + /* If already enabled, disable and re-enable */ + nfs_local_disable(clp); + } + + nfs_uuid_begin(&clp->cl_uuid); + if (nfs_server_uuid_is_local(clp)) + nfs_local_enable(clp); + nfs_uuid_end(&clp->cl_uuid); +} +EXPORT_SYMBOL_GPL(nfs_local_probe); + +/* + * nfs_local_open_fh - open a local filehandle in terms of nfsd_file + * + * Returns a pointer to a struct nfsd_file or NULL + */ +struct nfsd_file * +nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred, + struct nfs_fh *fh, const fmode_t mode) +{ + struct nfsd_file *localio; + int status; + + if (!nfs_server_is_local(clp)) + return NULL; + if (mode & ~(FMODE_READ | FMODE_WRITE)) + return NULL; + + localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient, + cred, fh, mode); + if (IS_ERR(localio)) { + status = PTR_ERR(localio); + trace_nfs_local_open_fh(fh, mode, status); + switch (status) { + case -ENOMEM: + case -ENXIO: + case 
-ENOENT: + /* Revalidate localio, will disable if unsupported */ + nfs_local_probe(clp); + } + return NULL; + } + return localio; +} +EXPORT_SYMBOL_GPL(nfs_local_open_fh); + +static struct bio_vec * +nfs_bvec_alloc_and_import_pagevec(struct page **pagevec, + unsigned int npages, gfp_t flags) +{ + struct bio_vec *bvec, *p; + + bvec = kmalloc_array(npages, sizeof(*bvec), flags); + if (bvec != NULL) { + for (p = bvec; npages > 0; p++, pagevec++, npages--) { + p->bv_page = *pagevec; + p->bv_len = PAGE_SIZE; + p->bv_offset = 0; + } + } + return bvec; +} + +static void +nfs_local_iocb_free(struct nfs_local_kiocb *iocb) +{ + kfree(iocb->bvec); + kfree(iocb); +} + +static struct nfs_local_kiocb * +nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, + struct nfsd_file *localio, gfp_t flags) +{ + struct nfs_local_kiocb *iocb; + + iocb = kmalloc(sizeof(*iocb), flags); + if (iocb == NULL) + return NULL; + iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec, + hdr->page_array.npages, flags); + if (iocb->bvec == NULL) { + kfree(iocb); + return NULL; + } + init_sync_kiocb(&iocb->kiocb, nfs_to->nfsd_file_file(localio)); + iocb->kiocb.ki_pos = hdr->args.offset; + iocb->localio = localio; + iocb->hdr = hdr; + iocb->kiocb.ki_flags &= ~IOCB_APPEND; + return iocb; +} + +static void +nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir) +{ + struct nfs_pgio_header *hdr = iocb->hdr; + + iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages, + hdr->args.count + hdr->args.pgbase); + if (hdr->args.pgbase != 0) + iov_iter_advance(i, hdr->args.pgbase); +} + +static void +nfs_local_hdr_release(struct nfs_pgio_header *hdr, + const struct rpc_call_ops *call_ops) +{ + call_ops->rpc_call_done(&hdr->task, hdr); + call_ops->rpc_release(hdr); +} + +static void +nfs_local_pgio_init(struct nfs_pgio_header *hdr, + const struct rpc_call_ops *call_ops) +{ + hdr->task.tk_ops = call_ops; + if (!hdr->task.tk_start) + hdr->task.tk_start = ktime_get(); +} + +static void +nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status) +{ + if (status >= 0) { + hdr->res.count = status; + hdr->res.op_status = NFS4_OK; + hdr->task.tk_status = 0; + } else { + hdr->res.op_status = nfs4_stat_to_errno(status); + hdr->task.tk_status = status; + } +} + +static void +nfs_local_pgio_release(struct nfs_local_kiocb *iocb) +{ + struct nfs_pgio_header *hdr = iocb->hdr; + + nfs_to->nfsd_file_put_local(iocb->localio); + nfs_local_iocb_free(iocb); + nfs_local_hdr_release(hdr, hdr->task.tk_ops); +} + +static void +nfs_local_read_done(struct nfs_local_kiocb *iocb, long status) +{ + struct nfs_pgio_header *hdr = iocb->hdr; + struct file *filp = iocb->kiocb.ki_filp; + + nfs_local_pgio_done(hdr, status); + + if (hdr->res.count != hdr->args.count || + hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp))) + hdr->res.eof = true; + + dprintk("%s: read %ld bytes eof %d.\n", __func__, + status > 0 ? 
status : 0, hdr->res.eof); +} + +static void nfs_local_call_read(struct work_struct *work) +{ + struct nfs_local_kiocb *iocb = + container_of(work, struct nfs_local_kiocb, work); + struct file *filp = iocb->kiocb.ki_filp; + const struct cred *save_cred; + struct iov_iter iter; + ssize_t status; + + save_cred = override_creds(filp->f_cred); + + nfs_local_iter_init(&iter, iocb, READ); + + status = filp->f_op->read_iter(&iocb->kiocb, &iter); + WARN_ON_ONCE(status == -EIOCBQUEUED); + + nfs_local_read_done(iocb, status); + nfs_local_pgio_release(iocb); + + revert_creds(save_cred); +} + +static int +nfs_do_local_read(struct nfs_pgio_header *hdr, + struct nfsd_file *localio, + const struct rpc_call_ops *call_ops) +{ + struct nfs_local_kiocb *iocb; + + dprintk("%s: vfs_read count=%u pos=%llu\n", + __func__, hdr->args.count, hdr->args.offset); + + iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL); + if (iocb == NULL) + return -ENOMEM; + + nfs_local_pgio_init(hdr, call_ops); + hdr->res.eof = false; + + INIT_WORK(&iocb->work, nfs_local_call_read); + queue_work(nfslocaliod_workqueue, &iocb->work); + + return 0; +} + +static void +nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode) +{ + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; + u32 *verf = (u32 *)verifier->data; + int seq = 0; + + do { + read_seqbegin_or_lock(&clp->cl_boot_lock, &seq); + verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec; + verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec; + } while (need_seqretry(&clp->cl_boot_lock, seq)); + done_seqretry(&clp->cl_boot_lock, seq); +} + +static void +nfs_reset_boot_verifier(struct inode *inode) +{ + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; + + write_seqlock(&clp->cl_boot_lock); + ktime_get_real_ts64(&clp->cl_nfssvc_boot); + write_sequnlock(&clp->cl_boot_lock); +} + +static void +nfs_set_local_verifier(struct inode *inode, + struct nfs_writeverf *verf, + enum nfs3_stable_how how) +{ + nfs_copy_boot_verifier(&verf->verifier, inode); + verf->committed = how; +} + +/* Factored out from fs/nfsd/vfs.h:fh_getattr() */ +static int __vfs_getattr(struct path *p, struct kstat *stat, int version) +{ + u32 request_mask = STATX_BASIC_STATS; + + if (version == 4) + request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE); + return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT); +} + +/* Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute() */ +static u64 __nfsd4_change_attribute(const struct kstat *stat, + const struct inode *inode) +{ + u64 chattr; + + if (stat->result_mask & STATX_CHANGE_COOKIE) { + chattr = stat->change_cookie; + if (S_ISREG(inode->i_mode) && + !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) { + chattr += (u64)stat->ctime.tv_sec << 30; + chattr += stat->ctime.tv_nsec; + } + } else { + chattr = time_to_chattr(&stat->ctime); + } + return chattr; +} + +static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb) +{ + struct kstat stat; + struct file *filp = iocb->kiocb.ki_filp; + struct nfs_pgio_header *hdr = iocb->hdr; + struct nfs_fattr *fattr = hdr->res.fattr; + int version = NFS_PROTO(hdr->inode)->version; + + if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version)) + return; + + fattr->valid = (NFS_ATTR_FATTR_FILEID | + NFS_ATTR_FATTR_CHANGE | + NFS_ATTR_FATTR_SIZE | + NFS_ATTR_FATTR_ATIME | + NFS_ATTR_FATTR_MTIME | + NFS_ATTR_FATTR_CTIME | + NFS_ATTR_FATTR_SPACE_USED); + + fattr->fileid = stat.ino; + fattr->size = stat.size; + fattr->atime = stat.atime; + fattr->mtime = stat.mtime; + fattr->ctime = stat.ctime; + if 
(version == 4) { + fattr->change_attr = + __nfsd4_change_attribute(&stat, file_inode(filp)); + } else + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime); + fattr->du.nfs3.used = stat.blocks << 9; +} + +static void +nfs_local_write_done(struct nfs_local_kiocb *iocb, long status) +{ + struct nfs_pgio_header *hdr = iocb->hdr; + struct inode *inode = hdr->inode; + + dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0); + + /* Handle short writes as if they are ENOSPC */ + if (status > 0 && status < hdr->args.count) { + hdr->mds_offset += status; + hdr->args.offset += status; + hdr->args.pgbase += status; + hdr->args.count -= status; + nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset); + status = -ENOSPC; + } + if (status < 0) + nfs_reset_boot_verifier(inode); + else if (nfs_should_remove_suid(inode)) { + /* Deal with the suid/sgid bit corner case */ + spin_lock(&inode->i_lock); + nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE); + spin_unlock(&inode->i_lock); + } + nfs_local_pgio_done(hdr, status); +} + +static void nfs_local_call_write(struct work_struct *work) +{ + struct nfs_local_kiocb *iocb = + container_of(work, struct nfs_local_kiocb, work); + struct file *filp = iocb->kiocb.ki_filp; + unsigned long old_flags = current->flags; + const struct cred *save_cred; + struct iov_iter iter; + ssize_t status; + + current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO; + save_cred = override_creds(filp->f_cred); + + nfs_local_iter_init(&iter, iocb, WRITE); + + file_start_write(filp); + status = filp->f_op->write_iter(&iocb->kiocb, &iter); + file_end_write(filp); + WARN_ON_ONCE(status == -EIOCBQUEUED); + + nfs_local_write_done(iocb, status); + nfs_local_vfs_getattr(iocb); + nfs_local_pgio_release(iocb); + + revert_creds(save_cred); + current->flags = old_flags; +} + +static int +nfs_do_local_write(struct nfs_pgio_header *hdr, + struct nfsd_file *localio, + const struct rpc_call_ops *call_ops) +{ + struct nfs_local_kiocb *iocb; + + dprintk("%s: vfs_write count=%u pos=%llu %s\n", + __func__, hdr->args.count, hdr->args.offset, + (hdr->args.stable == NFS_UNSTABLE) ? 
"unstable" : "stable"); + + iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO); + if (iocb == NULL) + return -ENOMEM; + + switch (hdr->args.stable) { + default: + break; + case NFS_DATA_SYNC: + iocb->kiocb.ki_flags |= IOCB_DSYNC; + break; + case NFS_FILE_SYNC: + iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC; + } + nfs_local_pgio_init(hdr, call_ops); + + nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable); + + INIT_WORK(&iocb->work, nfs_local_call_write); + queue_work(nfslocaliod_workqueue, &iocb->work); + + return 0; +} + +int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio, + struct nfs_pgio_header *hdr, + const struct rpc_call_ops *call_ops) +{ + int status = 0; + struct file *filp = nfs_to->nfsd_file_file(localio); + + if (!hdr->args.count) + return 0; + /* Don't support filesystems without read_iter/write_iter */ + if (!filp->f_op->read_iter || !filp->f_op->write_iter) { + nfs_local_disable(clp); + status = -EAGAIN; + goto out; + } + + switch (hdr->rw_mode) { + case FMODE_READ: + status = nfs_do_local_read(hdr, localio, call_ops); + break; + case FMODE_WRITE: + status = nfs_do_local_write(hdr, localio, call_ops); + break; + default: + dprintk("%s: invalid mode: %d\n", __func__, + hdr->rw_mode); + status = -EINVAL; + } +out: + if (status != 0) { + nfs_to->nfsd_file_put_local(localio); + hdr->task.tk_status = status; + nfs_local_hdr_release(hdr, call_ops); + } + return status; +} + +static void +nfs_local_init_commit(struct nfs_commit_data *data, + const struct rpc_call_ops *call_ops) +{ + data->task.tk_ops = call_ops; +} + +static int +nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data) +{ + loff_t start = data->args.offset; + loff_t end = LLONG_MAX; + + if (data->args.count > 0) { + end = start + data->args.count - 1; + if (end < start) + end = LLONG_MAX; + } + + dprintk("%s: commit %llu - %llu\n", __func__, start, end); + return vfs_fsync_range(filp, start, end, 0); +} + +static void +nfs_local_commit_done(struct nfs_commit_data *data, int status) +{ + if (status >= 0) { + nfs_set_local_verifier(data->inode, + data->res.verf, + NFS_FILE_SYNC); + data->res.op_status = NFS4_OK; + data->task.tk_status = 0; + } else { + nfs_reset_boot_verifier(data->inode); + data->res.op_status = nfs4_stat_to_errno(status); + data->task.tk_status = status; + } +} + +static void +nfs_local_release_commit_data(struct nfsd_file *localio, + struct nfs_commit_data *data, + const struct rpc_call_ops *call_ops) +{ + nfs_to->nfsd_file_put_local(localio); + call_ops->rpc_call_done(&data->task, data); + call_ops->rpc_release(data); +} + +static struct nfs_local_fsync_ctx * +nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, + struct nfsd_file *localio, gfp_t flags) +{ + struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags); + + if (ctx != NULL) { + ctx->localio = localio; + ctx->data = data; + INIT_WORK(&ctx->work, nfs_local_fsync_work); + kref_init(&ctx->kref); + ctx->done = NULL; + } + return ctx; +} + +static void +nfs_local_fsync_ctx_kref_free(struct kref *kref) +{ + kfree(container_of(kref, struct nfs_local_fsync_ctx, kref)); +} + +static void +nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx) +{ + kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free); +} + +static void +nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx) +{ + nfs_local_release_commit_data(ctx->localio, ctx->data, + ctx->data->task.tk_ops); + nfs_local_fsync_ctx_put(ctx); +} + +static void +nfs_local_fsync_work(struct work_struct *work) +{ + struct nfs_local_fsync_ctx 
*ctx; + int status; + + ctx = container_of(work, struct nfs_local_fsync_ctx, work); + + status = nfs_local_run_commit(nfs_to->nfsd_file_file(ctx->localio), + ctx->data); + nfs_local_commit_done(ctx->data, status); + if (ctx->done != NULL) + complete(ctx->done); + nfs_local_fsync_ctx_free(ctx); +} + +int nfs_local_commit(struct nfsd_file *localio, + struct nfs_commit_data *data, + const struct rpc_call_ops *call_ops, int how) +{ + struct nfs_local_fsync_ctx *ctx; + + ctx = nfs_local_fsync_ctx_alloc(data, localio, GFP_KERNEL); + if (!ctx) { + nfs_local_commit_done(data, -ENOMEM); + nfs_local_release_commit_data(localio, data, call_ops); + return -ENOMEM; + } + + nfs_local_init_commit(data, call_ops); + kref_get(&ctx->kref); + if (how & FLUSH_SYNC) { + DECLARE_COMPLETION_ONSTACK(done); + ctx->done = &done; + queue_work(nfsiod_workqueue, &ctx->work); + wait_for_completion(&done); + } else + queue_work(nfsiod_workqueue, &ctx->work); + nfs_local_fsync_ctx_put(ctx); + return 0; +} diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c index c19093814296..6e75c6c2d234 100644 --- a/fs/nfs/nfs2xdr.c +++ b/fs/nfs/nfs2xdr.c @@ -22,14 +22,12 @@ #include <linux/nfs.h> #include <linux/nfs2.h> #include <linux/nfs_fs.h> +#include <linux/nfs_common.h> #include "nfstrace.h" #include "internal.h" #define NFSDBG_FACILITY NFSDBG_XDR -/* Mapping from NFS error code to "errno" error code. */ -#define errno_NFSERR_IO EIO - /* * Declare the space requirements for NFS arguments and replies as * number of 32bit-words @@ -64,8 +62,6 @@ #define NFS_readdirres_sz (1+NFS_pagepad_sz) #define NFS_statfsres_sz (1+NFS_info_sz) -static int nfs_stat_to_errno(enum nfs_stat); - /* * Encode/decode NFSv2 basic data types * @@ -1054,70 +1050,6 @@ out_default: return nfs_stat_to_errno(status); } - -/* - * We need to translate between nfs status return values and - * the local errno values which may not be the same. - */ -static const struct { - int stat; - int errno; -} nfs_errtbl[] = { - { NFS_OK, 0 }, - { NFSERR_PERM, -EPERM }, - { NFSERR_NOENT, -ENOENT }, - { NFSERR_IO, -errno_NFSERR_IO}, - { NFSERR_NXIO, -ENXIO }, -/* { NFSERR_EAGAIN, -EAGAIN }, */ - { NFSERR_ACCES, -EACCES }, - { NFSERR_EXIST, -EEXIST }, - { NFSERR_XDEV, -EXDEV }, - { NFSERR_NODEV, -ENODEV }, - { NFSERR_NOTDIR, -ENOTDIR }, - { NFSERR_ISDIR, -EISDIR }, - { NFSERR_INVAL, -EINVAL }, - { NFSERR_FBIG, -EFBIG }, - { NFSERR_NOSPC, -ENOSPC }, - { NFSERR_ROFS, -EROFS }, - { NFSERR_MLINK, -EMLINK }, - { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, - { NFSERR_NOTEMPTY, -ENOTEMPTY }, - { NFSERR_DQUOT, -EDQUOT }, - { NFSERR_STALE, -ESTALE }, - { NFSERR_REMOTE, -EREMOTE }, -#ifdef EWFLUSH - { NFSERR_WFLUSH, -EWFLUSH }, -#endif - { NFSERR_BADHANDLE, -EBADHANDLE }, - { NFSERR_NOT_SYNC, -ENOTSYNC }, - { NFSERR_BAD_COOKIE, -EBADCOOKIE }, - { NFSERR_NOTSUPP, -ENOTSUPP }, - { NFSERR_TOOSMALL, -ETOOSMALL }, - { NFSERR_SERVERFAULT, -EREMOTEIO }, - { NFSERR_BADTYPE, -EBADTYPE }, - { NFSERR_JUKEBOX, -EJUKEBOX }, - { -1, -EIO } -}; - -/** - * nfs_stat_to_errno - convert an NFS status code to a local errno - * @status: NFS status code to convert - * - * Returns a local errno value, or -EIO if the NFS status code is - * not recognized. This function is used jointly by NFSv2 and NFSv3. 
- */ -static int nfs_stat_to_errno(enum nfs_stat status) -{ - int i; - - for (i = 0; nfs_errtbl[i].stat != -1; i++) { - if (nfs_errtbl[i].stat == (int)status) - return nfs_errtbl[i].errno; - } - dprintk("NFS: Unrecognized nfs status value: %u\n", status); - return nfs_errtbl[i].errno; -} - #define PROC(proc, argtype, restype, timer) \ [NFSPROC_##proc] = { \ .p_proc = NFSPROC_##proc, \ diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c index 60f032be805a..4ae01c10b7e2 100644 --- a/fs/nfs/nfs3xdr.c +++ b/fs/nfs/nfs3xdr.c @@ -21,14 +21,13 @@ #include <linux/nfs3.h> #include <linux/nfs_fs.h> #include <linux/nfsacl.h> +#include <linux/nfs_common.h> + #include "nfstrace.h" #include "internal.h" #define NFSDBG_FACILITY NFSDBG_XDR -/* Mapping from NFS error code to "errno" error code. */ -#define errno_NFSERR_IO EIO - /* * Declare the space requirements for NFS arguments and replies as * number of 32bit-words @@ -91,8 +90,6 @@ NFS3_pagepad_sz) #define ACL3_setaclres_sz (1+NFS3_post_op_attr_sz) -static int nfs3_stat_to_errno(enum nfs_stat); - /* * Map file type to S_IFMT bits */ @@ -1406,7 +1403,7 @@ static int nfs3_xdr_dec_getattr3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1445,7 +1442,7 @@ static int nfs3_xdr_dec_setattr3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1495,7 +1492,7 @@ out_default: error = decode_post_op_attr(xdr, result->dir_attr, userns); if (unlikely(error)) goto out; - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1537,7 +1534,7 @@ static int nfs3_xdr_dec_access3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1578,7 +1575,7 @@ static int nfs3_xdr_dec_readlink3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1658,7 +1655,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1728,7 +1725,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1795,7 +1792,7 @@ out_default: error = decode_wcc_data(xdr, result->dir_attr, userns); if (unlikely(error)) goto out; - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1835,7 +1832,7 @@ static int nfs3_xdr_dec_remove3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1881,7 +1878,7 @@ static int nfs3_xdr_dec_rename3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -1926,7 +1923,7 @@ static int nfs3_xdr_dec_link3res(struct rpc_rqst *req, struct xdr_stream *xdr, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /** @@ -2101,7 +2098,7 @@ out_default: error = decode_post_op_attr(xdr, result->dir_attr, rpc_rqst_userns(req)); if (unlikely(error)) goto out; - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2167,7 +2164,7 @@ static int nfs3_xdr_dec_fsstat3res(struct rpc_rqst *req, 
out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2243,7 +2240,7 @@ static int nfs3_xdr_dec_fsinfo3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2304,7 +2301,7 @@ static int nfs3_xdr_dec_pathconf3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } /* @@ -2350,7 +2347,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req, out: return error; out_status: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } #ifdef CONFIG_NFS_V3_ACL @@ -2416,7 +2413,7 @@ static int nfs3_xdr_dec_getacl3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req, @@ -2435,76 +2432,11 @@ static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req, out: return error; out_default: - return nfs3_stat_to_errno(status); + return nfs_stat_to_errno(status); } #endif /* CONFIG_NFS_V3_ACL */ - -/* - * We need to translate between nfs status return values and - * the local errno values which may not be the same. - */ -static const struct { - int stat; - int errno; -} nfs_errtbl[] = { - { NFS_OK, 0 }, - { NFSERR_PERM, -EPERM }, - { NFSERR_NOENT, -ENOENT }, - { NFSERR_IO, -errno_NFSERR_IO}, - { NFSERR_NXIO, -ENXIO }, -/* { NFSERR_EAGAIN, -EAGAIN }, */ - { NFSERR_ACCES, -EACCES }, - { NFSERR_EXIST, -EEXIST }, - { NFSERR_XDEV, -EXDEV }, - { NFSERR_NODEV, -ENODEV }, - { NFSERR_NOTDIR, -ENOTDIR }, - { NFSERR_ISDIR, -EISDIR }, - { NFSERR_INVAL, -EINVAL }, - { NFSERR_FBIG, -EFBIG }, - { NFSERR_NOSPC, -ENOSPC }, - { NFSERR_ROFS, -EROFS }, - { NFSERR_MLINK, -EMLINK }, - { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, - { NFSERR_NOTEMPTY, -ENOTEMPTY }, - { NFSERR_DQUOT, -EDQUOT }, - { NFSERR_STALE, -ESTALE }, - { NFSERR_REMOTE, -EREMOTE }, -#ifdef EWFLUSH - { NFSERR_WFLUSH, -EWFLUSH }, -#endif - { NFSERR_BADHANDLE, -EBADHANDLE }, - { NFSERR_NOT_SYNC, -ENOTSYNC }, - { NFSERR_BAD_COOKIE, -EBADCOOKIE }, - { NFSERR_NOTSUPP, -ENOTSUPP }, - { NFSERR_TOOSMALL, -ETOOSMALL }, - { NFSERR_SERVERFAULT, -EREMOTEIO }, - { NFSERR_BADTYPE, -EBADTYPE }, - { NFSERR_JUKEBOX, -EJUKEBOX }, - { -1, -EIO } -}; - -/** - * nfs3_stat_to_errno - convert an NFS status code to a local errno - * @status: NFS status code to convert - * - * Returns a local errno value, or -EIO if the NFS status code is - * not recognized. This function is used jointly by NFSv2 and NFSv3. 
- */ -static int nfs3_stat_to_errno(enum nfs_stat status) -{ - int i; - - for (i = 0; nfs_errtbl[i].stat != -1; i++) { - if (nfs_errtbl[i].stat == (int)status) - return nfs_errtbl[i].errno; - } - dprintk("NFS: Unrecognized nfs status value: %u\n", status); - return nfs_errtbl[i].errno; -} - - #define PROC(proc, argtype, restype, timer) \ [NFS3PROC_##proc] = { \ .p_proc = NFS3PROC_##proc, \ diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index c2045a2a9d0f..7d383d29a995 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -83,7 +83,7 @@ struct nfs4_minor_version_ops { #define NFS_SEQID_CONFIRMED 1 struct nfs_seqid_counter { ktime_t create_time; - int owner_id; + u64 owner_id; int flags; u32 counter; spinlock_t lock; /* Protects the list */ diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index b8ffbe52ba15..cd2fbde2e6d7 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -3904,6 +3904,18 @@ static void nfs4_close_context(struct nfs_open_context *ctx, int is_sync) #define FATTR4_WORD2_NFS41_MASK (2*FATTR4_WORD2_SUPPATTR_EXCLCREAT - 1UL) #define FATTR4_WORD2_NFS42_MASK (2*FATTR4_WORD2_OPEN_ARGUMENTS - 1UL) +#define FATTR4_WORD2_NFS42_TIME_DELEG_MASK \ + (FATTR4_WORD2_TIME_DELEG_MODIFY|FATTR4_WORD2_TIME_DELEG_ACCESS) +static bool nfs4_server_delegtime_capable(struct nfs4_server_caps_res *res) +{ + u32 share_access_want = res->open_caps.oa_share_access_want[0]; + u32 attr_bitmask = res->attr_bitmask[2]; + + return (share_access_want & NFS4_SHARE_WANT_DELEG_TIMESTAMPS) && + ((attr_bitmask & FATTR4_WORD2_NFS42_TIME_DELEG_MASK) == + FATTR4_WORD2_NFS42_TIME_DELEG_MASK); +} + static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle) { u32 minorversion = server->nfs_client->cl_minorversion; @@ -3982,8 +3994,6 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f #endif if (res.attr_bitmask[0] & FATTR4_WORD0_FS_LOCATIONS) server->caps |= NFS_CAP_FS_LOCATIONS; - if (res.attr_bitmask[2] & FATTR4_WORD2_TIME_DELEG_MODIFY) - server->caps |= NFS_CAP_DELEGTIME; if (!(res.attr_bitmask[0] & FATTR4_WORD0_FILEID)) server->fattr_valid &= ~NFS_ATTR_FATTR_FILEID; if (!(res.attr_bitmask[1] & FATTR4_WORD1_MODE)) @@ -4011,6 +4021,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f if (res.open_caps.oa_share_access_want[0] & NFS4_SHARE_WANT_OPEN_XOR_DELEGATION) server->caps |= NFS_CAP_OPEN_XOR; + if (nfs4_server_delegtime_capable(&res)) + server->caps |= NFS_CAP_DELEGTIME; memcpy(server->cache_consistency_bitmask, res.attr_bitmask, sizeof(server->cache_consistency_bitmask)); server->cache_consistency_bitmask[0] &= FATTR4_WORD0_CHANGE|FATTR4_WORD0_SIZE; diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 877f682b45f2..581864a15888 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -501,11 +501,7 @@ nfs4_alloc_state_owner(struct nfs_server *server, sp = kzalloc(sizeof(*sp), gfp_flags); if (!sp) return NULL; - sp->so_seqid.owner_id = ida_alloc(&server->openowner_id, gfp_flags); - if (sp->so_seqid.owner_id < 0) { - kfree(sp); - return NULL; - } + sp->so_seqid.owner_id = atomic64_inc_return(&server->owner_ctr); sp->so_server = server; sp->so_cred = get_cred(cred); spin_lock_init(&sp->so_lock); @@ -536,7 +532,6 @@ static void nfs4_free_state_owner(struct nfs4_state_owner *sp) { nfs4_destroy_seqid_counter(&sp->so_seqid); put_cred(sp->so_cred); - ida_free(&sp->so_server->openowner_id, sp->so_seqid.owner_id); kfree(sp); } @@ -879,19 +874,13 @@ static struct nfs4_lock_state *nfs4_alloc_lock_state(struct 
nfs4_state *state, f refcount_set(&lsp->ls_count, 1); lsp->ls_state = state; lsp->ls_owner = owner; - lsp->ls_seqid.owner_id = ida_alloc(&server->lockowner_id, GFP_KERNEL_ACCOUNT); - if (lsp->ls_seqid.owner_id < 0) - goto out_free; + lsp->ls_seqid.owner_id = atomic64_inc_return(&server->owner_ctr); INIT_LIST_HEAD(&lsp->ls_locks); return lsp; -out_free: - kfree(lsp); - return NULL; } void nfs4_free_lock_state(struct nfs_server *server, struct nfs4_lock_state *lsp) { - ida_free(&server->lockowner_id, lsp->ls_seqid.owner_id); nfs4_destroy_seqid_counter(&lsp->ls_seqid); kfree(lsp); } @@ -1957,6 +1946,7 @@ restart: set_bit(ops->owner_flag_bit, &sp->so_flags); nfs4_put_state_owner(sp); status = nfs4_recovery_handle_error(clp, status); + nfs4_free_state_owners(&freeme); return (status != 0) ? status : -EAGAIN; } @@ -2023,6 +2013,12 @@ static int nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status) nfs_mark_client_ready(clp, -EPERM); clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state); return -EPERM; + case -ETIMEDOUT: + if (clp->cl_cons_state == NFS_CS_SESSION_INITING) { + nfs_mark_client_ready(clp, -EIO); + return -EIO; + } + fallthrough; case -EACCES: case -NFS4ERR_DELAY: case -EAGAIN: diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c index 61190d6a5a77..e8ac3f615f93 100644 --- a/fs/nfs/nfs4xdr.c +++ b/fs/nfs/nfs4xdr.c @@ -52,6 +52,7 @@ #include <linux/nfs.h> #include <linux/nfs4.h> #include <linux/nfs_fs.h> +#include <linux/nfs_common.h> #include "nfs4_fs.h" #include "nfs4trace.h" @@ -63,11 +64,7 @@ #define NFSDBG_FACILITY NFSDBG_XDR -/* Mapping from NFS error code to "errno" error code. */ -#define errno_NFSERR_IO EIO - struct compound_hdr; -static int nfs4_stat_to_errno(int); static void encode_layoutget(struct xdr_stream *xdr, const struct nfs4_layoutget_args *args, struct compound_hdr *hdr); @@ -975,11 +972,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes) return p; } -static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len) -{ - WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0); -} - static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str) { WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0); @@ -1424,12 +1416,12 @@ static inline void encode_openhdr(struct xdr_stream *xdr, const struct nfs_opena */ encode_nfs4_seqid(xdr, arg->seqid); encode_share_access(xdr, arg->share_access); - p = reserve_space(xdr, 36); + p = reserve_space(xdr, 40); p = xdr_encode_hyper(p, arg->clientid); - *p++ = cpu_to_be32(24); + *p++ = cpu_to_be32(28); p = xdr_encode_opaque_fixed(p, "open id:", 8); *p++ = cpu_to_be32(arg->server->s_dev); - *p++ = cpu_to_be32(arg->id.uniquifier); + p = xdr_encode_hyper(p, arg->id.uniquifier); xdr_encode_hyper(p, arg->id.create_time); } @@ -4408,14 +4400,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access) return 0; } -static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len) -{ - ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len); - if (unlikely(ret < 0)) - return -EIO; - return 0; -} - static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid) { return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE); @@ -7620,72 +7604,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry, return 0; } -/* - * We need to translate between nfs status return values and - * the local errno values which may not be the same. 
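The nfs4state.c hunks above replace ida_alloc()/ida_free() for open-owner and lock-owner IDs with a single per-server 64-bit counter (server->owner_ctr): IDs no longer need to be recycled, allocation can no longer fail, and the out_free error path in nfs4_alloc_lock_state() goes away. Reduced to essentials, with hypothetical names:

#include <linux/atomic.h>

struct id_source {
	atomic64_t ctr;		/* monotonically increasing, never reused */
};

static u64 owner_id_alloc(struct id_source *src)
{
	/* Unlike ida_alloc(), this cannot fail and needs no matching free. */
	return atomic64_inc_return(&src->ctr);
}

This is also why nfs_seqid_counter.owner_id widens from int to u64 in the nfs4_fs.h hunk above.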
- */ -static struct { - int stat; - int errno; -} nfs_errtbl[] = { - { NFS4_OK, 0 }, - { NFS4ERR_PERM, -EPERM }, - { NFS4ERR_NOENT, -ENOENT }, - { NFS4ERR_IO, -errno_NFSERR_IO}, - { NFS4ERR_NXIO, -ENXIO }, - { NFS4ERR_ACCESS, -EACCES }, - { NFS4ERR_EXIST, -EEXIST }, - { NFS4ERR_XDEV, -EXDEV }, - { NFS4ERR_NOTDIR, -ENOTDIR }, - { NFS4ERR_ISDIR, -EISDIR }, - { NFS4ERR_INVAL, -EINVAL }, - { NFS4ERR_FBIG, -EFBIG }, - { NFS4ERR_NOSPC, -ENOSPC }, - { NFS4ERR_ROFS, -EROFS }, - { NFS4ERR_MLINK, -EMLINK }, - { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG }, - { NFS4ERR_NOTEMPTY, -ENOTEMPTY }, - { NFS4ERR_DQUOT, -EDQUOT }, - { NFS4ERR_STALE, -ESTALE }, - { NFS4ERR_BADHANDLE, -EBADHANDLE }, - { NFS4ERR_BAD_COOKIE, -EBADCOOKIE }, - { NFS4ERR_NOTSUPP, -ENOTSUPP }, - { NFS4ERR_TOOSMALL, -ETOOSMALL }, - { NFS4ERR_SERVERFAULT, -EREMOTEIO }, - { NFS4ERR_BADTYPE, -EBADTYPE }, - { NFS4ERR_LOCKED, -EAGAIN }, - { NFS4ERR_SYMLINK, -ELOOP }, - { NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP }, - { NFS4ERR_DEADLOCK, -EDEADLK }, - { NFS4ERR_NOXATTR, -ENODATA }, - { NFS4ERR_XATTR2BIG, -E2BIG }, - { -1, -EIO } -}; - -/* - * Convert an NFS error code to a local one. - * This one is used jointly by NFSv2 and NFSv3. - */ -static int -nfs4_stat_to_errno(int stat) -{ - int i; - for (i = 0; nfs_errtbl[i].stat != -1; i++) { - if (nfs_errtbl[i].stat == stat) - return nfs_errtbl[i].errno; - } - if (stat <= 10000 || stat > 10100) { - /* The server is looney tunes. */ - return -EREMOTEIO; - } - /* If we cannot translate the error, the recovery routines should - * handle it. - * Note: remaining NFSv4 error codes have values > 10000, so should - * not conflict with native Linux error codes. - */ - return -stat; -} - #ifdef CONFIG_NFS_V4_2 #include "nfs42xdr.c" #endif /* CONFIG_NFS_V4_2 */ diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h index 352fdaed4075..1eab98c277fa 100644 --- a/fs/nfs/nfstrace.h +++ b/fs/nfs/nfstrace.h @@ -1685,6 +1685,67 @@ TRACE_EVENT(nfs_mount_path, TP_printk("path='%s'", __get_str(path)) ); +TRACE_EVENT(nfs_local_open_fh, + TP_PROTO( + const struct nfs_fh *fh, + fmode_t fmode, + int error + ), + + TP_ARGS(fh, fmode, error), + + TP_STRUCT__entry( + __field(int, error) + __field(u32, fhandle) + __field(unsigned int, fmode) + ), + + TP_fast_assign( + __entry->error = error; + __entry->fhandle = nfs_fhandle_hash(fh); + __entry->fmode = (__force unsigned int)fmode; + ), + + TP_printk( + "error=%d fhandle=0x%08x mode=%s", + __entry->error, + __entry->fhandle, + show_fs_fmode_flags(__entry->fmode) + ) +); + +DECLARE_EVENT_CLASS(nfs_local_client_event, + TP_PROTO( + const struct nfs_client *clp + ), + + TP_ARGS(clp), + + TP_STRUCT__entry( + __field(unsigned int, protocol) + __string(server, clp->cl_hostname) + ), + + TP_fast_assign( + __entry->protocol = clp->rpc_ops->version; + __assign_str(server); + ), + + TP_printk( + "server=%s NFSv%u", __get_str(server), __entry->protocol + ) +); + +#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \ + DEFINE_EVENT(nfs_local_client_event, name, \ + TP_PROTO( \ + const struct nfs_client *clp \ + ), \ + TP_ARGS(clp)) + +DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable); +DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable); + DECLARE_EVENT_CLASS(nfs_xdr_event, TP_PROTO( const struct xdr_stream *xdr, diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c index 04124f226665..e27c07bd8929 100644 --- a/fs/nfs/pagelist.c +++ b/fs/nfs/pagelist.c @@ -731,7 +731,8 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata) int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr, const struct 
cred *cred, const struct nfs_rpc_ops *rpc_ops, - const struct rpc_call_ops *call_ops, int how, int flags) + const struct rpc_call_ops *call_ops, int how, int flags, + struct nfsd_file *localio) { struct rpc_task *task; struct rpc_message msg = { @@ -761,6 +762,10 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr, hdr->args.count, (unsigned long long)hdr->args.offset); + if (localio) + return nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client, + localio, hdr, call_ops); + task = rpc_run_task(&task_setup_data); if (IS_ERR(task)) return PTR_ERR(task); @@ -953,6 +958,12 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc) nfs_pgheader_init(desc, hdr, nfs_pgio_header_free); ret = nfs_generic_pgio(desc, hdr); if (ret == 0) { + struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client; + + struct nfsd_file *localio = + nfs_local_open_fh(clp, hdr->cred, + hdr->args.fh, hdr->args.context->mode); + if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion) task_flags = RPC_TASK_MOVEABLE; ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode), @@ -961,7 +972,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc) NFS_PROTO(hdr->inode), desc->pg_rpc_callops, desc->pg_ioflags, - RPC_TASK_CRED_NOREF | task_flags); + RPC_TASK_CRED_NOREF | task_flags, + localio); } return ret; } diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c index a74ee69a2fa6..dbef837e871a 100644 --- a/fs/nfs/pnfs_nfs.c +++ b/fs/nfs/pnfs_nfs.c @@ -490,7 +490,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages, nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(data->inode), data->mds_ops, how, - RPC_TASK_CRED_NOREF); + RPC_TASK_CRED_NOREF, NULL); } else { nfs_init_commit(data, NULL, data->lseg, cinfo); initiate_commit(data, how); diff --git a/fs/nfs/read.c b/fs/nfs/read.c index a6103333b666..81bd1b9aba17 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -48,8 +48,7 @@ static struct nfs_pgio_header *nfs_readhdr_alloc(void) static void nfs_readhdr_free(struct nfs_pgio_header *rhdr) { - if (rhdr->res.scratch != NULL) - kfree(rhdr->res.scratch); + kfree(rhdr->res.scratch); kmem_cache_free(nfs_rdata_cachep, rhdr); } diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 97b386032b71..9723b6c53397 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -551,6 +551,9 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss, else seq_puts(m, ",local_lock=posix"); + if (nfss->flags & NFS_MOUNT_NO_ALIGNWRITE) + seq_puts(m, ",noalignwrite"); + if (nfss->flags & NFS_MOUNT_WRITE_EAGER) { if (nfss->flags & NFS_MOUNT_WRITE_WAIT) seq_puts(m, ",write=wait"); diff --git a/fs/nfs/write.c b/fs/nfs/write.c index d074d0ceb4f0..ead2dc55952d 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -772,8 +772,7 @@ static void nfs_inode_add_request(struct nfs_page *req) nfs_lock_request(req); spin_lock(&mapping->i_private_lock); set_bit(PG_MAPPED, &req->wb_flags); - folio_set_private(folio); - folio->private = req; + folio_attach_private(folio, req); spin_unlock(&mapping->i_private_lock); atomic_long_inc(&nfsi->nrequests); /* this a head request for a page group - mark it as having an @@ -797,8 +796,7 @@ static void nfs_inode_remove_request(struct nfs_page *req) spin_lock(&mapping->i_private_lock); if (likely(folio)) { - folio->private = NULL; - folio_clear_private(folio); + folio_detach_private(folio); clear_bit(PG_MAPPED, &req->wb_head->wb_flags); } spin_unlock(&mapping->i_private_lock); @@ -1297,7 +1295,10 @@ static int nfs_can_extend_write(struct file 
*file, struct folio *folio, struct file_lock_context *flctx = locks_inode_context(inode); struct file_lock *fl; int ret; + unsigned int mntflags = NFS_SERVER(inode)->flags; + if (mntflags & NFS_MOUNT_NO_ALIGNWRITE) + return 0; if (file->f_flags & O_DSYNC) return 0; if (!nfs_folio_write_uptodate(folio, pagelen)) @@ -1663,7 +1664,8 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release); int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data, const struct nfs_rpc_ops *nfs_ops, const struct rpc_call_ops *call_ops, - int how, int flags) + int how, int flags, + struct nfsd_file *localio) { struct rpc_task *task; int priority = flush_task_priority(how); @@ -1692,6 +1694,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data, dprintk("NFS: initiated commit call\n"); + if (localio) + return nfs_local_commit(localio, data, call_ops, how); + task = rpc_run_task(&task_setup_data); if (IS_ERR(task)) return PTR_ERR(task); @@ -1791,6 +1796,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how, struct nfs_commit_info *cinfo) { struct nfs_commit_data *data; + struct nfsd_file *localio; unsigned short task_flags = 0; /* another commit raced with us */ @@ -1807,9 +1813,12 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how, nfs_init_commit(data, head, NULL, cinfo); if (NFS_SERVER(inode)->nfs_client->cl_minorversion) task_flags = RPC_TASK_MOVEABLE; + + localio = nfs_local_open_fh(NFS_SERVER(inode)->nfs_client, data->cred, + data->args.fh, data->context->mode); return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode), data->mds_ops, how, - RPC_TASK_CRED_NOREF | task_flags); + RPC_TASK_CRED_NOREF | task_flags, localio); } /* diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile index 119c75ab9fd0..a5e54809701e 100644 --- a/fs/nfs_common/Makefile +++ b/fs/nfs_common/Makefile @@ -6,5 +6,10 @@ obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o nfs_acl-objs := nfsacl.o +obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o +nfs_localio-objs := nfslocalio.o + obj-$(CONFIG_GRACE_PERIOD) += grace.o obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o + +obj-$(CONFIG_NFS_COMMON) += common.o diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c new file mode 100644 index 000000000000..34a115176f97 --- /dev/null +++ b/fs/nfs_common/common.c @@ -0,0 +1,134 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include <linux/module.h> +#include <linux/nfs_common.h> +#include <linux/nfs4.h> + +/* + * We need to translate between nfs status return values and + * the local errno values which may not be the same. 
+ */ +static const struct { + int stat; + int errno; +} nfs_errtbl[] = { + { NFS_OK, 0 }, + { NFSERR_PERM, -EPERM }, + { NFSERR_NOENT, -ENOENT }, + { NFSERR_IO, -errno_NFSERR_IO}, + { NFSERR_NXIO, -ENXIO }, +/* { NFSERR_EAGAIN, -EAGAIN }, */ + { NFSERR_ACCES, -EACCES }, + { NFSERR_EXIST, -EEXIST }, + { NFSERR_XDEV, -EXDEV }, + { NFSERR_NODEV, -ENODEV }, + { NFSERR_NOTDIR, -ENOTDIR }, + { NFSERR_ISDIR, -EISDIR }, + { NFSERR_INVAL, -EINVAL }, + { NFSERR_FBIG, -EFBIG }, + { NFSERR_NOSPC, -ENOSPC }, + { NFSERR_ROFS, -EROFS }, + { NFSERR_MLINK, -EMLINK }, + { NFSERR_NAMETOOLONG, -ENAMETOOLONG }, + { NFSERR_NOTEMPTY, -ENOTEMPTY }, + { NFSERR_DQUOT, -EDQUOT }, + { NFSERR_STALE, -ESTALE }, + { NFSERR_REMOTE, -EREMOTE }, +#ifdef EWFLUSH + { NFSERR_WFLUSH, -EWFLUSH }, +#endif + { NFSERR_BADHANDLE, -EBADHANDLE }, + { NFSERR_NOT_SYNC, -ENOTSYNC }, + { NFSERR_BAD_COOKIE, -EBADCOOKIE }, + { NFSERR_NOTSUPP, -ENOTSUPP }, + { NFSERR_TOOSMALL, -ETOOSMALL }, + { NFSERR_SERVERFAULT, -EREMOTEIO }, + { NFSERR_BADTYPE, -EBADTYPE }, + { NFSERR_JUKEBOX, -EJUKEBOX }, + { -1, -EIO } +}; + +/** + * nfs_stat_to_errno - convert an NFS status code to a local errno + * @status: NFS status code to convert + * + * Returns a local errno value, or -EIO if the NFS status code is + * not recognized. This function is used jointly by NFSv2 and NFSv3. + */ +int nfs_stat_to_errno(enum nfs_stat status) +{ + int i; + + for (i = 0; nfs_errtbl[i].stat != -1; i++) { + if (nfs_errtbl[i].stat == (int)status) + return nfs_errtbl[i].errno; + } + return nfs_errtbl[i].errno; +} +EXPORT_SYMBOL_GPL(nfs_stat_to_errno); + +/* + * We need to translate between nfs v4 status return values and + * the local errno values which may not be the same. + */ +static const struct { + int stat; + int errno; +} nfs4_errtbl[] = { + { NFS4_OK, 0 }, + { NFS4ERR_PERM, -EPERM }, + { NFS4ERR_NOENT, -ENOENT }, + { NFS4ERR_IO, -errno_NFSERR_IO}, + { NFS4ERR_NXIO, -ENXIO }, + { NFS4ERR_ACCESS, -EACCES }, + { NFS4ERR_EXIST, -EEXIST }, + { NFS4ERR_XDEV, -EXDEV }, + { NFS4ERR_NOTDIR, -ENOTDIR }, + { NFS4ERR_ISDIR, -EISDIR }, + { NFS4ERR_INVAL, -EINVAL }, + { NFS4ERR_FBIG, -EFBIG }, + { NFS4ERR_NOSPC, -ENOSPC }, + { NFS4ERR_ROFS, -EROFS }, + { NFS4ERR_MLINK, -EMLINK }, + { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG }, + { NFS4ERR_NOTEMPTY, -ENOTEMPTY }, + { NFS4ERR_DQUOT, -EDQUOT }, + { NFS4ERR_STALE, -ESTALE }, + { NFS4ERR_BADHANDLE, -EBADHANDLE }, + { NFS4ERR_BAD_COOKIE, -EBADCOOKIE }, + { NFS4ERR_NOTSUPP, -ENOTSUPP }, + { NFS4ERR_TOOSMALL, -ETOOSMALL }, + { NFS4ERR_SERVERFAULT, -EREMOTEIO }, + { NFS4ERR_BADTYPE, -EBADTYPE }, + { NFS4ERR_LOCKED, -EAGAIN }, + { NFS4ERR_SYMLINK, -ELOOP }, + { NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP }, + { NFS4ERR_DEADLOCK, -EDEADLK }, + { NFS4ERR_NOXATTR, -ENODATA }, + { NFS4ERR_XATTR2BIG, -E2BIG }, + { -1, -EIO } +}; + +/* + * Convert an NFS error code to a local one. + * This one is used by NFSv4. + */ +int nfs4_stat_to_errno(int stat) +{ + int i; + for (i = 0; nfs4_errtbl[i].stat != -1; i++) { + if (nfs4_errtbl[i].stat == stat) + return nfs4_errtbl[i].errno; + } + if (stat <= 10000 || stat > 10100) { + /* The server is looney tunes. */ + return -EREMOTEIO; + } + /* If we cannot translate the error, the recovery routines should + * handle it. + * Note: remaining NFSv4 error codes have values > 10000, so should + * not conflict with native Linux error codes. 
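With both translation tables consolidated in fs/nfs_common/common.c, the NFSv2/v3 XDR decoders, the NFSv4 decoder and the new LOCALIO paths all share one pair of helpers; nfsd's localio.c below uses nfs_stat_to_errno() to turn the nfsstat returned by nfsd_file_acquire_local() into a local errno. An illustrative caller, with made-up variable names:

#include <linux/printk.h>
#include <linux/nfs.h>
#include <linux/nfs4.h>
#include <linux/nfs_common.h>

static void example_status_translation(__be32 wire_status, int v4_op_status)
{
	/* NFSv2/v3: table lookup; unrecognized codes fall back to -EIO */
	int err = nfs_stat_to_errno(be32_to_cpu(wire_status));

	/*
	 * NFSv4: codes missing from the table but inside the valid NFSv4
	 * range are returned as -stat for the recovery code to interpret.
	 */
	int err4 = nfs4_stat_to_errno(v4_op_status);

	pr_debug("mapped to %d and %d\n", err, err4);
}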
+ */ + return -stat; +} +EXPORT_SYMBOL_GPL(nfs4_stat_to_errno); diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c new file mode 100644 index 000000000000..42b479b9191f --- /dev/null +++ b/fs/nfs_common/nfslocalio.c @@ -0,0 +1,172 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> + * Copyright (C) 2024 NeilBrown <neilb@suse.de> + */ + +#include <linux/module.h> +#include <linux/rculist.h> +#include <linux/nfslocalio.h> +#include <net/netns/generic.h> + +MODULE_LICENSE("GPL"); +MODULE_DESCRIPTION("NFS localio protocol bypass support"); + +static DEFINE_SPINLOCK(nfs_uuid_lock); + +/* + * Global list of nfs_uuid_t instances + * that is protected by nfs_uuid_lock. + */ +static LIST_HEAD(nfs_uuids); + +void nfs_uuid_begin(nfs_uuid_t *nfs_uuid) +{ + nfs_uuid->net = NULL; + nfs_uuid->dom = NULL; + uuid_gen(&nfs_uuid->uuid); + + spin_lock(&nfs_uuid_lock); + list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids); + spin_unlock(&nfs_uuid_lock); +} +EXPORT_SYMBOL_GPL(nfs_uuid_begin); + +void nfs_uuid_end(nfs_uuid_t *nfs_uuid) +{ + if (nfs_uuid->net == NULL) { + spin_lock(&nfs_uuid_lock); + list_del_init(&nfs_uuid->list); + spin_unlock(&nfs_uuid_lock); + } +} +EXPORT_SYMBOL_GPL(nfs_uuid_end); + +static nfs_uuid_t * nfs_uuid_lookup_locked(const uuid_t *uuid) +{ + nfs_uuid_t *nfs_uuid; + + list_for_each_entry(nfs_uuid, &nfs_uuids, list) + if (uuid_equal(&nfs_uuid->uuid, uuid)) + return nfs_uuid; + + return NULL; +} + +static struct module *nfsd_mod; + +void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list, + struct net *net, struct auth_domain *dom, + struct module *mod) +{ + nfs_uuid_t *nfs_uuid; + + spin_lock(&nfs_uuid_lock); + nfs_uuid = nfs_uuid_lookup_locked(uuid); + if (nfs_uuid) { + kref_get(&dom->ref); + nfs_uuid->dom = dom; + /* + * We don't hold a ref on the net, but instead put + * ourselves on a list so the net pointer can be + * invalidated. + */ + list_move(&nfs_uuid->list, list); + rcu_assign_pointer(nfs_uuid->net, net); + + __module_get(mod); + nfsd_mod = mod; + } + spin_unlock(&nfs_uuid_lock); +} +EXPORT_SYMBOL_GPL(nfs_uuid_is_local); + +static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid) +{ + if (nfs_uuid->net) { + module_put(nfsd_mod); + nfs_uuid->net = NULL; + } + if (nfs_uuid->dom) { + auth_domain_put(nfs_uuid->dom); + nfs_uuid->dom = NULL; + } + list_del_init(&nfs_uuid->list); +} + +void nfs_uuid_invalidate_clients(struct list_head *list) +{ + nfs_uuid_t *nfs_uuid, *tmp; + + spin_lock(&nfs_uuid_lock); + list_for_each_entry_safe(nfs_uuid, tmp, list, list) + nfs_uuid_put_locked(nfs_uuid); + spin_unlock(&nfs_uuid_lock); +} +EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients); + +void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid) +{ + if (nfs_uuid->net) { + spin_lock(&nfs_uuid_lock); + nfs_uuid_put_locked(nfs_uuid); + spin_unlock(&nfs_uuid_lock); + } +} +EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client); + +struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid, + struct rpc_clnt *rpc_clnt, const struct cred *cred, + const struct nfs_fh *nfs_fh, const fmode_t fmode) +{ + struct net *net; + struct nfsd_file *localio; + + /* + * Not running in nfsd context, so must safely get reference on nfsd_serv. + * But the server may already be shutting down, if so disallow new localio. + * uuid->net is NOT a counted reference, but rcu_read_lock() ensures that + * if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe + * and if it succeeds we will have an implied reference to the net. 
+ * + * Otherwise NFS may not have ref on NFSD and therefore cannot safely + * make 'nfs_to' calls. + */ + rcu_read_lock(); + net = rcu_dereference(uuid->net); + if (!net || !nfs_to->nfsd_serv_try_get(net)) { + rcu_read_unlock(); + return ERR_PTR(-ENXIO); + } + rcu_read_unlock(); + /* We have an implied reference to net thanks to nfsd_serv_try_get */ + localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt, + cred, nfs_fh, fmode); + if (IS_ERR(localio)) + nfs_to->nfsd_serv_put(net); + return localio; +} +EXPORT_SYMBOL_GPL(nfs_open_local_fh); + +/* + * The NFS LOCALIO code needs to call into NFSD using various symbols, + * but cannot be statically linked, because that will make the NFS + * module always depend on the NFSD module. + * + * 'nfs_to' provides NFS access to NFSD functions needed for LOCALIO, + * its lifetime is tightly coupled to the NFSD module and will always + * be available to NFS LOCALIO because any successful client<->server + * LOCALIO handshake results in a reference on the NFSD module (above), + * so NFS implicitly holds a reference to the NFSD module and its + * functions in the 'nfs_to' nfsd_localio_operations cannot disappear. + * + * If the last NFS client using LOCALIO disconnects (and its reference + * on NFSD dropped) then NFSD could be unloaded, resulting in 'nfs_to' + * functions being invalid pointers. But if NFSD isn't loaded then NFS + * will not be able to handshake with NFSD and will have no cause to + * try to call 'nfs_to' function pointers. If/when NFSD is reloaded it + * will reinitialize the 'nfs_to' function pointers and make LOCALIO + * possible. + */ +const struct nfsd_localio_operations *nfs_to; +EXPORT_SYMBOL_GPL(nfs_to); diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig index ec2ab6429e00..c0bd1509ccd4 100644 --- a/fs/nfsd/Kconfig +++ b/fs/nfsd/Kconfig @@ -7,6 +7,7 @@ config NFSD select LOCKD select SUNRPC select EXPORTFS + select NFS_COMMON select NFS_ACL_SUPPORT if NFSD_V2_ACL select NFS_ACL_SUPPORT if NFSD_V3_ACL depends on MULTIUSER diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile index b8736a82e57c..18cbd3fa7691 100644 --- a/fs/nfsd/Makefile +++ b/fs/nfsd/Makefile @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o +nfsd-$(CONFIG_NFS_LOCALIO) += localio.o diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 7bb4f2075ac5..c82d8e3e0d4f 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -1074,10 +1074,30 @@ static struct svc_export *exp_find(struct cache_detail *cd, return exp; } +/** + * check_nfsd_access - check if access to export is allowed. + * @exp: svc_export that is being accessed. + * @rqstp: svc_rqst attempting to access @exp (will be NULL for LOCALIO). + * + * Return values: + * %nfs_ok if access is granted, or + * %nfserr_wrongsec if access is denied + */ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp) { struct exp_flavor_info *f, *end = exp->ex_flavors + exp->ex_nflavors; - struct svc_xprt *xprt = rqstp->rq_xprt; + struct svc_xprt *xprt; + + /* + * If rqstp is NULL, this is a LOCALIO request which will only + * ever use a filehandle/credential pair for which access has + * been affirmed (by ACCESS or OPEN NFS requests) over the + * wire. So there is no need for further checks here. 
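The comment in nfs_open_local_fh() above describes a standard teardown-safe lookup: the RCU-protected pointer is only a hint, and the object it names must be pinned with a try-get that fails once shutdown has begun (struct nfsd_net gains a percpu_ref, nfsd_serv_ref, alongside the nfsd_serv_try_get()/nfsd_serv_put() declarations later in this diff). A generic sketch of the pattern, with hypothetical names:

#include <linux/percpu-refcount.h>
#include <linux/rcupdate.h>

struct live_target {
	struct percpu_ref ref;		/* killed before the object is freed */
};

static struct live_target __rcu *current_target;

static struct live_target *target_try_get(void)
{
	struct live_target *t;

	rcu_read_lock();
	t = rcu_dereference(current_target);
	if (t && !percpu_ref_tryget_live(&t->ref))
		t = NULL;		/* teardown already started */
	rcu_read_unlock();
	return t;			/* pinned reference, or NULL */
}

static void target_put(struct live_target *t)
{
	percpu_ref_put(&t->ref);
}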
+ */ + if (!rqstp) + return nfs_ok; + + xprt = rqstp->rq_xprt; if (exp->ex_xprtsec_modes & NFSEXP_XPRTSEC_NONE) { if (!test_bit(XPT_TLS_SESSION, &xprt->xpt_flags)) @@ -1098,17 +1118,17 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp) ok: /* legacy gss-only clients are always OK: */ if (exp->ex_client == rqstp->rq_gssclient) - return 0; + return nfs_ok; /* ip-address based client; check sec= export option: */ for (f = exp->ex_flavors; f < end; f++) { if (f->pseudoflavor == rqstp->rq_cred.cr_flavor) - return 0; + return nfs_ok; } /* defaults in absence of sec= options: */ if (exp->ex_nflavors == 0) { if (rqstp->rq_cred.cr_flavor == RPC_AUTH_NULL || rqstp->rq_cred.cr_flavor == RPC_AUTH_UNIX) - return 0; + return nfs_ok; } /* If the compound op contains a spo_must_allowed op, @@ -1118,7 +1138,7 @@ ok: */ if (nfsd4_spo_must_allow(rqstp)) - return 0; + return nfs_ok; denied: return nfserr_wrongsec; diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c index 24e8f1fbcebb..19bb88c7eebd 100644 --- a/fs/nfsd/filecache.c +++ b/fs/nfsd/filecache.c @@ -52,7 +52,7 @@ #define NFSD_FILE_CACHE_UP (0) /* We only care about NFSD_MAY_READ/WRITE for this cache */ -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE) +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO) static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits); static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions); @@ -390,6 +390,34 @@ nfsd_file_put(struct nfsd_file *nf) nfsd_file_free(nf); } +/** + * nfsd_file_put_local - put the reference to nfsd_file and local nfsd_serv + * @nf: nfsd_file of which to put the references + * + * First put the reference of the nfsd_file and then put the + * reference to the associated nn->nfsd_serv. + */ +void +nfsd_file_put_local(struct nfsd_file *nf) +{ + struct net *net = nf->nf_net; + + nfsd_file_put(nf); + nfsd_serv_put(net); +} + +/** + * nfsd_file_file - get the backing file of an nfsd_file + * @nf: nfsd_file of which to access the backing file. + * + * Return backing file for @nf. 
*/ +struct file * +nfsd_file_file(struct nfsd_file *nf) +{ + return nf->nf_file; +} + static void nfsd_file_dispose_list(struct list_head *dispose) { @@ -982,12 +1010,14 @@ nfsd_file_is_cached(struct inode *inode) } static __be32 -nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, +nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net, + struct svc_cred *cred, + struct auth_domain *client, + struct svc_fh *fhp, unsigned int may_flags, struct file *file, struct nfsd_file **pnf, bool want_gc) { unsigned char need = may_flags & NFSD_FILE_MAY_MASK; - struct net *net = SVC_NET(rqstp); struct nfsd_file *new, *nf; bool stale_retry = true; bool open_retry = true; @@ -996,8 +1026,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, int ret; retry: - status = fh_verify(rqstp, fhp, S_IFREG, - may_flags|NFSD_MAY_OWNER_OVERRIDE); + if (rqstp) { + status = fh_verify(rqstp, fhp, S_IFREG, + may_flags|NFSD_MAY_OWNER_OVERRIDE); + } else { + status = fh_verify_local(net, cred, client, fhp, S_IFREG, + may_flags|NFSD_MAY_OWNER_OVERRIDE); + } if (status != nfs_ok) return status; inode = d_inode(fhp->fh_dentry); @@ -1143,7 +1178,8 @@ __be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true); + return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL, + fhp, may_flags, NULL, pnf, true); } /** @@ -1167,7 +1203,55 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false); + return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL, + fhp, may_flags, NULL, pnf, false); +} + +/** + * nfsd_file_acquire_local - Get a struct nfsd_file with an open file for localio + * @net: The network namespace in which to perform a lookup + * @cred: the user credential with which to validate access + * @client: the auth_domain for LOCALIO lookup + * @fhp: the NFS filehandle of the file to be opened + * @may_flags: NFSD_MAY_ settings for the file + * @pnf: OUT: new or found "struct nfsd_file" object + * + * This file lookup interface provides access to a file given the + * filehandle and credential. No connection-based authorisation + * is performed and in that way it is quite different to other + * file access mediated by nfsd. It allows a kernel module such as the NFS + * client to reach across network and filesystem namespaces to access + * a file. The security implications of this should be carefully + * considered before use. + * + * The nfsd_file object returned by this API is reference-counted + * and garbage-collected. The object is retained for a few + * seconds after the final nfsd_file_put() in case the caller + * wants to re-use it. + * + * Return values: + * %nfs_ok - @pnf points to an nfsd_file with its reference + * count boosted. + * + * On error, an nfsstat value in network byte order is returned. + */ +__be32 +nfsd_file_acquire_local(struct net *net, struct svc_cred *cred, + struct auth_domain *client, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf) +{ + /* + * Save creds before calling nfsd_file_do_acquire() (which calls + * nfsd_setuser). Important because caller (LOCALIO) is from + * client context. 
+ */ + const struct cred *save_cred = get_current_cred(); + __be32 beres; + + beres = nfsd_file_do_acquire(NULL, net, cred, client, + fhp, may_flags, NULL, pnf, true); + revert_creds(save_cred); + return beres; } /** @@ -1193,7 +1277,8 @@ nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct file *file, struct nfsd_file **pnf) { - return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false); + return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL, + fhp, may_flags, file, pnf, false); } /* diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h index 3fbec24eea6c..cadf3c2689c4 100644 --- a/fs/nfsd/filecache.h +++ b/fs/nfsd/filecache.h @@ -55,7 +55,9 @@ void nfsd_file_cache_shutdown(void); int nfsd_file_cache_start_net(struct net *net); void nfsd_file_cache_shutdown_net(struct net *net); void nfsd_file_put(struct nfsd_file *nf); +void nfsd_file_put_local(struct nfsd_file *nf); struct nfsd_file *nfsd_file_get(struct nfsd_file *nf); +struct file *nfsd_file_file(struct nfsd_file *nf); void nfsd_file_close_inode_sync(struct inode *inode); void nfsd_file_net_dispose(struct nfsd_net *nn); bool nfsd_file_is_cached(struct inode *inode); @@ -66,5 +68,8 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp, __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp, unsigned int may_flags, struct file *file, struct nfsd_file **nfp); +__be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred, + struct auth_domain *client, struct svc_fh *fhp, + unsigned int may_flags, struct nfsd_file **pnf); int nfsd_file_cache_stats_show(struct seq_file *m, void *v); #endif /* _FS_NFSD_FILECACHE_H */ diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c new file mode 100644 index 000000000000..291e9c69cae4 --- /dev/null +++ b/fs/nfsd/localio.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * NFS server support for local clients to bypass network stack + * + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> + * Copyright (C) 2024 NeilBrown <neilb@suse.de> + */ + +#include <linux/exportfs.h> +#include <linux/sunrpc/svcauth.h> +#include <linux/sunrpc/clnt.h> +#include <linux/nfs.h> +#include <linux/nfs_common.h> +#include <linux/nfslocalio.h> +#include <linux/nfs_fs.h> +#include <linux/nfs_xdr.h> +#include <linux/string.h> + +#include "nfsd.h" +#include "vfs.h" +#include "netns.h" +#include "filecache.h" +#include "cache.h" + +static const struct nfsd_localio_operations nfsd_localio_ops = { + .nfsd_serv_try_get = nfsd_serv_try_get, + .nfsd_serv_put = nfsd_serv_put, + .nfsd_open_local_fh = nfsd_open_local_fh, + .nfsd_file_put_local = nfsd_file_put_local, + .nfsd_file_file = nfsd_file_file, +}; + +void nfsd_localio_ops_init(void) +{ + nfs_to = &nfsd_localio_ops; +} + +/** + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file + * + * @net: 'struct net' to get the proper nfsd_net required for LOCALIO access + * @dom: 'struct auth_domain' required for LOCALIO access + * @rpc_clnt: rpc_clnt that the client established + * @cred: cred that the client established + * @nfs_fh: filehandle to lookup + * @fmode: fmode_t to use for open + * + * This function maps a local fh to a path on a local filesystem. 
+ * This is useful when the nfs client has the local server mounted - it can + * avoid all the NFS overhead with reads, writes and commits. + * + * On successful return, returned nfsd_file will have its nf_net member + * set. Caller (NFS client) is responsible for calling nfsd_serv_put and + * nfsd_file_put (via nfs_to->nfsd_file_put_local). + */ +struct nfsd_file * +nfsd_open_local_fh(struct net *net, struct auth_domain *dom, + struct rpc_clnt *rpc_clnt, const struct cred *cred, + const struct nfs_fh *nfs_fh, const fmode_t fmode) +{ + int mayflags = NFSD_MAY_LOCALIO; + struct svc_cred rq_cred; + struct svc_fh fh; + struct nfsd_file *localio; + __be32 beres; + + if (nfs_fh->size > NFS4_FHSIZE) + return ERR_PTR(-EINVAL); + + /* nfs_fh -> svc_fh */ + fh_init(&fh, NFS4_FHSIZE); + fh.fh_handle.fh_size = nfs_fh->size; + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size); + + if (fmode & FMODE_READ) + mayflags |= NFSD_MAY_READ; + if (fmode & FMODE_WRITE) + mayflags |= NFSD_MAY_WRITE; + + svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred); + + beres = nfsd_file_acquire_local(net, &rq_cred, dom, + &fh, mayflags, &localio); + if (beres) + localio = ERR_PTR(nfs_stat_to_errno(be32_to_cpu(beres))); + + fh_put(&fh); + if (rq_cred.cr_group_info) + put_group_info(rq_cred.cr_group_info); + + return localio; +} +EXPORT_SYMBOL_GPL(nfsd_open_local_fh); + +/* + * UUID_IS_LOCAL XDR functions + */ + +static __be32 localio_proc_null(struct svc_rqst *rqstp) +{ + return rpc_success; +} + +struct localio_uuidarg { + uuid_t uuid; +}; + +static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp) +{ + struct localio_uuidarg *argp = rqstp->rq_argp; + struct net *net = SVC_NET(rqstp); + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + nfs_uuid_is_local(&argp->uuid, &nn->local_clients, + net, rqstp->rq_client, THIS_MODULE); + + return rpc_success; +} + +static bool localio_decode_uuidarg(struct svc_rqst *rqstp, + struct xdr_stream *xdr) +{ + struct localio_uuidarg *argp = rqstp->rq_argp; + u8 uuid[UUID_SIZE]; + + if (decode_opaque_fixed(xdr, uuid, UUID_SIZE)) + return false; + import_uuid(&argp->uuid, uuid); + + return true; +} + +static const struct svc_procedure localio_procedures1[] = { + [LOCALIOPROC_NULL] = { + .pc_func = localio_proc_null, + .pc_decode = nfssvc_decode_voidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct nfsd_voidargs), + .pc_ressize = sizeof(struct nfsd_voidres), + .pc_cachetype = RC_NOCACHE, + .pc_xdrressize = 0, + .pc_name = "NULL", + }, + [LOCALIOPROC_UUID_IS_LOCAL] = { + .pc_func = localio_proc_uuid_is_local, + .pc_decode = localio_decode_uuidarg, + .pc_encode = nfssvc_encode_voidres, + .pc_argsize = sizeof(struct localio_uuidarg), + .pc_argzero = sizeof(struct localio_uuidarg), + .pc_ressize = sizeof(struct nfsd_voidres), + .pc_cachetype = RC_NOCACHE, + .pc_name = "UUID_IS_LOCAL", + }, +}; + +#define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1) +static DEFINE_PER_CPU_ALIGNED(unsigned long, + localio_count[LOCALIO_NR_PROCEDURES]); +const struct svc_version localio_version1 = { + .vs_vers = 1, + .vs_nproc = LOCALIO_NR_PROCEDURES, + .vs_proc = localio_procedures1, + .vs_dispatch = nfsd_dispatch, + .vs_count = localio_count, + .vs_xdrsize = XDR_QUADLEN(UUID_SIZE), + .vs_hidden = true, +}; diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 37b8bfdcfeea..26f7b34d1a03 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -13,6 +13,7 @@ #include <linux/filelock.h> #include <linux/nfs4.h> #include <linux/percpu_counter.h> +#include 
<linux/percpu-refcount.h> #include <linux/siphash.h> #include <linux/sunrpc/stats.h> @@ -139,7 +140,9 @@ struct nfsd_net { struct svc_info nfsd_info; #define nfsd_serv nfsd_info.serv - + struct percpu_ref nfsd_serv_ref; + struct completion nfsd_serv_confirm_done; + struct completion nfsd_serv_free_done; /* * clientid and stateid data for construction of net unique COPY @@ -214,6 +217,10 @@ struct nfsd_net { /* last time an admin-revoke happened for NFSv4.0 */ time64_t nfs40_last_revoke; +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + /* Local clients to be invalidated when net is shut down */ + struct list_head local_clients; +#endif }; /* Simple check to find out if a given net was properly initialized */ @@ -222,6 +229,9 @@ struct nfsd_net { extern bool nfsd_support_version(int vers); extern unsigned int nfsd_net_id; +bool nfsd_serv_try_get(struct net *net); +void nfsd_serv_put(struct net *net); + void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn); void nfsd_reset_write_verifier(struct nfsd_net *nn); #endif /* __NFSD_NETNS_H__ */ diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 1c9e5b4bcb0a..3adbc05ebaac 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -18,6 +18,7 @@ #include <linux/sunrpc/svc.h> #include <linux/module.h> #include <linux/fsnotify.h> +#include <linux/nfslocalio.h> #include "idmap.h" #include "nfsd.h" @@ -2246,7 +2247,7 @@ static __net_init int nfsd_net_init(struct net *net) if (retval) goto out_repcache_error; memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats)); - nn->nfsd_svcstats.program = &nfsd_program; + nn->nfsd_svcstats.program = &nfsd_programs[0]; for (i = 0; i < sizeof(nn->nfsd_versions); i++) nn->nfsd_versions[i] = nfsd_support_version(i); for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++) @@ -2257,7 +2258,9 @@ static __net_init int nfsd_net_init(struct net *net) get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key)); seqlock_init(&nn->writeverf_lock); nfsd_proc_stat_init(net); - +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + INIT_LIST_HEAD(&nn->local_clients); +#endif return 0; out_repcache_error: @@ -2268,6 +2271,22 @@ out_export_error: return retval; } +#if IS_ENABLED(CONFIG_NFS_LOCALIO) +/** + * nfsd_net_pre_exit - Disconnect localio clients from net namespace + * @net: a network namespace that is about to be destroyed + * + * This invalidates ->net pointers held by localio clients + * while they can still safely access nn->counter. 
+ */ +static __net_exit void nfsd_net_pre_exit(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + nfs_uuid_invalidate_clients(&nn->local_clients); +} +#endif + /** * nfsd_net_exit - Release the nfsd_net portion of a net namespace * @net: a network namespace that is about to be destroyed @@ -2285,6 +2304,9 @@ static __net_exit void nfsd_net_exit(struct net *net) static struct pernet_operations nfsd_net_ops = { .init = nfsd_net_init, +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + .pre_exit = nfsd_net_pre_exit, +#endif .exit = nfsd_net_exit, .id = &nfsd_net_id, .size = sizeof(struct nfsd_net), @@ -2322,6 +2344,7 @@ static int __init init_nfsd(void) retval = genl_register_family(&nfsd_nl_family); if (retval) goto out_free_all; + nfsd_localio_ops_init(); return 0; out_free_all: diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 4ccbf014a2c7..4b56ba1e8e48 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -85,7 +85,7 @@ struct nfsd_genl_rqstp { u32 rq_opnum[NFSD_MAX_OPS_PER_COMPOUND]; }; -extern struct svc_program nfsd_program; +extern struct svc_program nfsd_programs[]; extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4; extern struct mutex nfsd_mutex; extern spinlock_t nfsd_drc_lock; @@ -146,6 +146,10 @@ extern const struct svc_version nfsd_acl_version3; #endif #endif +#if IS_ENABLED(CONFIG_NFS_LOCALIO) +extern const struct svc_version localio_version1; +#endif + struct nfsd_net; enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL }; diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c index 50d23d56f403..40ad58a6a036 100644 --- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -87,23 +87,24 @@ nfsd_mode_check(struct dentry *dentry, umode_t requested) return nfserr_wrong_type; } -static bool nfsd_originating_port_ok(struct svc_rqst *rqstp, int flags) +static bool nfsd_originating_port_ok(struct svc_rqst *rqstp, + struct svc_cred *cred, + struct svc_export *exp) { - if (flags & NFSEXP_INSECURE_PORT) + if (nfsexp_flags(cred, exp) & NFSEXP_INSECURE_PORT) return true; /* We don't require gss requests to use low ports: */ - if (rqstp->rq_cred.cr_flavor >= RPC_AUTH_GSS) + if (cred->cr_flavor >= RPC_AUTH_GSS) return true; return test_bit(RQ_SECURE, &rqstp->rq_flags); } static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp, + struct svc_cred *cred, struct svc_export *exp) { - int flags = nfsexp_flags(&rqstp->rq_cred, exp); - /* Check if the request originated from a secure port. */ - if (!nfsd_originating_port_ok(rqstp, flags)) { + if (rqstp && !nfsd_originating_port_ok(rqstp, cred, exp)) { RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]); dprintk("nfsd: request from insecure port %s!\n", svc_print_addr(rqstp, buf, sizeof(buf))); @@ -111,7 +112,7 @@ static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp, } /* Set user creds for this exportpoint */ - return nfserrno(nfsd_setuser(&rqstp->rq_cred, exp)); + return nfserrno(nfsd_setuser(cred, exp)); } static inline __be32 check_pseudo_root(struct dentry *dentry, @@ -141,7 +142,11 @@ static inline __be32 check_pseudo_root(struct dentry *dentry, * dentry. On success, the results are used to set fh_export and * fh_dentry. 
*/ -static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) +static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net, + struct svc_cred *cred, + struct auth_domain *client, + struct auth_domain *gssclient, + struct svc_fh *fhp) { struct knfsd_fh *fh = &fhp->fh_handle; struct fid *fid = NULL; @@ -183,8 +188,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) data_left -= len; if (data_left < 0) return error; - exp = rqst_exp_find(&rqstp->rq_chandle, SVC_NET(rqstp), - rqstp->rq_client, rqstp->rq_gssclient, + exp = rqst_exp_find(rqstp ? &rqstp->rq_chandle : NULL, + net, client, gssclient, fh->fh_fsid_type, fh->fh_fsid); fid = (struct fid *)(fh->fh_fsid + len); @@ -219,7 +224,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) put_cred(override_creds(new)); put_cred(new); } else { - error = nfsd_setuser_and_check_port(rqstp, exp); + error = nfsd_setuser_and_check_port(rqstp, cred, exp); if (error) goto out; } @@ -266,20 +271,20 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp) fhp->fh_dentry = dentry; fhp->fh_export = exp; - switch (rqstp->rq_vers) { - case 4: + switch (fhp->fh_maxsize) { + case NFS4_FHSIZE: if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR) fhp->fh_no_atomic_attr = true; fhp->fh_64bit_cookies = true; break; - case 3: + case NFS3_FHSIZE: if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC) fhp->fh_no_wcc = true; fhp->fh_64bit_cookies = true; if (exp->ex_flags & NFSEXP_V4ROOT) goto out; break; - case 2: + case NFS_FHSIZE: fhp->fh_no_wcc = true; if (EX_WGATHER(exp)) fhp->fh_use_wgather = true; @@ -294,42 +299,33 @@ out: } /** - * fh_verify - filehandle lookup and access checking - * @rqstp: pointer to current rpc request + * __fh_verify - filehandle lookup and access checking + * @rqstp: RPC transaction context, or NULL + * @net: net namespace in which to perform the export lookup + * @cred: RPC user credential + * @client: RPC auth domain + * @gssclient: RPC GSS auth domain, or NULL * @fhp: filehandle to be verified * @type: expected type of object pointed to by filehandle * @access: type of access needed to object * - * Look up a dentry from the on-the-wire filehandle, check the client's - * access to the export, and set the current task's credentials. - * - * Regardless of success or failure of fh_verify(), fh_put() should be - * called on @fhp when the caller is finished with the filehandle. - * - * fh_verify() may be called multiple times on a given filehandle, for - * example, when processing an NFSv4 compound. The first call will look - * up a dentry using the on-the-wire filehandle. Subsequent calls will - * skip the lookup and just perform the other checks and possibly change - * the current task's credentials. - * - * @type specifies the type of object expected using one of the S_IF* - * constants defined in include/linux/stat.h. The caller may use zero - * to indicate that it doesn't care, or a negative integer to indicate - * that it expects something not of the given type. - * - * @access is formed from the NFSD_MAY_* constants defined in - * fs/nfsd/vfs.h. + * See fh_verify() for further descriptions of @fhp, @type, and @access. 
*/ -__be32 -fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) +static __be32 +__fh_verify(struct svc_rqst *rqstp, + struct net *net, struct svc_cred *cred, + struct auth_domain *client, + struct auth_domain *gssclient, + struct svc_fh *fhp, umode_t type, int access) { - struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id); + struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_export *exp = NULL; struct dentry *dentry; __be32 error; if (!fhp->fh_dentry) { - error = nfsd_set_fh_dentry(rqstp, fhp); + error = nfsd_set_fh_dentry(rqstp, net, cred, client, + gssclient, fhp); if (error) goto out; } @@ -358,7 +354,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) if (error) goto out; - error = nfsd_setuser_and_check_port(rqstp, exp); + error = nfsd_setuser_and_check_port(rqstp, cred, exp); if (error) goto out; @@ -388,7 +384,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) skip_pseudoflavor_check: /* Finally, check access permissions. */ - error = nfsd_permission(&rqstp->rq_cred, exp, dentry, access); + error = nfsd_permission(cred, exp, dentry, access); out: trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error); if (error == nfserr_stale) @@ -396,6 +392,63 @@ out: return error; } +/** + * fh_verify_local - filehandle lookup and access checking + * @net: net namespace in which to perform the export lookup + * @cred: RPC user credential + * @client: RPC auth domain + * @fhp: filehandle to be verified + * @type: expected type of object pointed to by filehandle + * @access: type of access needed to object + * + * This API can be used by callers who do not have an RPC + * transaction context (ie are not running in an nfsd thread). + * + * See fh_verify() for further descriptions of @fhp, @type, and @access. + */ +__be32 +fh_verify_local(struct net *net, struct svc_cred *cred, + struct auth_domain *client, struct svc_fh *fhp, + umode_t type, int access) +{ + return __fh_verify(NULL, net, cred, client, NULL, + fhp, type, access); +} + +/** + * fh_verify - filehandle lookup and access checking + * @rqstp: pointer to current rpc request + * @fhp: filehandle to be verified + * @type: expected type of object pointed to by filehandle + * @access: type of access needed to object + * + * Look up a dentry from the on-the-wire filehandle, check the client's + * access to the export, and set the current task's credentials. + * + * Regardless of success or failure of fh_verify(), fh_put() should be + * called on @fhp when the caller is finished with the filehandle. + * + * fh_verify() may be called multiple times on a given filehandle, for + * example, when processing an NFSv4 compound. The first call will look + * up a dentry using the on-the-wire filehandle. Subsequent calls will + * skip the lookup and just perform the other checks and possibly change + * the current task's credentials. + * + * @type specifies the type of object expected using one of the S_IF* + * constants defined in include/linux/stat.h. The caller may use zero + * to indicate that it doesn't care, or a negative integer to indicate + * that it expects something not of the given type. + * + * @access is formed from the NFSD_MAY_* constants defined in + * fs/nfsd/vfs.h. 
+ */ +__be32 +fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access) +{ + return __fh_verify(rqstp, SVC_NET(rqstp), &rqstp->rq_cred, + rqstp->rq_client, rqstp->rq_gssclient, + fhp, type, access); +} /* * Compose a file handle for an NFS reply. diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h index 8d46e203d139..5b7394801dc4 100644 --- a/fs/nfsd/nfsfh.h +++ b/fs/nfsd/nfsfh.h @@ -217,6 +217,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp); * Function prototypes */ __be32 fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int); +__be32 fh_verify_local(struct net *, struct svc_cred *, struct auth_domain *, + struct svc_fh *, umode_t, int); __be32 fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *); __be32 fh_update(struct svc_fh *); void fh_put(struct svc_fh *); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index defc430f912f..e236135ddc63 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -19,6 +19,7 @@ #include <linux/sunrpc/svc_xprt.h> #include <linux/lockd/bind.h> #include <linux/nfsacl.h> +#include <linux/nfslocalio.h> #include <linux/seq_file.h> #include <linux/inetdevice.h> #include <net/addrconf.h> @@ -35,7 +36,6 @@ #define NFSDDBG_FACILITY NFSDDBG_SVC atomic_t nfsd_th_cnt = ATOMIC_INIT(0); -extern struct svc_program nfsd_program; static int nfsd(void *vrqstp); #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) static int nfsd_acl_rpcbind_set(struct net *, @@ -80,6 +80,15 @@ DEFINE_SPINLOCK(nfsd_drc_lock); unsigned long nfsd_drc_max_mem; unsigned long nfsd_drc_mem_used; +#if IS_ENABLED(CONFIG_NFS_LOCALIO) +static const struct svc_version *localio_versions[] = { + [1] = &localio_version1, +}; + +#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions) + +#endif /* CONFIG_NFS_LOCALIO */ + #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) static const struct svc_version *nfsd_acl_version[] = { # if defined(CONFIG_NFSD_V2_ACL) @@ -90,20 +99,9 @@ static const struct svc_version *nfsd_acl_version[] = { # endif }; -#define NFSD_ACL_MINVERS 2 +#define NFSD_ACL_MINVERS 2 #define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version) -static struct svc_program nfsd_acl_program = { - .pg_prog = NFS_ACL_PROGRAM, - .pg_nvers = NFSD_ACL_NRVERS, - .pg_vers = nfsd_acl_version, - .pg_name = "nfsacl", - .pg_class = "nfsd", - .pg_authenticate = &svc_set_client, - .pg_init_request = nfsd_acl_init_request, - .pg_rpcbind_set = nfsd_acl_rpcbind_set, -}; - #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */ static const struct svc_version *nfsd_version[NFSD_MAXVERS+1] = { @@ -116,18 +114,41 @@ static const struct svc_version *nfsd_version[NFSD_MAXVERS+1] = { #endif }; -struct svc_program nfsd_program = { -#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) - .pg_next = &nfsd_acl_program, -#endif +struct svc_program nfsd_programs[] = { + { .pg_prog = NFS_PROGRAM, /* program number */ .pg_nvers = NFSD_MAXVERS+1, /* nr of entries in nfsd_version */ .pg_vers = nfsd_version, /* version table */ .pg_name = "nfsd", /* program name */ .pg_class = "nfsd", /* authentication class */ - .pg_authenticate = &svc_set_client, /* export authentication */ + .pg_authenticate = svc_set_client, /* export authentication */ .pg_init_request = nfsd_init_request, .pg_rpcbind_set = nfsd_rpcbind_set, + }, +#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) + { + .pg_prog = NFS_ACL_PROGRAM, + .pg_nvers = NFSD_ACL_NRVERS, + .pg_vers = nfsd_acl_version, + .pg_name = "nfsacl", + .pg_class = "nfsd", + .pg_authenticate 
= svc_set_client, + .pg_init_request = nfsd_acl_init_request, + .pg_rpcbind_set = nfsd_acl_rpcbind_set, + }, +#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */ +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + { + .pg_prog = NFS_LOCALIO_PROGRAM, + .pg_nvers = NFSD_LOCALIO_NRVERS, + .pg_vers = localio_versions, + .pg_name = "nfslocalio", + .pg_class = "nfsd", + .pg_authenticate = svc_set_client, + .pg_init_request = svc_generic_init_request, + .pg_rpcbind_set = svc_generic_rpcbind_set, + } +#endif /* CONFIG_NFS_LOCALIO */ }; bool nfsd_support_version(int vers) @@ -193,6 +214,34 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change return 0; } +bool nfsd_serv_try_get(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + return (nn && percpu_ref_tryget_live(&nn->nfsd_serv_ref)); +} + +void nfsd_serv_put(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + + percpu_ref_put(&nn->nfsd_serv_ref); +} + +static void nfsd_serv_done(struct percpu_ref *ref) +{ + struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref); + + complete(&nn->nfsd_serv_confirm_done); +} + +static void nfsd_serv_free(struct percpu_ref *ref) +{ + struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref); + + complete(&nn->nfsd_serv_free_done); +} + /* * Maximum number of nfsd processes */ @@ -392,6 +441,7 @@ static void nfsd_shutdown_net(struct net *net) lockd_down(net); nn->lockd_up = false; } + percpu_ref_exit(&nn->nfsd_serv_ref); nn->nfsd_net_up = false; nfsd_shutdown_generic(); } @@ -471,6 +521,13 @@ void nfsd_destroy_serv(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id); struct svc_serv *serv = nn->nfsd_serv; + lockdep_assert_held(&nfsd_mutex); + + percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done); + wait_for_completion(&nn->nfsd_serv_confirm_done); + wait_for_completion(&nn->nfsd_serv_free_done); + /* percpu_ref_exit is called in nfsd_shutdown_net */ + spin_lock(&nfsd_notifier_lock); nn->nfsd_serv = NULL; spin_unlock(&nfsd_notifier_lock); @@ -595,10 +652,18 @@ int nfsd_create_serv(struct net *net) if (nn->nfsd_serv) return 0; + error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free, + 0, GFP_KERNEL); + if (error) + return error; + init_completion(&nn->nfsd_serv_free_done); + init_completion(&nn->nfsd_serv_confirm_done); + if (nfsd_max_blksize == 0) nfsd_max_blksize = nfsd_get_default_max_blksize(); nfsd_reset_versions(nn); - serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats, + serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs), + &nn->nfsd_svcstats, nfsd_max_blksize, nfsd); if (serv == NULL) return -ENOMEM; @@ -905,7 +970,7 @@ nfsd(void *vrqstp) } /** - * nfsd_dispatch - Process an NFS or NFSACL Request + * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request * @rqstp: incoming request * * This RPC dispatcher integrates the NFS server's duplicate reply cache. 
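Editorial note: the nfsd_serv_ref added above gives out-of-thread callers (such as the LOCALIO client path) a way to pin nfsd. percpu_ref_tryget_live() fails once nfsd_destroy_serv() has killed the reference, and the two completions make shutdown wait for the confirmation callback and for the last reference to drop. A sketch of the intended consumer pattern, with a hypothetical function name and error code:

    static int example_with_pinned_nfsd(struct net *net)
    {
            if (!nfsd_serv_try_get(net))
                    return -ENXIO;  /* nfsd is absent or shutting down */

            /* ... safe to call into the local nfsd here ... */

            nfsd_serv_put(net);     /* allows nfsd_destroy_serv() to finish */
            return 0;
    }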
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h index 7ab66497e261..c625966cfcf3 100644 --- a/fs/nfsd/trace.h +++ b/fs/nfsd/trace.h @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode); { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \ { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \ { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \ - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }) + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \ + { NFSD_MAY_LOCALIO, "LOCALIO" }) TRACE_EVENT(nfsd_compound, TP_PROTO( @@ -193,7 +194,7 @@ TRACE_EVENT(nfsd_compound_encode_err, { S_IFIFO, "FIFO" }, \ { S_IFSOCK, "SOCK" }) -TRACE_EVENT(nfsd_fh_verify, +TRACE_EVENT_CONDITION(nfsd_fh_verify, TP_PROTO( const struct svc_rqst *rqstp, const struct svc_fh *fhp, @@ -201,6 +202,7 @@ TRACE_EVENT(nfsd_fh_verify, int access ), TP_ARGS(rqstp, fhp, type, access), + TP_CONDITION(rqstp != NULL), TP_STRUCT__entry( __field(unsigned int, netns_ino) __sockaddr(server, rqstp->rq_xprt->xpt_remotelen) @@ -239,7 +241,7 @@ TRACE_EVENT_CONDITION(nfsd_fh_verify_err, __be32 error ), TP_ARGS(rqstp, fhp, type, access, error), - TP_CONDITION(error), + TP_CONDITION(rqstp != NULL && error), TP_STRUCT__entry( __field(unsigned int, netns_ino) __sockaddr(server, rqstp->rq_xprt->xpt_remotelen) @@ -295,12 +297,13 @@ DECLARE_EVENT_CLASS(nfsd_fh_err_class, __entry->status) ) -#define DEFINE_NFSD_FH_ERR_EVENT(name) \ -DEFINE_EVENT(nfsd_fh_err_class, nfsd_##name, \ - TP_PROTO(struct svc_rqst *rqstp, \ - struct svc_fh *fhp, \ - int status), \ - TP_ARGS(rqstp, fhp, status)) +#define DEFINE_NFSD_FH_ERR_EVENT(name) \ +DEFINE_EVENT_CONDITION(nfsd_fh_err_class, nfsd_##name, \ + TP_PROTO(struct svc_rqst *rqstp, \ + struct svc_fh *fhp, \ + int status), \ + TP_ARGS(rqstp, fhp, status), \ + TP_CONDITION(rqstp != NULL)) DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badexport); DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badhandle); diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h index 01947561d375..3ff146522556 100644 --- a/fs/nfsd/vfs.h +++ b/fs/nfsd/vfs.h @@ -33,6 +33,8 @@ #define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */ +#define NFSD_MAY_LOCALIO 0x2000 /* for tracing, reflects when localio used */ + #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE) #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC) diff --git a/include/linux/nfs.h b/include/linux/nfs.h index ceb70a926b95..9ad727ddfedb 100644 --- a/include/linux/nfs.h +++ b/include/linux/nfs.h @@ -8,11 +8,20 @@ #ifndef _LINUX_NFS_H #define _LINUX_NFS_H +#include <linux/cred.h> +#include <linux/sunrpc/auth.h> #include <linux/sunrpc/msg_prot.h> #include <linux/string.h> #include <linux/crc32.h> #include <uapi/linux/nfs.h> +/* The LOCALIO program is entirely private to Linux and is + * NOT part of the uapi. + */ +#define NFS_LOCALIO_PROGRAM 400122 +#define LOCALIOPROC_NULL 0 +#define LOCALIOPROC_UUID_IS_LOCAL 1 + /* * This is the kernel NFS client file handle representation */ diff --git a/include/linux/nfs_common.h b/include/linux/nfs_common.h new file mode 100644 index 000000000000..5fc02df88252 --- /dev/null +++ b/include/linux/nfs_common.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * This file contains constants and methods used by both NFS client and server. + */ +#ifndef _LINUX_NFS_COMMON_H +#define _LINUX_NFS_COMMON_H + +#include <linux/errno.h> +#include <uapi/linux/nfs.h> + +/* Mapping from NFS error code to "errno" error code. 
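Editorial note: the TP_CONDITION(rqstp != NULL) guards added above matter because these events record transport addresses through rqstp->rq_xprt; a LOCALIO caller passes a NULL rqstp, so the condition must stop the event before TP_fast_assign() runs. A schematic, stand-alone example of the same pattern (not from the patch, and omitting the usual trace-header boilerplate):

    TRACE_EVENT_CONDITION(example_nullable_rqstp,
            TP_PROTO(const struct svc_rqst *rqstp, u32 status),
            TP_ARGS(rqstp, status),
            TP_CONDITION(rqstp != NULL),    /* evaluated before fast-assign */
            TP_STRUCT__entry(
                    __field(u32, status)
            ),
            TP_fast_assign(
                    __entry->status = status;
            ),
            TP_printk("status=%u", __entry->status)
    );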
*/ +#define errno_NFSERR_IO EIO + +int nfs_stat_to_errno(enum nfs_stat status); +int nfs4_stat_to_errno(int stat); + +#endif /* _LINUX_NFS_COMMON_H */ diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index 1df86ab98c77..853df3fcd4c2 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -8,6 +8,7 @@ #include <linux/wait.h> #include <linux/nfs_xdr.h> #include <linux/sunrpc/xprt.h> +#include <linux/nfslocalio.h> #include <linux/atomic.h> #include <linux/refcount.h> @@ -49,6 +50,7 @@ struct nfs_client { #define NFS_CS_DS 7 /* - Server is a DS */ #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */ #define NFS_CS_PNFS 9 /* - Server used for pnfs */ +#define NFS_CS_LOCAL_IO 10 /* - client is local */ struct sockaddr_storage cl_addr; /* server identifier */ size_t cl_addrlen; char * cl_hostname; /* hostname of server */ @@ -125,6 +127,13 @@ struct nfs_client { struct net *cl_net; struct list_head pending_cb_stateids; struct rcu_head rcu; + +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + struct timespec64 cl_nfssvc_boot; + seqlock_t cl_boot_lock; + nfs_uuid_t cl_uuid; + spinlock_t cl_localio_lock; +#endif /* CONFIG_NFS_LOCALIO */ }; /* @@ -158,6 +167,7 @@ struct nfs_server { #define NFS_MOUNT_WRITE_WAIT 0x02000000 #define NFS_MOUNT_TRUNK_DISCOVERY 0x04000000 #define NFS_MOUNT_SHUTDOWN 0x08000000 +#define NFS_MOUNT_NO_ALIGNWRITE 0x10000000 unsigned int fattr_valid; /* Valid attributes */ unsigned int caps; /* server capabilities */ @@ -234,8 +244,7 @@ struct nfs_server { /* the following fields are protected by nfs_client->cl_lock */ struct rb_root state_owners; #endif - struct ida openowner_id; - struct ida lockowner_id; + atomic64_t owner_ctr; struct list_head state_owners_lru; struct list_head layouts; struct list_head delegations; diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 45623af3e7b8..12d8e47bc5a3 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -446,7 +446,7 @@ struct nfs42_clone_res { struct stateowner_id { __u64 create_time; - __u32 uniquifier; + __u64 uniquifier; }; struct nfs4_open_delegation { @@ -1854,6 +1854,24 @@ struct nfs_rpc_ops { }; /* + * Helper functions used by NFS client and/or server + */ +static inline void encode_opaque_fixed(struct xdr_stream *xdr, + const void *buf, size_t len) +{ + WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0); +} + +static inline int decode_opaque_fixed(struct xdr_stream *xdr, + void *buf, size_t len) +{ + ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len); + if (unlikely(ret < 0)) + return -EIO; + return 0; +} + +/* * Function vectors etc. 
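Editorial note: two of the changes above go together. State owner ids are no longer recycled through the openowner/lockowner IDAs but are taken from a single 64-bit counter, which is why stateowner_id.uniquifier widens from __u32 to __u64. A sketch of the resulting id generation (hypothetical helper name, not part of the patch):

    static u64 example_new_owner_id(struct nfs_server *server)
    {
            /* monotonically increasing; shared by open and lock owners */
            return atomic64_inc_return(&server->owner_ctr);
    }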
for the NFS client */ extern const struct nfs_rpc_ops nfs_v2_clientops; @@ -1866,4 +1884,4 @@ extern const struct rpc_version nfs_version4; extern const struct rpc_version nfsacl_version3; extern const struct rpc_program nfsacl_program; -#endif +#endif /* _LINUX_NFS_XDR_H */ diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h new file mode 100644 index 000000000000..b353abe00357 --- /dev/null +++ b/include/linux/nfslocalio.h @@ -0,0 +1,74 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com> + * Copyright (C) 2024 NeilBrown <neilb@suse.de> + */ +#ifndef __LINUX_NFSLOCALIO_H +#define __LINUX_NFSLOCALIO_H + +/* nfsd_file structure is purposely kept opaque to NFS client */ +struct nfsd_file; + +#if IS_ENABLED(CONFIG_NFS_LOCALIO) + +#include <linux/module.h> +#include <linux/list.h> +#include <linux/uuid.h> +#include <linux/sunrpc/clnt.h> +#include <linux/sunrpc/svcauth.h> +#include <linux/nfs.h> +#include <net/net_namespace.h> + +/* + * Useful to allow a client to negotiate if localio + * possible with its server. + * + * See Documentation/filesystems/nfs/localio.rst for more detail. + */ +typedef struct { + uuid_t uuid; + struct list_head list; + struct net __rcu *net; /* nfsd's network namespace */ + struct auth_domain *dom; /* auth_domain for localio */ +} nfs_uuid_t; + +void nfs_uuid_begin(nfs_uuid_t *); +void nfs_uuid_end(nfs_uuid_t *); +void nfs_uuid_is_local(const uuid_t *, struct list_head *, + struct net *, struct auth_domain *, struct module *); +void nfs_uuid_invalidate_clients(struct list_head *list); +void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid); + +/* localio needs to map filehandle -> struct nfsd_file */ +extern struct nfsd_file * +nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *, + const struct cred *, const struct nfs_fh *, + const fmode_t) __must_hold(rcu); + +struct nfsd_localio_operations { + bool (*nfsd_serv_try_get)(struct net *); + void (*nfsd_serv_put)(struct net *); + struct nfsd_file *(*nfsd_open_local_fh)(struct net *, + struct auth_domain *, + struct rpc_clnt *, + const struct cred *, + const struct nfs_fh *, + const fmode_t); + void (*nfsd_file_put_local)(struct nfsd_file *); + struct file *(*nfsd_file_file)(struct nfsd_file *); +} ____cacheline_aligned; + +extern void nfsd_localio_ops_init(void); +extern const struct nfsd_localio_operations *nfs_to; + +struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *, + struct rpc_clnt *, const struct cred *, + const struct nfs_fh *, const fmode_t); + +#else /* CONFIG_NFS_LOCALIO */ +static inline void nfsd_localio_ops_init(void) +{ +} +#endif /* CONFIG_NFS_LOCALIO */ + +#endif /* __LINUX_NFSLOCALIO_H */ diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 0c77ba488bba..fec1e8a1570c 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -151,13 +151,15 @@ struct rpc_task_setup { #define RPC_WAS_SENT(t) ((t)->tk_flags & RPC_TASK_SENT) #define RPC_IS_MOVEABLE(t) ((t)->tk_flags & RPC_TASK_MOVEABLE) -#define RPC_TASK_RUNNING 0 -#define RPC_TASK_QUEUED 1 -#define RPC_TASK_ACTIVE 2 -#define RPC_TASK_NEED_XMIT 3 -#define RPC_TASK_NEED_RECV 4 -#define RPC_TASK_MSG_PIN_WAIT 5 -#define RPC_TASK_SIGNALLED 6 +enum { + RPC_TASK_RUNNING, + RPC_TASK_QUEUED, + RPC_TASK_ACTIVE, + RPC_TASK_NEED_XMIT, + RPC_TASK_NEED_RECV, + RPC_TASK_MSG_PIN_WAIT, + RPC_TASK_SIGNALLED, +}; #define rpc_test_and_set_running(t) \ test_and_set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate) diff --git 
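Editorial note: the nfs_to operations vector above is how the client reaches into nfsd without a hard module dependency. The sketch below follows the shape of the nfs_open_local_fh() helper declared above (the exact error handling is an assumption): nfsd is pinned before the open and stays pinned for the life of the returned nfsd_file.

    static struct nfsd_file *example_localio_open(nfs_uuid_t *uuid,
                                                  struct rpc_clnt *clnt,
                                                  const struct cred *cred,
                                                  const struct nfs_fh *fh,
                                                  fmode_t mode)
    {
            struct nfsd_file *nf;
            struct net *net;

            /* uuid->net is RCU-protected and may be cleared on server shutdown */
            rcu_read_lock();
            net = rcu_dereference(uuid->net);
            if (!net || !nfs_to->nfsd_serv_try_get(net)) {
                    rcu_read_unlock();
                    return ERR_PTR(-ENXIO);
            }
            rcu_read_unlock();

            nf = nfs_to->nfsd_open_local_fh(net, uuid->dom, clnt, cred, fh, mode);
            if (IS_ERR(nf))
                    nfs_to->nfsd_serv_put(net);     /* open failed: unpin nfsd */
            /* on success, the pin is dropped via nfsd_file_put_local() later */
            return nf;
    }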
a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index c419a61f60e5..e68fecf6eab5 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -67,9 +67,10 @@ enum { * We currently do not support more than one RPC program per daemon. */ struct svc_serv { - struct svc_program * sv_program; /* RPC program */ + struct svc_program * sv_programs; /* RPC programs */ struct svc_stat * sv_stats; /* RPC statistics */ spinlock_t sv_lock; + unsigned int sv_nprogs; /* Number of sv_programs */ unsigned int sv_nrthreads; /* # of server threads */ unsigned int sv_maxconn; /* max connections allowed or * '0' causing max to be based @@ -360,10 +361,9 @@ struct svc_process_info { }; /* - * List of RPC programs on the same transport endpoint + * RPC program - an array of these can use the same transport endpoint */ struct svc_program { - struct svc_program * pg_next; /* other programs (same xprt) */ u32 pg_prog; /* program number */ unsigned int pg_lovers; /* lowest version */ unsigned int pg_hivers; /* highest version */ @@ -441,6 +441,7 @@ bool svc_rqst_replace_page(struct svc_rqst *rqstp, void svc_rqst_release_pages(struct svc_rqst *rqstp); void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *prog, + unsigned int nprog, struct svc_stat *stats, unsigned int bufsize, int (*threadfn)(void *data)); diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 63cf6fb26dcc..2e111153f7cd 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -14,6 +14,7 @@ #include <linux/sunrpc/msg_prot.h> #include <linux/sunrpc/cache.h> #include <linux/sunrpc/gss_api.h> +#include <linux/sunrpc/clnt.h> #include <linux/hash.h> #include <linux/stringhash.h> #include <linux/cred.h> @@ -157,6 +158,10 @@ extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp); extern int svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops); extern void svc_auth_unregister(rpc_authflavor_t flavor); +extern void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt, + const struct cred *, + struct svc_cred *); + extern struct auth_domain *unix_domain_find(char *name); extern void auth_domain_put(struct auth_domain *item); extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new); diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index 95ff74706104..4f31e73dc34d 100644 --- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -731,11 +731,10 @@ static bool cache_defer_req(struct cache_req *req, struct cache_head *item) static void cache_revisit_request(struct cache_head *item) { struct cache_deferred_req *dreq; - struct list_head pending; struct hlist_node *tmp; int hash = DFR_HASH(item); + LIST_HEAD(pending); - INIT_LIST_HEAD(&pending); spin_lock(&cache_defer_lock); hlist_for_each_entry_safe(dreq, tmp, &cache_defer_hash[hash], hash) @@ -756,10 +755,8 @@ static void cache_revisit_request(struct cache_head *item) void cache_clean_deferred(void *owner) { struct cache_deferred_req *dreq, *tmp; - struct list_head pending; + LIST_HEAD(pending); - - INIT_LIST_HEAD(&pending); spin_lock(&cache_defer_lock); list_for_each_entry_safe(dreq, tmp, &cache_defer_list, recent) { @@ -1085,9 +1082,8 @@ static void cache_dequeue(struct cache_detail *detail, struct cache_head *ch) { struct cache_queue *cq, *tmp; struct cache_request *cr; - struct list_head dequeued; + LIST_HEAD(dequeued); - INIT_LIST_HEAD(&dequeued); spin_lock(&queue_lock); list_for_each_entry_safe(cq, tmp, &detail->queue, list) if 
(!cq->reader) { diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 09f29a95f2bc..0090162ee8c3 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -48,13 +48,8 @@ # define RPCDBG_FACILITY RPCDBG_CALL #endif -/* - * All RPC clients are linked into this list - */ - static DECLARE_WAIT_QUEUE_HEAD(destroy_wait); - static void call_start(struct rpc_task *task); static void call_reserve(struct rpc_task *task); static void call_reserveresult(struct rpc_task *task); @@ -546,7 +541,7 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args) .connect_timeout = args->connect_timeout, .reconnect_timeout = args->reconnect_timeout, }; - char servername[48]; + char servername[RPC_MAXNETNAMELEN]; struct rpc_clnt *clnt; int i; @@ -1893,12 +1888,6 @@ call_allocate(struct rpc_task *task) if (req->rq_buffer) return; - if (proc->p_proc != 0) { - BUG_ON(proc->p_arglen == 0); - if (proc->p_decode != NULL) - BUG_ON(proc->p_replen == 0); - } - /* * Calculate the size (in quads) of the RPC call * and reply headers, and convert both values diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 9aff845196ce..7e7f4e0390c7 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup); static int svc_uses_rpcbind(struct svc_serv *serv) { - struct svc_program *progp; - unsigned int i; + unsigned int p, i; + + for (p = 0; p < serv->sv_nprogs; p++) { + struct svc_program *progp = &serv->sv_programs[p]; - for (progp = serv->sv_program; progp; progp = progp->pg_next) { for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; @@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv) * Create an RPC service */ static struct svc_serv * -__svc_create(struct svc_program *prog, struct svc_stat *stats, +__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats, unsigned int bufsize, int npools, int (*threadfn)(void *data)) { struct svc_serv *serv; @@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats, if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL))) return NULL; serv->sv_name = prog->pg_name; - serv->sv_program = prog; + serv->sv_programs = prog; + serv->sv_nprogs = nprogs; serv->sv_stats = stats; if (bufsize > RPCSVC_MAXPAYLOAD) bufsize = RPCSVC_MAXPAYLOAD; @@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats, serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE); serv->sv_threadfn = threadfn; xdrsize = 0; - while (prog) { - prog->pg_lovers = prog->pg_nvers-1; - for (vers=0; vers<prog->pg_nvers ; vers++) - if (prog->pg_vers[vers]) { - prog->pg_hivers = vers; - if (prog->pg_lovers > vers) - prog->pg_lovers = vers; - if (prog->pg_vers[vers]->vs_xdrsize > xdrsize) - xdrsize = prog->pg_vers[vers]->vs_xdrsize; + for (i = 0; i < nprogs; i++) { + struct svc_program *progp = &prog[i]; + + progp->pg_lovers = progp->pg_nvers-1; + for (vers = 0; vers < progp->pg_nvers ; vers++) + if (progp->pg_vers[vers]) { + progp->pg_hivers = vers; + if (progp->pg_lovers > vers) + progp->pg_lovers = vers; + if (progp->pg_vers[vers]->vs_xdrsize > xdrsize) + xdrsize = progp->pg_vers[vers]->vs_xdrsize; } - prog = prog->pg_next; } serv->sv_xdrsize = xdrsize; INIT_LIST_HEAD(&serv->sv_tempsocks); @@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats, struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize, int (*threadfn)(void *data)) { - return __svc_create(prog, NULL, bufsize, 1, threadfn); + return __svc_create(prog, 1, NULL, 
bufsize, 1, threadfn); } EXPORT_SYMBOL_GPL(svc_create); /** * svc_create_pooled - Create an RPC service with pooled threads - * @prog: the RPC program the new service will handle + * @prog: Array of RPC programs the new service will handle + * @nprogs: Number of programs in the array * @stats: the stats struct if desired * @bufsize: maximum message size for @prog * @threadfn: a function to service RPC requests for @prog @@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create); * Returns an instantiated struct svc_serv object or NULL. */ struct svc_serv *svc_create_pooled(struct svc_program *prog, + unsigned int nprogs, struct svc_stat *stats, unsigned int bufsize, int (*threadfn)(void *data)) @@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog, struct svc_serv *serv; unsigned int npools = svc_pool_map_get(); - serv = __svc_create(prog, stats, bufsize, npools, threadfn); + serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn); if (!serv) goto out_err; serv->sv_is_pooled = true; @@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp) *servp = NULL; - dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name); + dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name); timer_shutdown_sync(&serv->sv_temptimer); /* * Remaining transports at this point are not expected. */ WARN_ONCE(!list_empty(&serv->sv_permsocks), - "SVC: permsocks remain for %s\n", serv->sv_program->pg_name); + "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name); WARN_ONCE(!list_empty(&serv->sv_tempsocks), - "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name); + "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name); cache_clean_deferred(serv); @@ -1148,15 +1153,16 @@ int svc_register(const struct svc_serv *serv, struct net *net, const int family, const unsigned short proto, const unsigned short port) { - struct svc_program *progp; - unsigned int i; + unsigned int p, i; int error = 0; WARN_ON_ONCE(proto == 0 && port == 0); if (proto == 0 && port == 0) return -EINVAL; - for (progp = serv->sv_program; progp; progp = progp->pg_next) { + for (p = 0; p < serv->sv_nprogs; p++) { + struct svc_program *progp = &serv->sv_programs[p]; + for (i = 0; i < progp->pg_nvers; i++) { error = progp->pg_rpcbind_set(net, progp, i, @@ -1208,13 +1214,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi static void svc_unregister(const struct svc_serv *serv, struct net *net) { struct sighand_struct *sighand; - struct svc_program *progp; unsigned long flags; - unsigned int i; + unsigned int p, i; clear_thread_flag(TIF_SIGPENDING); - for (progp = serv->sv_program; progp; progp = progp->pg_next) { + for (p = 0; p < serv->sv_nprogs; p++) { + struct svc_program *progp = &serv->sv_programs[p]; + for (i = 0; i < progp->pg_nvers; i++) { if (progp->pg_vers[i] == NULL) continue; @@ -1320,7 +1327,7 @@ svc_process_common(struct svc_rqst *rqstp) struct svc_process_info process; enum svc_auth_status auth_res; unsigned int aoffset; - int rc; + int pr, rc; __be32 *p; /* Will be turned off only when NFSv4 Sessions are used */ @@ -1344,9 +1351,12 @@ svc_process_common(struct svc_rqst *rqstp) rqstp->rq_vers = be32_to_cpup(p++); rqstp->rq_proc = be32_to_cpup(p); - for (progp = serv->sv_program; progp; progp = progp->pg_next) + for (pr = 0; pr < serv->sv_nprogs; pr++) { + progp = &serv->sv_programs[pr]; + if (rqstp->rq_prog == progp->pg_prog) break; + } /* * Decode auth data, and add verifier to reply buffer. 
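Editorial note: with the programs in a flat array, the dispatcher's lookup above is a bounded scan over sv_programs keyed by RPC program number (100003 for NFS, 100227 for NFSACL, 400122 for LOCALIO). The equivalent logic as a stand-alone helper, for illustration only:

    static struct svc_program *example_find_program(struct svc_serv *serv,
                                                    u32 rq_prog)
    {
            unsigned int p;

            for (p = 0; p < serv->sv_nprogs; p++)
                    if (serv->sv_programs[p].pg_prog == rq_prog)
                            return &serv->sv_programs[p];
            return NULL;    /* caller replies with RPC_PROG_UNAVAIL */
    }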
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c index 53ebc719ff5a..43c57124de52 100644 --- a/net/sunrpc/svc_xprt.c +++ b/net/sunrpc/svc_xprt.c @@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name, spin_unlock(&svc_xprt_class_lock); newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags); if (IS_ERR(newxprt)) { - trace_svc_xprt_create_err(serv->sv_program->pg_name, + trace_svc_xprt_create_err(serv->sv_programs->pg_name, xcl->xcl_name, sap, len, newxprt); module_put(xcl->xcl_owner); diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c index 93d9e949e265..55b4d2874188 100644 --- a/net/sunrpc/svcauth.c +++ b/net/sunrpc/svcauth.c @@ -18,6 +18,7 @@ #include <linux/sunrpc/svcauth.h> #include <linux/err.h> #include <linux/hash.h> +#include <linux/user_namespace.h> #include <trace/events/sunrpc.h> @@ -175,6 +176,33 @@ rpc_authflavor_t svc_auth_flavor(struct svc_rqst *rqstp) } EXPORT_SYMBOL_GPL(svc_auth_flavor); +/** + * svcauth_map_clnt_to_svc_cred_local - maps a generic cred + * to a svc_cred suitable for use in nfsd. + * @clnt: rpc_clnt associated with nfs client + * @cred: generic cred associated with nfs client + * @svc: returned svc_cred that is suitable for use in nfsd + */ +void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt, + const struct cred *cred, + struct svc_cred *svc) +{ + struct user_namespace *userns = clnt->cl_cred ? + clnt->cl_cred->user_ns : &init_user_ns; + + memset(svc, 0, sizeof(struct svc_cred)); + + svc->cr_uid = KUIDT_INIT(from_kuid_munged(userns, cred->fsuid)); + svc->cr_gid = KGIDT_INIT(from_kgid_munged(userns, cred->fsgid)); + svc->cr_flavor = clnt->cl_auth->au_flavor; + if (cred->group_info) + svc->cr_group_info = get_group_info(cred->group_info); + /* These aren't relevant for local (network is bypassed) */ + svc->cr_principal = NULL; + svc->cr_gss_mech = NULL; +} +EXPORT_SYMBOL_GPL(svcauth_map_clnt_to_svc_cred_local); + /************************************************** * 'auth_domains' are stored in a hash table indexed by name. * When the last reference to an 'auth_domain' is dropped, diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 04b45588ae6f..8ca98b146ec8 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp) rqstp->rq_auth_stat = rpc_autherr_badcred; ipm = ip_map_cached_get(xprt); if (ipm == NULL) - ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class, + ipm = __ip_map_lookup(sn->ip_map_cache, + rqstp->rq_server->sv_programs->pg_class, &sin6->sin6_addr); if (ipm == NULL) |
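Editorial note: svcauth_map_clnt_to_svc_cred_local(), added in the svcauth.c hunk above, is the last piece a LOCALIO caller needs before it can use fh_verify_local(): it converts the client's generic credential into the svc_cred form that nfsd's export and permission checks expect. A hypothetical composition of the two, as an editorial sketch only:

    static __be32 example_localio_verify(struct net *net, struct rpc_clnt *clnt,
                                         const struct cred *cred,
                                         struct auth_domain *dom,
                                         struct svc_fh *fhp)
    {
            struct svc_cred svc_cred;
            __be32 status;

            svcauth_map_clnt_to_svc_cred_local(clnt, cred, &svc_cred);
            status = fh_verify_local(net, &svc_cred, dom, fhp, S_IFREG,
                                     NFSD_MAY_READ | NFSD_MAY_LOCALIO);
            if (svc_cred.cr_group_info)
                    put_group_info(svc_cred.cr_group_info); /* drop the reference
                                                               taken by the mapping */
            return status;
    }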