diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-03-11 09:38:17 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-03-11 09:38:17 -0700 |
commit | 7ea65c89d864f1e9aa892eec85625a96fa9acee6 (patch) | |
tree | 423ab3b73e04a64c37c830ceaa7d6a82748eadbf /fs | |
parent | 97ec9715a84e6f0979242e1ea98b9af1a39acf3b (diff) | |
parent | 09406ad8e5105729291a7639160e0cd51c9e0c6c (diff) |
Merge tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Misc features, cleanups, and fixes for vfs and individual filesystems.
Features:
- Support idmapped mounts for hugetlbfs.
- Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug
where the passed offset is ignored if the file is O_APPEND. The new
flag allows a caller to enforce that the offset is honored to
conform to posix even if the file was opened in append mode.
- Move i_mmap_rwsem in struct address_space to avoid false sharing
between i_mmap and i_mmap_rwsem.
- Convert efs, qnx4, and coda to use the new mount api.
- Add a generic is_dot_dotdot() helper that's used by various
filesystems and the VFS code instead of open-coding it multiple
times.
- Recently we've added stable offsets which allows stable ordering
when iterating directories exported through NFS on e.g., tmpfs
filesystems. Originally an xarray was used for the offset map but
that caused slab fragmentation issues over time. This switches the
offset map to the maple tree which has a dense mode that handles
this scenario a lot better. Includes tests.
- Finally merge the case-insensitive improvement series Gabriel has
been working on for a long time. This cleanly propagates case
insensitive operations through ->s_d_op which in turn allows us to
remove the quite ugly generic_set_encrypted_ci_d_ops() operations.
It also improves performance by trying a case-sensitive comparison
first and then fallback to case-insensitive lookup if that fails.
This also fixes a bug where overlayfs would be able to be mounted
over a case insensitive directory which would lead to all sort of
odd behaviors.
Cleanups:
- Make file_dentry() a simple accessor now that ->d_real() is
simplified because of the backing file work we did the last two
cycles.
- Use the dedicated file_mnt_idmap helper in ntfs3.
- Use smp_load_acquire/store_release() in the i_size_read/write
helpers and thus remove the hack to handle i_size reads in the
filemap code.
- The SLAB_MEM_SPREAD is a nop now. Remove it from various places in
fs/
- It's no longer necessary to perform a second built-in initramfs
unpack call because we retain the contents of the previous
extraction. Remove it.
- Now that we have removed various allocators kfree_rcu() always
works with kmem caches and kmalloc(). So simplify various places
that only use an rcu callback in order to handle the kmem cache
case.
- Convert the pipe code to use a lockdep comparison function instead
of open-coding the nesting making lockdep validation easier.
- Move code into fs-writeback.c that was located in a header but can
be made static as it's only used in that one file.
- Rewrite the alignment checking iterators for iovec and bvec to be
easier to read, and also significantly more compact in terms of
generated code. This saves 270 bytes of text on x86-64 (with
clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also
saves a bit of time for the same workload.
- Switch various places to use KMEM_CACHE instead of
kmem_cache_create().
- Use inode_set_ctime_to_ts() in inode_set_ctime_current()
- Use kzalloc() in name_to_handle_at() to avoid kernel infoleak.
- Various smaller cleanups for eventfds.
Fixes:
- Fix various comments and typos, and unneeded initializations.
- Fix stack allocation hack for clang in the select code.
- Improve dump_mapping() debug code on a best-effort basis.
- Fix build errors in various selftests.
- Avoid wrap-around instrumentation in various places.
- Don't allow user namespaces without an idmapping to be used for
idmapped mounts.
- Fix sysv sb_read() call.
- Fix fallback implementation of the get_name() export operation"
* tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits)
hugetlbfs: support idmapped mounts
qnx4: convert qnx4 to use the new mount api
fs: use inode_set_ctime_to_ts to set inode ctime to current time
libfs: Drop generic_set_encrypted_ci_d_ops
ubifs: Configure dentry operations at dentry-creation time
f2fs: Configure dentry operations at dentry-creation time
ext4: Configure dentry operations at dentry-creation time
libfs: Add helper to choose dentry operations at mount-time
libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops
fscrypt: Drop d_revalidate once the key is added
fscrypt: Drop d_revalidate for valid dentries during lookup
fscrypt: Factor out a helper to configure the lookup dentry
ovl: Always reject mounting over case-insensitive directories
libfs: Attempt exact-match comparison first during casefolded lookup
efs: remove SLAB_MEM_SPREAD flag usage
jfs: remove SLAB_MEM_SPREAD flag usage
minix: remove SLAB_MEM_SPREAD flag usage
openpromfs: remove SLAB_MEM_SPREAD flag usage
proc: remove SLAB_MEM_SPREAD flag usage
qnx6: remove SLAB_MEM_SPREAD flag usage
...
Diffstat (limited to 'fs')
45 files changed, 494 insertions, 390 deletions
diff --git a/fs/attr.c b/fs/attr.c index 5a13f0c8495f..49d23b5dbab4 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -352,7 +352,7 @@ int may_setattr(struct mnt_idmap *idmap, struct inode *inode, EXPORT_SYMBOL(may_setattr); /** - * notify_change - modify attributes of a filesytem object + * notify_change - modify attributes of a filesystem object * @idmap: idmap of the mount the inode was found from * @dentry: object affected * @attr: new attributes diff --git a/fs/backing-file.c b/fs/backing-file.c index a681f38d84d8..740185198db3 100644 --- a/fs/backing-file.c +++ b/fs/backing-file.c @@ -325,9 +325,7 @@ EXPORT_SYMBOL_GPL(backing_file_mmap); static int __init backing_aio_init(void) { - backing_aio_cachep = kmem_cache_create("backing_aio", - sizeof(struct backing_aio), - 0, SLAB_HWCACHE_ALIGN, NULL); + backing_aio_cachep = KMEM_CACHE(backing_aio, SLAB_HWCACHE_ALIGN); if (!backing_aio_cachep) return -ENOMEM; diff --git a/fs/buffer.c b/fs/buffer.c index d3bcf601d3e5..9a54077de87d 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -464,7 +464,7 @@ EXPORT_SYMBOL(mark_buffer_async_write); * a successful fsync(). For example, ext2 indirect blocks need to be * written back and waited upon before fsync() returns. * - * The functions mark_buffer_inode_dirty(), fsync_inode_buffers(), + * The functions mark_buffer_dirty_inode(), fsync_inode_buffers(), * inode_has_buffers() and invalidate_inode_buffers() are provided for the * management of a list of dependent buffers at ->i_mapping->i_private_list. * @@ -3121,12 +3121,8 @@ void __init buffer_init(void) unsigned long nrpages; int ret; - bh_cachep = kmem_cache_create("buffer_head", - sizeof(struct buffer_head), 0, - (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC| - SLAB_MEM_SPREAD), - NULL); - + bh_cachep = KMEM_CACHE(buffer_head, + SLAB_RECLAIM_ACCOUNT|SLAB_PANIC); /* * Limit the bh occupancy to 10% of ZONE_NORMAL */ diff --git a/fs/coda/inode.c b/fs/coda/inode.c index 0c7c2528791e..a50356c541f6 100644 --- a/fs/coda/inode.c +++ b/fs/coda/inode.c @@ -24,6 +24,8 @@ #include <linux/pid_namespace.h> #include <linux/uaccess.h> #include <linux/fs.h> +#include <linux/fs_context.h> +#include <linux/fs_parser.h> #include <linux/vmalloc.h> #include <linux/coda.h> @@ -87,10 +89,10 @@ void coda_destroy_inodecache(void) kmem_cache_destroy(coda_inode_cachep); } -static int coda_remount(struct super_block *sb, int *flags, char *data) +static int coda_reconfigure(struct fs_context *fc) { - sync_filesystem(sb); - *flags |= SB_NOATIME; + sync_filesystem(fc->root->d_sb); + fc->sb_flags |= SB_NOATIME; return 0; } @@ -102,78 +104,102 @@ static const struct super_operations coda_super_operations = .evict_inode = coda_evict_inode, .put_super = coda_put_super, .statfs = coda_statfs, - .remount_fs = coda_remount, }; -static int get_device_index(struct coda_mount_data *data) +struct coda_fs_context { + int idx; +}; + +enum { + Opt_fd, +}; + +static const struct fs_parameter_spec coda_param_specs[] = { + fsparam_fd ("fd", Opt_fd), + {} +}; + +static int coda_parse_fd(struct fs_context *fc, int fd) { + struct coda_fs_context *ctx = fc->fs_private; struct fd f; struct inode *inode; int idx; - if (data == NULL) { - pr_warn("%s: Bad mount data\n", __func__); - return -1; - } - - if (data->version != CODA_MOUNT_VERSION) { - pr_warn("%s: Bad mount version\n", __func__); - return -1; - } - - f = fdget(data->fd); + f = fdget(fd); if (!f.file) - goto Ebadf; + return -EBADF; inode = file_inode(f.file); if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) { fdput(f); - goto Ebadf; + return invalf(fc, "code: Not coda psdev"); } idx = iminor(inode); fdput(f); - if (idx < 0 || idx >= MAX_CODADEVS) { - pr_warn("%s: Bad minor number\n", __func__); - return -1; + if (idx < 0 || idx >= MAX_CODADEVS) + return invalf(fc, "coda: Bad minor number"); + ctx->idx = idx; + return 0; +} + +static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param) +{ + struct fs_parse_result result; + int opt; + + opt = fs_parse(fc, coda_param_specs, param, &result); + if (opt < 0) + return opt; + + switch (opt) { + case Opt_fd: + return coda_parse_fd(fc, result.uint_32); } - return idx; -Ebadf: - pr_warn("%s: Bad file\n", __func__); - return -1; + return 0; +} + +/* + * Parse coda's binary mount data form. We ignore any errors and go with index + * 0 if we get one for backward compatibility. + */ +static int coda_parse_monolithic(struct fs_context *fc, void *_data) +{ + struct coda_mount_data *data = _data; + + if (!data) + return invalf(fc, "coda: Bad mount data"); + + if (data->version != CODA_MOUNT_VERSION) + return invalf(fc, "coda: Bad mount version"); + + coda_parse_fd(fc, data->fd); + return 0; } -static int coda_fill_super(struct super_block *sb, void *data, int silent) +static int coda_fill_super(struct super_block *sb, struct fs_context *fc) { + struct coda_fs_context *ctx = fc->fs_private; struct inode *root = NULL; struct venus_comm *vc; struct CodaFid fid; int error; - int idx; - - if (task_active_pid_ns(current) != &init_pid_ns) - return -EINVAL; - - idx = get_device_index((struct coda_mount_data *) data); - /* Ignore errors in data, for backward compatibility */ - if(idx == -1) - idx = 0; - - pr_info("%s: device index: %i\n", __func__, idx); + infof(fc, "coda: device index: %i\n", ctx->idx); - vc = &coda_comms[idx]; + vc = &coda_comms[ctx->idx]; mutex_lock(&vc->vc_mutex); if (!vc->vc_inuse) { - pr_warn("%s: No pseudo device\n", __func__); + errorf(fc, "coda: No pseudo device"); error = -EINVAL; goto unlock_out; } if (vc->vc_sb) { - pr_warn("%s: Device already mounted\n", __func__); + errorf(fc, "coda: Device already mounted"); error = -EBUSY; goto unlock_out; } @@ -313,18 +339,45 @@ static int coda_statfs(struct dentry *dentry, struct kstatfs *buf) return 0; } -/* init_coda: used by filesystems.c to register coda */ +static int coda_get_tree(struct fs_context *fc) +{ + if (task_active_pid_ns(current) != &init_pid_ns) + return -EINVAL; -static struct dentry *coda_mount(struct file_system_type *fs_type, - int flags, const char *dev_name, void *data) + return get_tree_nodev(fc, coda_fill_super); +} + +static void coda_free_fc(struct fs_context *fc) { - return mount_nodev(fs_type, flags, data, coda_fill_super); + kfree(fc->fs_private); +} + +static const struct fs_context_operations coda_context_ops = { + .free = coda_free_fc, + .parse_param = coda_parse_param, + .parse_monolithic = coda_parse_monolithic, + .get_tree = coda_get_tree, + .reconfigure = coda_reconfigure, +}; + +static int coda_init_fs_context(struct fs_context *fc) +{ + struct coda_fs_context *ctx; + + ctx = kzalloc(sizeof(struct coda_fs_context), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + + fc->fs_private = ctx; + fc->ops = &coda_context_ops; + return 0; } struct file_system_type coda_fs_type = { .owner = THIS_MODULE, .name = "coda", - .mount = coda_mount, + .init_fs_context = coda_init_fs_context, + .parameters = coda_param_specs, .kill_sb = kill_anon_super, .fs_flags = FS_BINARY_MOUNTDATA, }; diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c index 7b3fc189593a..0ad52fbe51c9 100644 --- a/fs/crypto/fname.c +++ b/fs/crypto/fname.c @@ -74,13 +74,7 @@ struct fscrypt_nokey_name { static inline bool fscrypt_is_dot_dotdot(const struct qstr *str) { - if (str->len == 1 && str->name[0] == '.') - return true; - - if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.') - return true; - - return false; + return is_dot_dotdot(str->name, str->len); } /** diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c index 52504dd478d3..104771c3d3f6 100644 --- a/fs/crypto/hooks.c +++ b/fs/crypto/hooks.c @@ -102,11 +102,8 @@ int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry, if (err && err != -ENOENT) return err; - if (fname->is_nokey_name) { - spin_lock(&dentry->d_lock); - dentry->d_flags |= DCACHE_NOKEY_NAME; - spin_unlock(&dentry->d_lock); - } + fscrypt_prepare_dentry(dentry, fname->is_nokey_name); + return err; } EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup); @@ -131,12 +128,10 @@ EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup); int fscrypt_prepare_lookup_partial(struct inode *dir, struct dentry *dentry) { int err = fscrypt_get_encryption_info(dir, true); + bool is_nokey_name = (!err && !fscrypt_has_encryption_key(dir)); + + fscrypt_prepare_dentry(dentry, is_nokey_name); - if (!err && !fscrypt_has_encryption_key(dir)) { - spin_lock(&dentry->d_lock); - dentry->d_flags |= DCACHE_NOKEY_NAME; - spin_unlock(&dentry->d_lock); - } return err; } EXPORT_SYMBOL_GPL(fscrypt_prepare_lookup_partial); diff --git a/fs/dcache.c b/fs/dcache.c index 6ebccba33336..71a8e943a0fa 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -3139,7 +3139,7 @@ static void __init dcache_init(void) * of the dcache. */ dentry_cache = KMEM_CACHE_USERCOPY(dentry, - SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT, + SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT, d_iname); /* Hash may have been set up in dcache_init_early */ diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c index 03bd55069d86..2fe0f3af1a08 100644 --- a/fs/ecryptfs/crypto.c +++ b/fs/ecryptfs/crypto.c @@ -1949,16 +1949,6 @@ out: return rc; } -static bool is_dot_dotdot(const char *name, size_t name_size) -{ - if (name_size == 1 && name[0] == '.') - return true; - else if (name_size == 2 && name[0] == '.' && name[1] == '.') - return true; - - return false; -} - /** * ecryptfs_decode_and_decrypt_filename - converts the encoded cipher text name to decoded plaintext * @plaintext_name: The plaintext name diff --git a/fs/efs/super.c b/fs/efs/super.c index f17fdac76b2e..e4421c10caeb 100644 --- a/fs/efs/super.c +++ b/fs/efs/super.c @@ -14,19 +14,14 @@ #include <linux/buffer_head.h> #include <linux/vfs.h> #include <linux/blkdev.h> - +#include <linux/fs_context.h> +#include <linux/fs_parser.h> #include "efs.h" #include <linux/efs_vh.h> #include <linux/efs_fs_sb.h> static int efs_statfs(struct dentry *dentry, struct kstatfs *buf); -static int efs_fill_super(struct super_block *s, void *d, int silent); - -static struct dentry *efs_mount(struct file_system_type *fs_type, - int flags, const char *dev_name, void *data) -{ - return mount_bdev(fs_type, flags, dev_name, data, efs_fill_super); -} +static int efs_init_fs_context(struct fs_context *fc); static void efs_kill_sb(struct super_block *s) { @@ -35,15 +30,6 @@ static void efs_kill_sb(struct super_block *s) kfree(sbi); } -static struct file_system_type efs_fs_type = { - .owner = THIS_MODULE, - .name = "efs", - .mount = efs_mount, - .kill_sb = efs_kill_sb, - .fs_flags = FS_REQUIRES_DEV, -}; -MODULE_ALIAS_FS("efs"); - static struct pt_types sgi_pt_types[] = { {0x00, "SGI vh"}, {0x01, "SGI trkrepl"}, @@ -63,6 +49,27 @@ static struct pt_types sgi_pt_types[] = { {0, NULL} }; +enum { + Opt_explicit_open, +}; + +static const struct fs_parameter_spec efs_param_spec[] = { + fsparam_flag ("explicit-open", Opt_explicit_open), + {} +}; + +/* + * File system definition and registration. + */ +static struct file_system_type efs_fs_type = { + .owner = THIS_MODULE, + .name = "efs", + .kill_sb = efs_kill_sb, + .fs_flags = FS_REQUIRES_DEV, + .init_fs_context = efs_init_fs_context, + .parameters = efs_param_spec, +}; +MODULE_ALIAS_FS("efs"); static struct kmem_cache * efs_inode_cachep; @@ -91,8 +98,8 @@ static int __init init_inodecache(void) { efs_inode_cachep = kmem_cache_create("efs_inode_cache", sizeof(struct efs_inode_info), 0, - SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD| - SLAB_ACCOUNT, init_once); + SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT, + init_once); if (efs_inode_cachep == NULL) return -ENOMEM; return 0; @@ -108,18 +115,10 @@ static void destroy_inodecache(void) kmem_cache_destroy(efs_inode_cachep); } -static int efs_remount(struct super_block *sb, int *flags, char *data) -{ - sync_filesystem(sb); - *flags |= SB_RDONLY; - return 0; -} - static const struct super_operations efs_superblock_operations = { .alloc_inode = efs_alloc_inode, .free_inode = efs_free_inode, .statfs = efs_statfs, - .remount_fs = efs_remount, }; static const struct export_operations efs_export_ops = { @@ -249,26 +248,26 @@ static int efs_validate_super(struct efs_sb_info *sb, struct efs_super *super) { return 0; } -static int efs_fill_super(struct super_block *s, void *d, int silent) +static int efs_fill_super(struct super_block *s, struct fs_context *fc) { struct efs_sb_info *sb; struct buffer_head *bh; struct inode *root; - sb = kzalloc(sizeof(struct efs_sb_info), GFP_KERNEL); + sb = kzalloc(sizeof(struct efs_sb_info), GFP_KERNEL); if (!sb) return -ENOMEM; s->s_fs_info = sb; s->s_time_min = 0; s->s_time_max = U32_MAX; - + s->s_magic = EFS_SUPER_MAGIC; if (!sb_set_blocksize(s, EFS_BLOCKSIZE)) { pr_err("device does not support %d byte blocks\n", EFS_BLOCKSIZE); return -EINVAL; } - + /* read the vh (volume header) block */ bh = sb_bread(s, 0); @@ -294,7 +293,7 @@ static int efs_fill_super(struct super_block *s, void *d, int silent) pr_err("cannot read superblock\n"); return -EIO; } - + if (efs_validate_super(sb, (struct efs_super *) bh->b_data)) { #ifdef DEBUG pr_warn("invalid superblock at block %u\n", @@ -328,6 +327,61 @@ static int efs_fill_super(struct super_block *s, void *d, int silent) return 0; } +static void efs_free_fc(struct fs_context *fc) +{ + kfree(fc->fs_private); +} + +static int efs_get_tree(struct fs_context *fc) +{ + return get_tree_bdev(fc, efs_fill_super); +} + +static int efs_parse_param(struct fs_context *fc, struct fs_parameter *param) +{ + int token; + struct fs_parse_result result; + + token = fs_parse(fc, efs_param_spec, param, &result); + if (token < 0) + return token; + return 0; +} + +static int efs_reconfigure(struct fs_context *fc) +{ + sync_filesystem(fc->root->d_sb); + + return 0; +} + +struct efs_context { + unsigned long s_mount_opts; +}; + +static const struct fs_context_operations efs_context_opts = { + .parse_param = efs_parse_param, + .get_tree = efs_get_tree, + .reconfigure = efs_reconfigure, + .free = efs_free_fc, +}; + +/* + * Set up the filesystem mount context. + */ +static int efs_init_fs_context(struct fs_context *fc) +{ + struct efs_context *ctx; + + ctx = kzalloc(sizeof(struct efs_context), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + fc->fs_private = ctx; + fc->ops = &efs_context_opts; + + return 0; +} + static int efs_statfs(struct dentry *dentry, struct kstatfs *buf) { struct super_block *sb = dentry->d_sb; struct efs_sb_info *sbi = SUPER_INFO(sb); diff --git a/fs/eventfd.c b/fs/eventfd.c index ad8186d47ba7..9afdb722fa92 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -251,7 +251,7 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c ssize_t res; __u64 ucnt; - if (count < sizeof(ucnt)) + if (count != sizeof(ucnt)) return -EINVAL; if (copy_from_user(&ucnt, buf, sizeof(ucnt))) return -EFAULT; @@ -283,13 +283,18 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c static void eventfd_show_fdinfo(struct seq_file *m, struct file *f) { struct eventfd_ctx *ctx = f->private_data; + __u64 cnt; spin_lock_irq(&ctx->wqh.lock); - seq_printf(m, "eventfd-count: %16llx\n", - (unsigned long long)ctx->count); + cnt = ctx->count; spin_unlock_irq(&ctx->wqh.lock); - seq_printf(m, "eventfd-id: %d\n", ctx->id); - seq_printf(m, "eventfd-semaphore: %d\n", + + seq_printf(m, + "eventfd-count: %16llx\n" + "eventfd-id: %d\n" + "eventfd-semaphore: %d\n", + cnt, + ctx->id, !!(ctx->flags & EFD_SEMAPHORE)); } #endif @@ -383,6 +388,7 @@ static int do_eventfd(unsigned int count, int flags) /* Check the EFD_* constants for consistency. */ BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC); BUILD_BUG_ON(EFD_NONBLOCK != O_NONBLOCK); + BUILD_BUG_ON(EFD_SEMAPHORE != (1 << 0)); if (flags & ~EFD_FLAGS_SET) return -EINVAL; diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 3534d36a1474..39ac6fdf8bca 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -206,7 +206,7 @@ struct eventpoll { */ struct epitem *ovflist; - /* wakeup_source used when ep_scan_ready_list is running */ + /* wakeup_source used when ep_send_events or __ep_eventpoll_poll is running */ struct wakeup_source *ws; /* The user that created the eventpoll descriptor */ @@ -678,12 +678,6 @@ static void ep_done_scan(struct eventpoll *ep, write_unlock_irq(&ep->lock); } -static void epi_rcu_free(struct rcu_head *head) -{ - struct epitem *epi = container_of(head, struct epitem, rcu); - kmem_cache_free(epi_cache, epi); -} - static void ep_get(struct eventpoll *ep) { refcount_inc(&ep->refcount); @@ -767,7 +761,7 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force) * ep->mtx. The rcu read side, reverse_path_check_proc(), does not make * use of the rbn field. */ - call_rcu(&epi->rcu, epi_rcu_free); + kfree_rcu(epi, rcu); percpu_counter_dec(&ep->user->epoll_watches); return ep_refcount_dec_and_test(ep); @@ -1153,7 +1147,7 @@ static inline bool chain_epi_lockless(struct epitem *epi) * This callback takes a read lock in order not to contend with concurrent * events from another file descriptor, thus all modifications to ->rdllist * or ->ovflist are lockless. Read lock is paired with the write lock from - * ep_scan_ready_list(), which stops all list modifications and guarantees + * ep_start/done_scan(), which stops all list modifications and guarantees * that lists state is seen correctly. * * Another thing worth to mention is that ep_poll_callback() can be called @@ -1751,7 +1745,7 @@ static int ep_send_events(struct eventpoll *ep, * availability. At this point, no one can insert * into ep->rdllist besides us. The epoll_ctl() * callers are locked out by - * ep_scan_ready_list() holding "mtx" and the + * ep_send_events() holding "mtx" and the * poll callback will queue them in ep->ovflist. */ list_add_tail(&epi->rdllink, &ep->rdllist); @@ -1904,7 +1898,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, __set_current_state(TASK_INTERRUPTIBLE); /* - * Do the final check under the lock. ep_scan_ready_list() + * Do the final check under the lock. ep_start/done_scan() * plays with two lists (->rdllist and ->ovflist) and there * is always a race when both lists are empty for short * period of time although events are pending, so lock is diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 3ae0154c5680..07ea3d62b298 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -255,7 +255,7 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, container_of(ctx, struct getdents_callback, ctx); buf->sequence++; - if (buf->ino == ino && len <= NAME_MAX) { + if (buf->ino == ino && len <= NAME_MAX && !is_dot_dotdot(name, len)) { memcpy(buf->name, name, len); buf->name[len] = '\0'; buf->found = 1; diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 05b647e6bc19..5e4f65c14dfb 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1762,7 +1762,6 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir, struct buffer_head *bh; err = ext4_fname_prepare_lookup(dir, dentry, &fname); - generic_set_encrypted_ci_d_ops(dentry); if (err == -ENOENT) return NULL; if (err) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 0f931d0c227d..d068d550d271 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5484,6 +5484,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb) goto failed_mount4; } + generic_set_sb_d_ops(sb); sb->s_root = d_make_root(root); if (!sb->s_root) { ext4_msg(sb, KERN_ERR, "get root dentry failed"); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 65294e3b0bef..dda0aed95464 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3364,17 +3364,6 @@ static inline bool f2fs_cp_error(struct f2fs_sb_info *sbi) return is_set_ckpt_flags(sbi, CP_ERROR_FLAG); } -static inline bool is_dot_dotdot(const u8 *name, size_t len) -{ - if (len == 1 && name[0] == '.') - return true; - - if (len == 2 && name[0] == '.' && name[1] == '.') - return true; - - return false; -} - static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi, size_t size, gfp_t flags) { diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index b3bb815fc6aa..f7f63a567d86 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -531,7 +531,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry, } err = f2fs_prepare_lookup(dir, dentry, &fname); - generic_set_encrypted_ci_d_ops(dentry); if (err == -ENOENT) goto out_splice; if (err) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index d45ab0992ae5..8d9bd880ffd8 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -4660,6 +4660,7 @@ try_onemore: goto free_node_inode; } + generic_set_sb_d_ops(sb); sb->s_root = d_make_root(root); /* allocate root dentry */ if (!sb->s_root) { err = -ENOMEM; diff --git a/fs/fcntl.c b/fs/fcntl.c index c80a6acad742..c3e342eb74af 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -846,12 +846,6 @@ int send_sigurg(struct fown_struct *fown) static DEFINE_SPINLOCK(fasync_lock); static struct kmem_cache *fasync_cache __ro_after_init; -static void fasync_free_rcu(struct rcu_head *head) -{ - kmem_cache_free(fasync_cache, - container_of(head, struct fasync_struct, fa_rcu)); -} - /* * Remove a fasync entry. If successfully removed, return * positive and clear the FASYNC flag. If no entry exists, @@ -877,7 +871,7 @@ int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp) write_unlock_irq(&fa->fa_lock); *fp = fa->fa_next; - call_rcu(&fa->fa_rcu, fasync_free_rcu); + kfree_rcu(fa, fa_rcu); filp->f_flags &= ~FASYNC; result = 1; break; diff --git a/fs/fhandle.c b/fs/fhandle.c index 18b3ba8dc8ea..57a12614addf 100644 --- a/fs/fhandle.c +++ b/fs/fhandle.c @@ -36,7 +36,7 @@ static long do_sys_name_to_handle(const struct path *path, if (f_handle.handle_bytes > MAX_HANDLE_SZ) return -EINVAL; - handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes, + handle = kzalloc(sizeof(struct file_handle) + f_handle.handle_bytes, GFP_KERNEL); if (!handle) return -ENOMEM; diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 3d84fcc471c6..e4f17c53ddfc 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -141,6 +141,31 @@ static void wb_wakeup(struct bdi_writeback *wb) spin_unlock_irq(&wb->work_lock); } +/* + * This function is used when the first inode for this wb is marked dirty. It + * wakes-up the corresponding bdi thread which should then take care of the + * periodic background write-out of dirty inodes. Since the write-out would + * starts only 'dirty_writeback_interval' centisecs from now anyway, we just + * set up a timer which wakes the bdi thread up later. + * + * Note, we wouldn't bother setting up the timer, but this function is on the + * fast-path (used by '__mark_inode_dirty()'), so we save few context switches + * by delaying the wake-up. + * + * We have to be careful not to postpone flush work if it is scheduled for + * earlier. Thus we use queue_delayed_work(). + */ +static void wb_wakeup_delayed(struct bdi_writeback *wb) +{ + unsigned long timeout; + + timeout = msecs_to_jiffies(dirty_writeback_interval * 10); + spin_lock_irq(&wb->work_lock); + if (test_bit(WB_registered, &wb->state)) + queue_delayed_work(bdi_wq, &wb->dwork, timeout); + spin_unlock_irq(&wb->work_lock); +} + static void finish_writeback_work(struct bdi_writeback *wb, struct wb_writeback_work *work) { diff --git a/fs/fs_parser.c b/fs/fs_parser.c index edb3712dcfa5..a4d6ca0b8971 100644 --- a/fs/fs_parser.c +++ b/fs/fs_parser.c @@ -83,8 +83,8 @@ static const struct fs_parameter_spec *fs_lookup_key( } /* - * fs_parse - Parse a filesystem configuration parameter - * @fc: The filesystem context to log errors through. + * __fs_parse - Parse a filesystem configuration parameter + * @log: The filesystem context to log errors through. * @desc: The parameter description to use. * @param: The parameter. * @result: Where to place the result of the parse diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c index b0cb70400996..ce9346099c72 100644 --- a/fs/hfsplus/wrapper.c +++ b/fs/hfsplus/wrapper.c @@ -30,7 +30,7 @@ struct hfsplus_wd { * @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes * @buf: buffer for I/O * @data: output pointer for location of requested data - * @opf: request op flags + * @opf: I/O operation type and flags * * The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than * HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index d746866ae3b6..6502c7e776d1 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -933,7 +933,7 @@ static int hugetlbfs_setattr(struct mnt_idmap *idmap, unsigned int ia_valid = attr->ia_valid; struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode); - error = setattr_prepare(&nop_mnt_idmap, dentry, attr); + error = setattr_prepare(idmap, dentry, attr); if (error) return error; @@ -950,7 +950,7 @@ static int hugetlbfs_setattr(struct mnt_idmap *idmap, hugetlb_vmtruncate(inode, newsize); } - setattr_copy(&nop_mnt_idmap, inode, attr); + setattr_copy(idmap, inode, attr); mark_inode_dirty(inode); return 0; } @@ -985,6 +985,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb, static struct lock_class_key hugetlbfs_i_mmap_rwsem_key; static struct inode *hugetlbfs_get_inode(struct super_block *sb, + struct mnt_idmap *idmap, struct inode *dir, umode_t mode, dev_t dev) { @@ -1006,7 +1007,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb, struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode); inode->i_ino = get_next_ino(); - inode_init_owner(&nop_mnt_idmap, inode, dir, mode); + inode_init_owner(idmap, inode, dir, mode); lockdep_set_class(&inode->i_mapping->i_mmap_rwsem, &hugetlbfs_i_mmap_rwsem_key); inode->i_mapping->a_ops = &hugetlbfs_aops; @@ -1050,7 +1051,7 @@ static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir, { struct inode *inode; - inode = hugetlbfs_get_inode(dir->i_sb, dir, mode, dev); + inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode, dev); if (!inode) return -ENOSPC; inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir)); @@ -1062,7 +1063,7 @@ static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir, static int hugetlbfs_mkdir(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode) { - int retval = hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry, + int retval = hugetlbfs_mknod(idmap, dir, dentry, mode | S_IFDIR, 0); if (!retval) inc_nlink(dir); @@ -1073,7 +1074,7 @@ static int hugetlbfs_create(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, umode_t mode, bool excl) { - return hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0); + return hugetlbfs_mknod(idmap, dir, dentry, mode | S_IFREG, 0); } static int hugetlbfs_tmpfile(struct mnt_idmap *idmap, @@ -1082,7 +1083,7 @@ static int hugetlbfs_tmpfile(struct mnt_idmap *idmap, { struct inode *inode; - inode = hugetlbfs_get_inode(dir->i_sb, dir, mode | S_IFREG, 0); + inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode | S_IFREG, 0); if (!inode) return -ENOSPC; inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir)); @@ -1094,10 +1095,11 @@ static int hugetlbfs_symlink(struct mnt_idmap *idmap, struct inode *dir, struct dentry *dentry, const char *symname) { + const umode_t mode = S_IFLNK|S_IRWXUGO; struct inode *inode; int error = -ENOSPC; - inode = hugetlbfs_get_inode(dir->i_sb, dir, S_IFLNK|S_IRWXUGO, 0); + inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode, 0); if (inode) { int l = strlen(symname)+1; error = page_symlink(inode, symname, l); @@ -1566,6 +1568,7 @@ static struct file_system_type hugetlbfs_fs_type = { .init_fs_context = hugetlbfs_init_fs_context, .parameters = hugetlb_fs_parameters, .kill_sb = kill_litter_super, + .fs_flags = FS_ALLOW_IDMAP, }; static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE]; @@ -1619,7 +1622,9 @@ struct file *hugetlb_file_setup(const char *name, size_t size, } file = ERR_PTR(-ENOSPC); - inode = hugetlbfs_get_inode(mnt->mnt_sb, NULL, S_IFREG | S_IRWXUGO, 0); + /* hugetlbfs_vfsmount[] mounts do not use idmapped mounts. */ + inode = hugetlbfs_get_inode(mnt->mnt_sb, &nop_mnt_idmap, NULL, + S_IFREG | S_IRWXUGO, 0); if (!inode) goto out; if (creat_flags == HUGETLB_SHMFS_INODE) diff --git a/fs/inode.c b/fs/inode.c index 91048c4c9c9e..5c8a5250d0ac 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -588,7 +588,8 @@ void dump_mapping(const struct address_space *mapping) } dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias); - if (get_kernel_nofault(dentry, dentry_ptr)) { + if (get_kernel_nofault(dentry, dentry_ptr) || + !dentry.d_parent || !dentry.d_name.name) { pr_warn("aops:%ps ino:%lx invalid dentry:%px\n", a_ops, ino, dentry_ptr); return; @@ -2285,7 +2286,7 @@ void __init inode_init(void) sizeof(struct inode), 0, (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC| - SLAB_MEM_SPREAD|SLAB_ACCOUNT), + SLAB_ACCOUNT), init_once); /* Hash may have been set up in inode_init_early */ @@ -2509,7 +2510,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode) { struct timespec64 now = current_time(inode); - inode_set_ctime(inode, now.tv_sec, now.tv_nsec); + inode_set_ctime_to_ts(inode, now); return now; } EXPORT_SYMBOL(inode_set_ctime_current); diff --git a/fs/jfs/super.c b/fs/jfs/super.c index 8d8e556bd610..73f09a762b79 100644 --- a/fs/jfs/super.c +++ b/fs/jfs/super.c @@ -932,7 +932,7 @@ static int __init init_jfs_fs(void) jfs_inode_cachep = kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info), - 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT, + 0, SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT, offsetof(struct jfs_inode_info, i_inline_all), sizeof_field(struct jfs_inode_info, i_inline_all), init_once); diff --git a/fs/libfs.c b/fs/libfs.c index eec6031b0155..78c71a9e2e18 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -240,17 +240,22 @@ const struct inode_operations simple_dir_inode_operations = { }; EXPORT_SYMBOL(simple_dir_inode_operations); -static void offset_set(struct dentry *dentry, u32 offset) +/* 0 is '.', 1 is '..', so always start with offset 2 or more */ +enum { + DIR_OFFSET_MIN = 2, +}; + +static void offset_set(struct dentry *dentry, long offset) { - dentry->d_fsdata = (void *)((uintptr_t)(offset)); + dentry->d_fsdata = (void *)offset; } -static u32 dentry2offset(struct dentry *dentry) +static long dentry2offset(struct dentry *dentry) { - return (u32)((uintptr_t)(dentry->d_fsdata)); + return (long)dentry->d_fsdata; } -static struct lock_class_key simple_offset_xa_lock; +static struct lock_class_key simple_offset_lock_class; /** * simple_offset_init - initialize an offset_ctx @@ -259,11 +264,9 @@ static struct lock_class_key simple_offset_xa_lock; */ void simple_offset_init(struct offset_ctx *octx) { - xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1); - lockdep_set_class(&octx->xa.xa_lock, &simple_offset_xa_lock); - - /* 0 is '.', 1 is '..', so always start with offset 2 */ - octx->next_offset = 2; + mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE); + lockdep_set_class(&octx->mt.ma_lock, &simple_offset_lock_class); + octx->next_offset = DIR_OFFSET_MIN; } /** @@ -271,20 +274,19 @@ void simple_offset_init(struct offset_ctx *octx) * @octx: directory offset ctx to be updated * @dentry: new dentry being added * - * Returns zero on success. @so_ctx and the dentry offset are updated. + * Returns zero on success. @octx and the dentry's offset are updated. * Otherwise, a negative errno value is returned. */ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry) { - static const struct xa_limit limit = XA_LIMIT(2, U32_MAX); - u32 offset; + unsigned long offset; int ret; if (dentry2offset(dentry) != 0) return -EBUSY; - ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit, - &octx->next_offset, GFP_KERNEL); + ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN, + LONG_MAX, &octx->next_offset, GFP_KERNEL); if (ret < 0) return ret; @@ -300,17 +302,49 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry) */ void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry) { - u32 offset; + long offset; offset = dentry2offset(dentry); if (offset == 0) return; - xa_erase(&octx->xa, offset); + mtree_erase(&octx->mt, offset); offset_set(dentry, 0); } /** + * simple_offset_empty - Check if a dentry can be unlinked + * @dentry: dentry to be tested + * + * Returns 0 if @dentry is a non-empty directory; otherwise returns 1. + */ +int simple_offset_empty(struct dentry *dentry) +{ + struct inode *inode = d_inode(dentry); + struct offset_ctx *octx; + struct dentry *child; + unsigned long index; + int ret = 1; + + if (!inode || !S_ISDIR(inode->i_mode)) + return ret; + + index = DIR_OFFSET_MIN; + octx = inode->i_op->get_offset_ctx(inode); + mt_for_each(&octx->mt, child, index, LONG_MAX) { + spin_lock(&child->d_lock); + if (simple_positive(child)) { + spin_unlock(&child->d_lock); + ret = 0; + break; + } + spin_unlock(&child->d_lock); + } + + return ret; +} + +/** * simple_offset_rename_exchange - exchange rename with directory offsets * @old_dir: parent of dentry being moved * @old_dentry: dentry being moved @@ -327,8 +361,8 @@ int simple_offset_rename_exchange(struct inode *old_dir, { struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir); struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir); - u32 old_index = dentry2offset(old_dentry); - u32 new_index = dentry2offset(new_dentry); + long old_index = dentry2offset(old_dentry); + long new_index = dentry2offset(new_dentry); int ret; simple_offset_remove(old_ctx, old_dentry); @@ -354,9 +388,9 @@ int simple_offset_rename_exchange(struct inode *old_dir, out_restore: offset_set(old_dentry, old_index); - xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL); + mtree_store(&old_ctx->mt, old_index, old_dentry, GFP_KERNEL); offset_set(new_dentry, new_index); - xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL); + mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL); return ret; } @@ -369,7 +403,7 @@ out_restore: */ void simple_offset_destroy(struct offset_ctx *octx) { - xa_destroy(&octx->xa); + mtree_destroy(&octx->mt); } /** @@ -399,15 +433,16 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) /* In this case, ->private_data is protected by f_pos_lock */ file->private_data = NULL; - return vfs_setpos(file, offset, U32_MAX); + return vfs_setpos(file, offset, LONG_MAX); } -static struct dentry *offset_find_next(struct xa_state *xas) +static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset) { + MA_STATE(mas, &octx->mt, offset, offset); struct dentry *child, *found = NULL; rcu_read_lock(); - child = xas_next_entry(xas, U32_MAX); + child = mas_find(&mas, LONG_MAX); if (!child) goto out; spin_lock(&child->d_lock); @@ -421,8 +456,8 @@ out: static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) { - u32 offset = dentry2offset(dentry); struct inode *inode = d_inode(dentry); + long offset = dentry2offset(dentry); return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset, inode->i_ino, fs_umode_to_dtype(inode->i_mode)); @@ -430,12 +465,11 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) { - struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode); - XA_STATE(xas, &so_ctx->xa, ctx->pos); + struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); struct dentry *dentry; while (true) { - dentry = offset_find_next(&xas); + dentry = offset_find_next(octx, ctx->pos); if (!dentry) return ERR_PTR(-ENOENT); @@ -444,8 +478,8 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) break; } + ctx->pos = dentry2offset(dentry) + 1; dput(dentry); - ctx->pos = xas.xa_index + 1; } return NULL; } @@ -481,7 +515,7 @@ static int offset_readdir(struct file *file, struct dir_context *ctx) return 0; /* In this case, ->private_data is protected by f_pos_lock */ - if (ctx->pos == 2) + if (ctx->pos == DIR_OFFSET_MIN) file->private_data = NULL; else if (file->private_data == ERR_PTR(-ENOENT)) return 0; @@ -1704,16 +1738,28 @@ bool is_empty_dir_inode(struct inode *inode) static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, const char *str, const struct qstr *name) { - const struct dentry *parent = READ_ONCE(dentry->d_parent); - const struct inode *dir = READ_ONCE(parent->d_inode); - const struct super_block *sb = dentry->d_sb; - const struct unicode_map *um = sb->s_encoding; - struct qstr qstr = QSTR_INIT(str, len); + const struct dentry *parent; + const struct inode *dir; char strbuf[DNAME_INLINE_LEN]; - int ret; + struct qstr qstr; + /* + * Attempt a case-sensitive match first. It is cheaper and + * should cover most lookups, including all the sane + * applications that expect a case-sensitive filesystem. + * + * This comparison is safe under RCU because the caller + * guarantees the consistency between str and len. See + * __d_lookup_rcu_op_compare() for details. + */ + if (len == name->len && !memcmp(str, name->name, len)) + return 0; + + parent = READ_ONCE(dentry->d_parent); + dir = READ_ONCE(parent->d_inode); if (!dir || !IS_CASEFOLDED(dir)) - goto fallback; + return 1; + /* * If the dentry name is stored in-line, then it may be concurrently * modified by a rename. If this happens, the VFS will eventually retry @@ -1724,20 +1770,14 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, if (len <= DNAME_INLINE_LEN - 1) { memcpy(strbuf, str, len); strbuf[len] = 0; - qstr.name = strbuf; + str = strbuf; /* prevent compiler from optimizing out the temporary buffer */ barrier(); } - ret = utf8_strncasecmp(um, name, &qstr); - if (ret >= 0) - return ret; + qstr.len = len; + qstr.name = str; - if (sb_has_strict_encoding(sb)) - return -EINVAL; -fallback: - if (len != name->len) - return 1; - return !!memcmp(str, name->name, len); + return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr); } /** @@ -1752,7 +1792,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) const struct inode *dir = READ_ONCE(dentry->d_inode); struct super_block *sb = dentry->d_sb; const struct unicode_map *um = sb->s_encoding; - int ret = 0; + int ret; if (!dir || !IS_CASEFOLDED(dir)) return 0; @@ -1766,73 +1806,45 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) static const struct dentry_operations generic_ci_dentry_ops = { .d_hash = generic_ci_d_hash, .d_compare = generic_ci_d_compare, -}; -#endif - #ifdef CONFIG_FS_ENCRYPTION -static const struct dentry_operations generic_encrypted_dentry_ops = { .d_revalidate = fscrypt_d_revalidate, +#endif }; #endif -#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE) -static const struct dentry_operations generic_encrypted_ci_dentry_ops = { - .d_hash = generic_ci_d_hash, - .d_compare = generic_ci_d_compare, +#ifdef CONFIG_FS_ENCRYPTION +static const struct dentry_operations generic_encrypted_dentry_ops = { .d_revalidate = fscrypt_d_revalidate, }; #endif /** - * generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry - * @dentry: dentry to set ops on + * generic_set_sb_d_ops - helper for choosing the set of + * filesystem-wide dentry operations for the enabled features + * @sb: superblock to be configured * - * Casefolded directories need d_hash and d_compare set, so that the dentries - * contained in them are handled case-insensitively. Note that these operations - * are needed on the parent directory rather than on the dentries in it, and - * while the casefolding flag can be toggled on and off on an empty directory, - * dentry_operations can't be changed later. As a result, if the filesystem has - * casefolding support enabled at all, we have to give all dentries the - * casefolding operations even if their inode doesn't have the casefolding flag - * currently (and thus the casefolding ops would be no-ops for now). - * - * Encryption works differently in that the only dentry operation it needs is - * d_revalidate, which it only needs on dentries that have the no-key name flag. - * The no-key flag can't be set "later", so we don't have to worry about that. - * - * Finally, to maximize compatibility with overlayfs (which isn't compatible - * with certain dentry operations) and to avoid taking an unnecessary - * performance hit, we use custom dentry_operations for each possible - * combination rather than always installing all operations. + * Filesystems supporting casefolding and/or fscrypt can call this + * helper at mount-time to configure sb->s_d_op to best set of dentry + * operations required for the enabled features. The helper must be + * called after these have been configured, but before the root dentry + * is created. */ -void generic_set_encrypted_ci_d_ops(struct dentry *dentry) +void generic_set_sb_d_ops(struct super_block *sb) { -#ifdef CONFIG_FS_ENCRYPTION - bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME; -#endif #if IS_ENABLED(CONFIG_UNICODE) - bool needs_ci_ops = dentry->d_sb->s_encoding; -#endif -#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE) - if (needs_encrypt_ops && needs_ci_ops) { - d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops); + if (sb->s_encoding) { + sb->s_d_op = &generic_ci_dentry_ops; return; } #endif #ifdef CONFIG_FS_ENCRYPTION - if (needs_encrypt_ops) { - d_set_d_op(dentry, &generic_encrypted_dentry_ops); - return; - } -#endif -#if IS_ENABLED(CONFIG_UNICODE) - if (needs_ci_ops) { - d_set_d_op(dentry, &generic_ci_dentry_ops); + if (sb->s_cop) { + sb->s_d_op = &generic_encrypted_dentry_ops; return; } #endif } -EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops); +EXPORT_SYMBOL(generic_set_sb_d_ops); /** * inode_maybe_inc_iversion - increments i_version diff --git a/fs/mbcache.c b/fs/mbcache.c index 82aa7a35db26..e60a840999aa 100644 --- a/fs/mbcache.c +++ b/fs/mbcache.c @@ -426,9 +426,7 @@ EXPORT_SYMBOL(mb_cache_destroy); static int __init mbcache_init(void) { - mb_entry_cache = kmem_cache_create("mbcache", - sizeof(struct mb_cache_entry), 0, - SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL); + mb_entry_cache = KMEM_CACHE(mb_cache_entry, SLAB_RECLAIM_ACCOUNT); if (!mb_entry_cache) return -ENOMEM; return 0; diff --git a/fs/minix/inode.c b/fs/minix/inode.c index 73f37f298087..7cbd2b9f4d11 100644 --- a/fs/minix/inode.c +++ b/fs/minix/inode.c @@ -87,7 +87,7 @@ static int __init init_inodecache(void) minix_inode_cachep = kmem_cache_create("minix_inode_cache", sizeof(struct minix_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| - SLAB_MEM_SPREAD|SLAB_ACCOUNT), + SLAB_ACCOUNT), init_once); if (minix_inode_cachep == NULL) return -ENOMEM; diff --git a/fs/mnt_idmapping.c b/fs/mnt_idmapping.c index 64c5205e2b5e..3c60f1eaca61 100644 --- a/fs/mnt_idmapping.c +++ b/fs/mnt_idmapping.c @@ -214,7 +214,7 @@ static int copy_mnt_idmap(struct uid_gid_map *map_from, * anything at all. */ if (nr_extents == 0) - return 0; + return -EINVAL; /* * Here we know that nr_extents is greater than zero which means diff --git a/fs/namei.c b/fs/namei.c index 9342fa6a38c2..d0c4a3e9278e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2680,10 +2680,8 @@ static int lookup_one_common(struct mnt_idmap *idmap, if (!len) return -EACCES; - if (unlikely(name[0] == '.')) { - if (len < 2 || (len == 2 && name[1] == '.')) - return -EACCES; - } + if (is_dot_dotdot(name, len)) + return -EACCES; while (len--) { unsigned int c = *(const unsigned char *)name++; diff --git a/fs/ntfs3/namei.c b/fs/ntfs3/namei.c index cae41db0aaa7..084d19d78397 100644 --- a/fs/ntfs3/namei.c +++ b/fs/ntfs3/namei.c @@ -431,7 +431,7 @@ static int ntfs_atomic_open(struct inode *dir, struct dentry *dentry, * fnd contains tree's path to insert to. * If fnd is not NULL then dir is locked. */ - inode = ntfs_create_inode(mnt_idmap(file->f_path.mnt), dir, dentry, uni, + inode = ntfs_create_inode(file_mnt_idmap(file), dir, dentry, uni, mode, 0, NULL, 0, fnd); err = IS_ERR(inode) ? PTR_ERR(inode) : finish_open(file, dentry, ntfs_file_open); diff --git a/fs/openpromfs/inode.c b/fs/openpromfs/inode.c index c4b65a6d41cc..4a0779e3ef79 100644 --- a/fs/openpromfs/inode.c +++ b/fs/openpromfs/inode.c @@ -446,7 +446,7 @@ static int __init init_openprom_fs(void) sizeof(struct op_inode_info), 0, (SLAB_RECLAIM_ACCOUNT | - SLAB_MEM_SPREAD | SLAB_ACCOUNT), + SLAB_ACCOUNT), op_inode_init_once); if (!op_inode_cachep) return -ENOMEM; diff --git a/fs/overlayfs/params.c b/fs/overlayfs/params.c index 112b4b12f825..36dcc530ac28 100644 --- a/fs/overlayfs/params.c +++ b/fs/overlayfs/params.c @@ -280,12 +280,20 @@ static int ovl_mount_dir_check(struct fs_context *fc, const struct path *path, { struct ovl_fs_context *ctx = fc->fs_private; - if (ovl_dentry_weird(path->dentry)) - return invalfc(fc, "filesystem on %s not supported", name); - if (!d_is_dir(path->dentry)) return invalfc(fc, "%s is not a directory", name); + /* + * Root dentries of case-insensitive capable filesystems might + * not have the dentry operations set, but still be incompatible + * with overlayfs. Check explicitly to prevent post-mount + * failures. + */ + if (sb_has_encoding(path->mnt->mnt_sb)) + return invalfc(fc, "case-insensitive capable filesystem on %s not supported", name); + + if (ovl_dentry_weird(path->dentry)) + return invalfc(fc, "filesystem on %s not supported", name); /* * Check whether upper path is read-only here to report failures diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 2eef6c70b2ae..36d4b8b1f784 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -28,41 +28,38 @@ MODULE_LICENSE("GPL"); struct ovl_dir_cache; -static struct dentry *ovl_d_real(struct dentry *dentry, - const struct inode *inode) +static struct dentry *ovl_d_real(struct dentry *dentry, enum d_real_type type) { - struct dentry *real = NULL, *lower; + struct dentry *upper, *lower; int err; - /* - * vfs is only expected to call d_real() with NULL from d_real_inode() - * and with overlay inode from file_dentry() on an overlay file. - * - * TODO: remove @inode argument from d_real() API, remove code in this - * function that deals with non-NULL @inode and remove d_real() call - * from file_dentry(). - */ - if (inode && d_inode(dentry) == inode) - return dentry; - else if (inode) + switch (type) { + case D_REAL_DATA: + case D_REAL_METADATA: + break; + default: goto bug; + } if (!d_is_reg(dentry)) { /* d_real_inode() is only relevant for regular files */ return dentry; } - real = ovl_dentry_upper(dentry); - if (real && (inode == d_inode(real))) - return real; + upper = ovl_dentry_upper(dentry); + if (upper && (type == D_REAL_METADATA || + ovl_has_upperdata(d_inode(dentry)))) + return upper; - if (real && !inode && ovl_has_upperdata(d_inode(dentry))) - return real; + if (type == D_REAL_METADATA) { + lower = ovl_dentry_lower(dentry); + goto real_lower; + } /* - * Best effort lazy lookup of lowerdata for !inode case to return + * Best effort lazy lookup of lowerdata for D_REAL_DATA case to return * the real lowerdata dentry. The only current caller of d_real() with - * NULL inode is d_real_inode() from trace_uprobe and this caller is + * D_REAL_DATA is d_real_inode() from trace_uprobe and this caller is * likely going to be followed reading from the file, before placing * uprobes on offset within the file, so lowerdata should be available * when setting the uprobe. @@ -73,18 +70,13 @@ static struct dentry *ovl_d_real(struct dentry *dentry, lower = ovl_dentry_lowerdata(dentry); if (!lower) goto bug; - real = lower; - /* Handle recursion */ - real = d_real(real, inode); +real_lower: + /* Handle recursion into stacked lower fs */ + return d_real(lower, type); - if (!inode || inode == d_inode(real)) - return real; bug: - WARN(1, "%s(%pd4, %s:%lu): real dentry (%p/%lu) not found\n", - __func__, dentry, inode ? inode->i_sb->s_id : "NULL", - inode ? inode->i_ino : 0, real, - real && d_inode(real) ? d_inode(real)->i_ino : 0); + WARN(1, "%s(%pd4, %d): real dentry not found\n", __func__, dentry, type); return dentry; } diff --git a/fs/pipe.c b/fs/pipe.c index f1adbfe743d4..50c8a8596b52 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -76,18 +76,20 @@ static unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; * -- Manfred Spraul <manfred@colorfullife.com> 2002-05-09 */ -static void pipe_lock_nested(struct pipe_inode_info *pipe, int subclass) +#define cmp_int(l, r) ((l > r) - (l < r)) + +#ifdef CONFIG_PROVE_LOCKING +static int pipe_lock_cmp_fn(const struct lockdep_map *a, + const struct lockdep_map *b) { - if (pipe->files) - mutex_lock_nested(&pipe->mutex, subclass); + return cmp_int((unsigned long) a, (unsigned long) b); } +#endif void pipe_lock(struct pipe_inode_info *pipe) { - /* - * pipe_lock() nests non-pipe inode locks (for writing to a file) - */ - pipe_lock_nested(pipe, I_MUTEX_PARENT); + if (pipe->files) + mutex_lock(&pipe->mutex); } EXPORT_SYMBOL(pipe_lock); @@ -98,28 +100,16 @@ void pipe_unlock(struct pipe_inode_info *pipe) } EXPORT_SYMBOL(pipe_unlock); -static inline void __pipe_lock(struct pipe_inode_info *pipe) -{ - mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT); -} - -static inline void __pipe_unlock(struct pipe_inode_info *pipe) -{ - mutex_unlock(&pipe->mutex); -} - void pipe_double_lock(struct pipe_inode_info *pipe1, struct pipe_inode_info *pipe2) { BUG_ON(pipe1 == pipe2); - if (pipe1 < pipe2) { - pipe_lock_nested(pipe1, I_MUTEX_PARENT); - pipe_lock_nested(pipe2, I_MUTEX_CHILD); - } else { - pipe_lock_nested(pipe2, I_MUTEX_PARENT); - pipe_lock_nested(pipe1, I_MUTEX_CHILD); - } + if (pipe1 > pipe2) + swap(pipe1, pipe2); + + pipe_lock(pipe1); + pipe_lock(pipe2); } static void anon_pipe_buf_release(struct pipe_inode_info *pipe, @@ -271,7 +261,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) return 0; ret = 0; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); /* * We only wake up writers if the pipe was full when we started @@ -368,7 +358,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) ret = -EAGAIN; break; } - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); /* * We only get here if we didn't actually read anything. @@ -400,13 +390,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to) if (wait_event_interruptible_exclusive(pipe->rd_wait, pipe_readable(pipe)) < 0) return -ERESTARTSYS; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage); wake_next_reader = true; } if (pipe_empty(pipe->head, pipe->tail)) wake_next_reader = false; - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); if (was_full) wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM); @@ -462,7 +452,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) if (unlikely(total_len == 0)) return 0; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); if (!pipe->readers) { send_sig(SIGPIPE, current, 0); @@ -582,19 +572,19 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) * after waiting we need to re-check whether the pipe * become empty while we dropped the lock. */ - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); if (was_empty) wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM); kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe)); - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); was_empty = pipe_empty(pipe->head, pipe->tail); wake_next_writer = true; } out: if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) wake_next_writer = false; - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); /* * If we do do a wakeup event, we do a 'sync' wakeup, because we @@ -629,7 +619,7 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) switch (cmd) { case FIONREAD: - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); count = 0; head = pipe->head; tail = pipe->tail; @@ -639,16 +629,16 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) count += pipe->bufs[tail & mask].len; tail++; } - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); return put_user(count, (int __user *)arg); #ifdef CONFIG_WATCH_QUEUE case IOC_WATCH_QUEUE_SET_SIZE: { int ret; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); ret = watch_queue_set_size(pipe, arg); - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); return ret; } @@ -734,7 +724,7 @@ pipe_release(struct inode *inode, struct file *file) { struct pipe_inode_info *pipe = file->private_data; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); if (file->f_mode & FMODE_READ) pipe->readers--; if (file->f_mode & FMODE_WRITE) @@ -747,7 +737,7 @@ pipe_release(struct inode *inode, struct file *file) kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT); } - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); put_pipe_info(inode, pipe); return 0; @@ -759,7 +749,7 @@ pipe_fasync(int fd, struct file *filp, int on) struct pipe_inode_info *pipe = filp->private_data; int retval = 0; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); if (filp->f_mode & FMODE_READ) retval = fasync_helper(fd, filp, on, &pipe->fasync_readers); if ((filp->f_mode & FMODE_WRITE) && retval >= 0) { @@ -768,7 +758,7 @@ pipe_fasync(int fd, struct file *filp, int on) /* this can happen only if on == T */ fasync_helper(-1, filp, 0, &pipe->fasync_readers); } - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); return retval; } @@ -834,6 +824,7 @@ struct pipe_inode_info *alloc_pipe_info(void) pipe->nr_accounted = pipe_bufs; pipe->user = user; mutex_init(&pipe->mutex); + lock_set_cmp_fn(&pipe->mutex, pipe_lock_cmp_fn, NULL); return pipe; } @@ -1144,7 +1135,7 @@ static int fifo_open(struct inode *inode, struct file *filp) filp->private_data = pipe; /* OK, we have a pipe and it's pinned down */ - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); /* We can only do regular read/write on fifos */ stream_open(inode, filp); @@ -1214,7 +1205,7 @@ static int fifo_open(struct inode *inode, struct file *filp) } /* Ok! */ - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); return 0; err_rd: @@ -1230,7 +1221,7 @@ err_wr: goto err; err: - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); put_pipe_info(inode, pipe); return ret; @@ -1411,7 +1402,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg) if (!pipe) return -EBADF; - __pipe_lock(pipe); + mutex_lock(&pipe->mutex); switch (cmd) { case F_SETPIPE_SZ: @@ -1425,7 +1416,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg) break; } - __pipe_unlock(pipe); + mutex_unlock(&pipe->mutex); return ret; } diff --git a/fs/proc/inode.c b/fs/proc/inode.c index 05350f3c2812..dcd513dccf55 100644 --- a/fs/proc/inode.c +++ b/fs/proc/inode.c @@ -92,7 +92,7 @@ void __init proc_init_kmemcache(void) proc_inode_cachep = kmem_cache_create("proc_inode_cache", sizeof(struct proc_inode), 0, (SLAB_RECLAIM_ACCOUNT| - SLAB_MEM_SPREAD|SLAB_ACCOUNT| + SLAB_ACCOUNT| SLAB_PANIC), init_once); pde_opener_cache = diff --git a/fs/qnx4/inode.c b/fs/qnx4/inode.c index 6eb9bb369b57..7b5711f76709 100644 --- a/fs/qnx4/inode.c +++ b/fs/qnx4/inode.c @@ -21,6 +21,7 @@ #include <linux/buffer_head.h> #include <linux/writeback.h> #include <linux/statfs.h> +#include <linux/fs_context.h> #include "qnx4.h" #define QNX4_VERSION 4 @@ -30,28 +31,33 @@ static const struct super_operations qnx4_sops; static struct inode *qnx4_alloc_inode(struct super_block *sb); static void qnx4_free_inode(struct inode *inode); -static int qnx4_remount(struct super_block *sb, int *flags, char *data); static int qnx4_statfs(struct dentry *, struct kstatfs *); +static int qnx4_get_tree(struct fs_context *fc); static const struct super_operations qnx4_sops = { .alloc_inode = qnx4_alloc_inode, .free_inode = qnx4_free_inode, .statfs = qnx4_statfs, - .remount_fs = qnx4_remount, }; -static int qnx4_remount(struct super_block *sb, int *flags, char *data) +static int qnx4_reconfigure(struct fs_context *fc) { + struct super_block *sb = fc->root->d_sb; struct qnx4_sb_info *qs; sync_filesystem(sb); qs = qnx4_sb(sb); qs->Version = QNX4_VERSION; - *flags |= SB_RDONLY; + fc->sb_flags |= SB_RDONLY; return 0; } +static const struct fs_context_operations qnx4_context_opts = { + .get_tree = qnx4_get_tree, + .reconfigure = qnx4_reconfigure, +}; + static int qnx4_get_block( struct inode *inode, sector_t iblock, struct buffer_head *bh, int create ) { unsigned long phys; @@ -183,12 +189,13 @@ static const char *qnx4_checkroot(struct super_block *sb, return "bitmap file not found."; } -static int qnx4_fill_super(struct super_block *s, void *data, int silent) +static int qnx4_fill_super(struct super_block *s, struct fs_context *fc) { struct buffer_head *bh; struct inode *root; const char *errmsg; struct qnx4_sb_info *qs; + int silent = fc->sb_flags & SB_SILENT; qs = kzalloc(sizeof(struct qnx4_sb_info), GFP_KERNEL); if (!qs) @@ -216,7 +223,7 @@ static int qnx4_fill_super(struct super_block *s, void *data, int silent) errmsg = qnx4_checkroot(s, (struct qnx4_super_block *) bh->b_data); brelse(bh); if (errmsg != NULL) { - if (!silent) + if (!silent) printk(KERN_ERR "qnx4: %s\n", errmsg); return -EINVAL; } @@ -235,6 +242,18 @@ static int qnx4_fill_super(struct super_block *s, void *data, int silent) return 0; } +static int qnx4_get_tree(struct fs_context *fc) +{ + return get_tree_bdev(fc, qnx4_fill_super); +} + +static int qnx4_init_fs_context(struct fs_context *fc) +{ + fc->ops = &qnx4_context_opts; + + return 0; +} + static void qnx4_kill_sb(struct super_block *sb) { struct qnx4_sb_info *qs = qnx4_sb(sb); @@ -376,18 +395,12 @@ static void destroy_inodecache(void) kmem_cache_destroy(qnx4_inode_cachep); } -static struct dentry *qnx4_mount(struct file_system_type *fs_type, - int flags, const char *dev_name, void *data) -{ - return mount_bdev(fs_type, flags, dev_name, data, qnx4_fill_super); -} - static struct file_system_type qnx4_fs_type = { - .owner = THIS_MODULE, - .name = "qnx4", - .mount = qnx4_mount, - .kill_sb = qnx4_kill_sb, - .fs_flags = FS_REQUIRES_DEV, + .owner = THIS_MODULE, + .name = "qnx4", + .kill_sb = qnx4_kill_sb, + .fs_flags = FS_REQUIRES_DEV, + .init_fs_context = qnx4_init_fs_context, }; MODULE_ALIAS_FS("qnx4"); diff --git a/fs/qnx6/inode.c b/fs/qnx6/inode.c index a286c545717f..405913f4faff 100644 --- a/fs/qnx6/inode.c +++ b/fs/qnx6/inode.c @@ -615,7 +615,7 @@ static int init_inodecache(void) qnx6_inode_cachep = kmem_cache_create("qnx6_inode_cache", sizeof(struct qnx6_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| - SLAB_MEM_SPREAD|SLAB_ACCOUNT), + SLAB_ACCOUNT), init_once); if (!qnx6_inode_cachep) return -ENOMEM; diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index 67b5510beded..2cc469d481a2 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -670,7 +670,6 @@ static int __init init_inodecache(void) sizeof(struct reiserfs_inode_info), 0, (SLAB_RECLAIM_ACCOUNT| - SLAB_MEM_SPREAD| SLAB_ACCOUNT), init_once); if (reiserfs_inode_cachep == NULL) diff --git a/fs/romfs/super.c b/fs/romfs/super.c index 545ad44f96b8..85641c3f5782 100644 --- a/fs/romfs/super.c +++ b/fs/romfs/super.c @@ -630,8 +630,8 @@ static int __init init_romfs_fs(void) romfs_inode_cachep = kmem_cache_create("romfs_i", sizeof(struct romfs_inode_info), 0, - SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD | - SLAB_ACCOUNT, romfs_i_init_once); + SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT, + romfs_i_init_once); if (!romfs_inode_cachep) { pr_err("Failed to initialise inode cache\n"); diff --git a/fs/select.c b/fs/select.c index 0ee55af1a55c..9515c3fa1a03 100644 --- a/fs/select.c +++ b/fs/select.c @@ -476,7 +476,7 @@ static inline void wait_key_set(poll_table *wait, unsigned long in, wait->_key |= POLLOUT_SET; } -static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time) +static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time) { ktime_t expire, *to = NULL; struct poll_wqueues table; @@ -839,7 +839,7 @@ SYSCALL_DEFINE1(old_select, struct sel_arg_struct __user *, arg) struct poll_list { struct poll_list *next; - int len; + unsigned int len; struct pollfd entries[]; }; @@ -975,14 +975,15 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds, struct timespec64 *end_time) { struct poll_wqueues table; - int err = -EFAULT, fdcount, len; + int err = -EFAULT, fdcount; /* Allocate small arguments on the stack to save memory and be faster - use long to make sure the buffer is aligned properly on 64 bit archs to avoid unaligned access */ long stack_pps[POLL_STACK_ALLOC/sizeof(long)]; struct poll_list *const head = (struct poll_list *)stack_pps; struct poll_list *walk = head; - unsigned long todo = nfds; + unsigned int todo = nfds; + unsigned int len; if (nfds > rlimit(RLIMIT_NOFILE)) return -EINVAL; @@ -998,9 +999,9 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds, sizeof(struct pollfd) * walk->len)) goto out_fds; - todo -= walk->len; - if (!todo) + if (walk->len >= todo) break; + todo -= walk->len; len = min(todo, POLLFD_PER_PAGE); walk = walk->next = kmalloc(struct_size(walk, entries, len), @@ -1020,7 +1021,7 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds, for (walk = head; walk; walk = walk->next) { struct pollfd *fds = walk->entries; - int j; + unsigned int j; for (j = walk->len; j; fds++, ufds++, j--) unsafe_put_user(fds->revents, &ufds->revents, Efault); diff --git a/fs/sysv/inode.c b/fs/sysv/inode.c index 5a915b2e68f5..76bc2d5e75a9 100644 --- a/fs/sysv/inode.c +++ b/fs/sysv/inode.c @@ -336,7 +336,7 @@ int __init sysv_init_icache(void) { sysv_inode_cachep = kmem_cache_create("sysv_inode_cache", sizeof(struct sysv_inode_info), 0, - SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT, + SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT, init_once); if (!sysv_inode_cachep) return -ENOMEM; diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c index 410ab2a44d2f..19bcb51a2203 100644 --- a/fs/sysv/itree.c +++ b/fs/sysv/itree.c @@ -83,9 +83,6 @@ static inline sysv_zone_t *block_end(struct buffer_head *bh) return (sysv_zone_t*)((char*)bh->b_data + bh->b_size); } -/* - * Requires read_lock(&pointers_lock) or write_lock(&pointers_lock) - */ static Indirect *get_branch(struct inode *inode, int depth, int offsets[], @@ -105,15 +102,18 @@ static Indirect *get_branch(struct inode *inode, bh = sb_bread(sb, block); if (!bh) goto failure; + read_lock(&pointers_lock); if (!verify_chain(chain, p)) goto changed; add_chain(++p, bh, (sysv_zone_t*)bh->b_data + *++offsets); + read_unlock(&pointers_lock); if (!p->key) goto no_block; } return NULL; changed: + read_unlock(&pointers_lock); brelse(bh); *err = -EAGAIN; goto no_block; @@ -219,9 +219,7 @@ static int get_block(struct inode *inode, sector_t iblock, struct buffer_head *b goto out; reread: - read_lock(&pointers_lock); partial = get_branch(inode, depth, offsets, chain, &err); - read_unlock(&pointers_lock); /* Simplest case - block found, no allocation needed */ if (!partial) { @@ -291,9 +289,9 @@ static Indirect *find_shared(struct inode *inode, *top = 0; for (k = depth; k > 1 && !offsets[k-1]; k--) ; + partial = get_branch(inode, k, offsets, chain, &err); write_lock(&pointers_lock); - partial = get_branch(inode, k, offsets, chain, &err); if (!partial) partial = chain + k-1; /* diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c index e413a9cf8ee3..551148de66cd 100644 --- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -205,7 +205,6 @@ static struct dentry *ubifs_lookup(struct inode *dir, struct dentry *dentry, dbg_gen("'%pd' in dir ino %lu", dentry, dir->i_ino); err = fscrypt_prepare_lookup(dir, dentry, &nm); - generic_set_encrypted_ci_d_ops(dentry); if (err == -ENOENT) return d_splice_alias(NULL, dentry); if (err) diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index 09e270d6ed02..304646b03e99 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -2239,6 +2239,7 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent) goto out_umount; } + generic_set_sb_d_ops(sb); sb->s_root = d_make_root(root); if (!sb->s_root) { err = -ENOMEM; |