diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-29 15:54:19 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2016-07-29 15:54:19 -0700 |
commit | a867d7349e94b6409b08629886a819f802377e91 (patch) | |
tree | cf26734d638bbeee4e8f1ec58161933a55b922e2 /include | |
parent | 601f887d6105ddd28dc569a1504595bdf8df8a5b (diff) | |
parent | aeaa4a79ff6a5ed912b7362f206cf8576fca538b (diff) |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns vfs updates from Eric Biederman:
"This tree contains some very long awaited work on generalizing the
user namespace support for mounting filesystems to include filesystems
with a backing store. The real world target is fuse but the goal is
to update the vfs to allow any filesystem to be supported. This
patchset is based on a lot of code review and testing to approach that
goal.
While looking at what is needed to support the fuse filesystem it
became clear that there were things like xattrs for security modules
that needed special treatment. That the resolution of those concerns
would not be fuse specific. That sorting out these general issues
made most sense at the generic level, where the right people could be
drawn into the conversation, and the issues could be solved for
everyone.
At a high level what this patchset does a couple of simple things:
- Add a user namespace owner (s_user_ns) to struct super_block.
- Teach the vfs to handle filesystem uids and gids not mapping into
to kuids and kgids and being reported as INVALID_UID and
INVALID_GID in vfs data structures.
By assigning a user namespace owner filesystems that are mounted with
only user namespace privilege can be detected. This allows security
modules and the like to know which mounts may not be trusted. This
also allows the set of uids and gids that are communicated to the
filesystem to be capped at the set of kuids and kgids that are in the
owning user namespace of the filesystem.
One of the crazier corner casees this handles is the case of inodes
whose i_uid or i_gid are not mapped into the vfs. Most of the code
simply doesn't care but it is easy to confuse the inode writeback path
so no operation that could cause an inode write-back is permitted for
such inodes (aka only reads are allowed).
This set of changes starts out by cleaning up the code paths involved
in user namespace permirted mounts. Then when things are clean enough
adds code that cleanly sets s_user_ns. Then additional restrictions
are added that are possible now that the filesystem superblock
contains owner information.
These changes should not affect anyone in practice, but there are some
parts of these restrictions that are changes in behavior.
- Andy's restriction on suid executables that does not honor the
suid bit when the path is from another mount namespace (think
/proc/[pid]/fd/) or when the filesystem was mounted by a less
privileged user.
- The replacement of the user namespace implicit setting of MNT_NODEV
with implicitly setting SB_I_NODEV on the filesystem superblock
instead.
Using SB_I_NODEV is a stronger form that happens to make this state
user invisible. The user visibility can be managed but it caused
problems when it was introduced from applications reasonably
expecting mount flags to be what they were set to.
There is a little bit of work remaining before it is safe to support
mounting filesystems with backing store in user namespaces, beyond
what is in this set of changes.
- Verifying the mounter has permission to read/write the block device
during mount.
- Teaching the integrity modules IMA and EVM to handle filesystems
mounted with only user namespace root and to reduce trust in their
security xattrs accordingly.
- Capturing the mounters credentials and using that for permission
checks in d_automount and the like. (Given that overlayfs already
does this, and we need the work in d_automount it make sense to
generalize this case).
Furthermore there are a few changes that are on the wishlist:
- Get all filesystems supporting posix acls using the generic posix
acls so that posix_acl_fix_xattr_from_user and
posix_acl_fix_xattr_to_user may be removed. [Maintainability]
- Reducing the permission checks in places such as remount to allow
the superblock owner to perform them.
- Allowing the superblock owner to chown files with unmapped uids and
gids to something that is mapped so the files may be treated
normally.
I am not considering even obvious relaxations of permission checks
until it is clear there are no more corner cases that need to be
locked down and handled generically.
Many thanks to Seth Forshee who kept this code alive, and putting up
with me rewriting substantial portions of what he did to handle more
corner cases, and for his diligent testing and reviewing of my
changes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
fs: Call d_automount with the filesystems creds
fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
evm: Translate user/group ids relative to s_user_ns when computing HMAC
dquot: For now explicitly don't support filesystems outside of init_user_ns
quota: Handle quota data stored in s_user_ns in quota_setxquota
quota: Ensure qids map to the filesystem
vfs: Don't create inodes with a uid or gid unknown to the vfs
vfs: Don't modify inodes with a uid or gid unknown to the vfs
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
vfs: Verify acls are valid within superblock's s_user_ns.
userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
fs: Refuse uid/gid changes which don't map into s_user_ns
selinux: Add support for unprivileged mounts from user namespaces
Smack: Handle labels consistently in untrusted mounts
Smack: Add support for unprivileged mounts from user namespaces
fs: Treat foreign mounts as nosuid
fs: Limit file caps to the user namespace of the super block
userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
userns: Remove implicit MNT_NODEV fragility.
...
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/fs.h | 79 | ||||
-rw-r--r-- | include/linux/mount.h | 1 | ||||
-rw-r--r-- | include/linux/posix_acl.h | 2 | ||||
-rw-r--r-- | include/linux/quota.h | 10 | ||||
-rw-r--r-- | include/linux/uidgid.h | 4 | ||||
-rw-r--r-- | include/linux/user_namespace.h | 6 |
6 files changed, 70 insertions, 32 deletions
diff --git a/include/linux/fs.h b/include/linux/fs.h index f65a6801f609..577365a77b47 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -829,31 +829,6 @@ static inline void i_size_write(struct inode *inode, loff_t i_size) #endif } -/* Helper functions so that in most cases filesystems will - * not need to deal directly with kuid_t and kgid_t and can - * instead deal with the raw numeric values that are stored - * in the filesystem. - */ -static inline uid_t i_uid_read(const struct inode *inode) -{ - return from_kuid(&init_user_ns, inode->i_uid); -} - -static inline gid_t i_gid_read(const struct inode *inode) -{ - return from_kgid(&init_user_ns, inode->i_gid); -} - -static inline void i_uid_write(struct inode *inode, uid_t uid) -{ - inode->i_uid = make_kuid(&init_user_ns, uid); -} - -static inline void i_gid_write(struct inode *inode, gid_t gid) -{ - inode->i_gid = make_kgid(&init_user_ns, gid); -} - static inline unsigned iminor(const struct inode *inode) { return MINOR(inode->i_rdev); @@ -1320,6 +1295,10 @@ struct mm_struct; /* sb->s_iflags */ #define SB_I_CGROUPWB 0x00000001 /* cgroup-aware writeback enabled */ #define SB_I_NOEXEC 0x00000002 /* Ignore executables on this fs */ +#define SB_I_NODEV 0x00000004 /* Ignore devices on this fs */ + +/* sb->s_iflags to limit user namespace mounts */ +#define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */ /* Possible states of 'frozen' field */ enum { @@ -1423,6 +1402,13 @@ struct super_block { struct hlist_head s_pins; /* + * Owning user namespace and default context in which to + * interpret filesystem uids, gids, quotas, device nodes, + * xattrs and security labels. + */ + struct user_namespace *s_user_ns; + + /* * Keep the lru lists last in the structure so they always sit on their * own individual cachelines. */ @@ -1446,6 +1432,31 @@ struct super_block { struct list_head s_inodes_wb; /* writeback inodes */ }; +/* Helper functions so that in most cases filesystems will + * not need to deal directly with kuid_t and kgid_t and can + * instead deal with the raw numeric values that are stored + * in the filesystem. + */ +static inline uid_t i_uid_read(const struct inode *inode) +{ + return from_kuid(inode->i_sb->s_user_ns, inode->i_uid); +} + +static inline gid_t i_gid_read(const struct inode *inode) +{ + return from_kgid(inode->i_sb->s_user_ns, inode->i_gid); +} + +static inline void i_uid_write(struct inode *inode, uid_t uid) +{ + inode->i_uid = make_kuid(inode->i_sb->s_user_ns, uid); +} + +static inline void i_gid_write(struct inode *inode, gid_t gid) +{ + inode->i_gid = make_kgid(inode->i_sb->s_user_ns, gid); +} + extern struct timespec current_fs_time(struct super_block *sb); /* @@ -1588,6 +1599,7 @@ extern int vfs_whiteout(struct inode *, struct dentry *); */ extern void inode_init_owner(struct inode *inode, const struct inode *dir, umode_t mode); +extern bool may_open_dev(const struct path *path); /* * VFS FS_IOC_FIEMAP helper definitions. */ @@ -1858,6 +1870,11 @@ struct super_operations { #define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \ (inode)->i_rdev == WHITEOUT_DEV) +static inline bool HAS_UNMAPPED_ID(struct inode *inode) +{ + return !uid_valid(inode->i_uid) || !gid_valid(inode->i_gid); +} + /* * Inode state bits. Protected by inode->i_lock * @@ -2006,8 +2023,6 @@ struct file_system_type { #define FS_BINARY_MOUNTDATA 2 #define FS_HAS_SUBTYPE 4 #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */ -#define FS_USERNS_DEV_MOUNT 16 /* A userns mount does not imply MNT_NODEV */ -#define FS_USERNS_VISIBLE 32 /* FS must already be visible */ #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */ struct dentry *(*mount) (struct file_system_type *, int, const char *, void *); @@ -2028,8 +2043,9 @@ struct file_system_type { #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME) -extern struct dentry *mount_ns(struct file_system_type *fs_type, int flags, - void *data, int (*fill_super)(struct super_block *, void *, int)); +extern struct dentry *mount_ns(struct file_system_type *fs_type, + int flags, void *data, void *ns, struct user_namespace *user_ns, + int (*fill_super)(struct super_block *, void *, int)); extern struct dentry *mount_bdev(struct file_system_type *fs_type, int flags, const char *dev_name, void *data, int (*fill_super)(struct super_block *, void *, int)); @@ -2049,6 +2065,11 @@ void deactivate_locked_super(struct super_block *sb); int set_anon_super(struct super_block *s, void *data); int get_anon_bdev(dev_t *); void free_anon_bdev(dev_t); +struct super_block *sget_userns(struct file_system_type *type, + int (*test)(struct super_block *,void *), + int (*set)(struct super_block *,void *), + int flags, struct user_namespace *user_ns, + void *data); struct super_block *sget(struct file_system_type *type, int (*test)(struct super_block *,void *), int (*set)(struct super_block *,void *), diff --git a/include/linux/mount.h b/include/linux/mount.h index f822c3c11377..54a594d49733 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -81,6 +81,7 @@ extern void mntput(struct vfsmount *mnt); extern struct vfsmount *mntget(struct vfsmount *mnt); extern struct vfsmount *mnt_clone_internal(struct path *path); extern int __mnt_is_readonly(struct vfsmount *mnt); +extern bool mnt_may_suid(struct vfsmount *mnt); struct path; extern struct vfsmount *clone_private_mount(struct path *path); diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h index c818772d9f9d..d5d3d741f028 100644 --- a/include/linux/posix_acl.h +++ b/include/linux/posix_acl.h @@ -79,7 +79,7 @@ posix_acl_release(struct posix_acl *acl) extern void posix_acl_init(struct posix_acl *, int); extern struct posix_acl *posix_acl_alloc(int, gfp_t); -extern int posix_acl_valid(const struct posix_acl *); +extern int posix_acl_valid(struct user_namespace *, const struct posix_acl *); extern int posix_acl_permission(struct inode *, const struct posix_acl *, int); extern struct posix_acl *posix_acl_from_mode(umode_t, gfp_t); extern int posix_acl_equiv_mode(const struct posix_acl *, umode_t *); diff --git a/include/linux/quota.h b/include/linux/quota.h index 8486d27cf360..55107a8ff887 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -179,6 +179,16 @@ static inline struct kqid make_kqid_projid(kprojid_t projid) return kqid; } +/** + * qid_has_mapping - Report if a qid maps into a user namespace. + * @ns: The user namespace to see if a value maps into. + * @qid: The kernel internal quota identifier to test. + */ +static inline bool qid_has_mapping(struct user_namespace *ns, struct kqid qid) +{ + return from_kqid(ns, qid) != (qid_t) -1; +} + extern spinlock_t dq_data_lock; diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h index 03835522dfcb..25e9d9216340 100644 --- a/include/linux/uidgid.h +++ b/include/linux/uidgid.h @@ -177,12 +177,12 @@ static inline gid_t from_kgid_munged(struct user_namespace *to, kgid_t kgid) static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid) { - return true; + return uid_valid(uid); } static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) { - return true; + return gid_valid(gid); } #endif /* CONFIG_USER_NS */ diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 8297e5b341d8..9217169c64cb 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -72,6 +72,7 @@ extern ssize_t proc_projid_map_write(struct file *, const char __user *, size_t, extern ssize_t proc_setgroups_write(struct file *, const char __user *, size_t, loff_t *); extern int proc_setgroups_show(struct seq_file *m, void *v); extern bool userns_may_setgroups(const struct user_namespace *ns); +extern bool current_in_userns(const struct user_namespace *target_ns); #else static inline struct user_namespace *get_user_ns(struct user_namespace *ns) @@ -100,6 +101,11 @@ static inline bool userns_may_setgroups(const struct user_namespace *ns) { return true; } + +static inline bool current_in_userns(const struct user_namespace *target_ns) +{ + return true; +} #endif #endif /* _LINUX_USER_H */ |