Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major update to this release is that there's a new arch config
option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.
Currently, only x86_64 enables it. All the ftrace callbacks now take a
struct ftrace_regs instead of a struct pt_regs. If the architecture
has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
have enough information to read the arguments of the function being
traced, as well as access to the stack pointer.
This way, if a user (like live kernel patching) only cares about the
arguments, then it can avoid using the heavier weight "regs" callback,
that puts in enough information in the struct ftrace_regs to simulate
a breakpoint exception (needed for kprobes).
A new config option that audits the timestamps of the ftrace ring
buffer at most every event recorded.
Ftrace recursion protection has been cleaned up to move the protection
to the callback itself (this saves on an extra function call for those
callbacks).
Perf now handles its own RCU protection and does not depend on ftrace
to do it for it (saving on that extra function call).
New debug option to add "recursed_functions" file to tracefs that
lists all the places that triggered the recursion protection of the
function tracer. This will show where things need to be fixed as
recursion slows down the function tracer.
The eval enum mapping updates done at boot up are now offloaded to a
work queue, as it caused a noticeable pause on slow embedded boards.
Various clean ups and last minute fixes"
* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Offload eval map updates to a work queue
Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
ring-buffer: Add rb_check_bpage in __rb_allocate_pages
ring-buffer: Fix two typos in comments
tracing: Drop unneeded assignment in ring_buffer_resize()
tracing: Disable ftrace selftests when any tracer is running
seq_buf: Avoid type mismatch for seq_buf_init
ring-buffer: Fix a typo in function description
ring-buffer: Remove obsolete rb_event_is_commit()
ring-buffer: Add test to validate the time stamp deltas
ftrace/documentation: Fix RST C code blocks
tracing: Clean up after filter logic rewriting
tracing: Remove the useless value assignment in test_create_synth_event()
livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
MAINTAINERS: assign ./fs/tracefs to TRACING
tracing: Fix some typos in comments
ftrace: Remove unused varible 'ret'
ring-buffer: Add recording of ring buffer recursion into recursed_functions
...
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- NFSv3: Add emulation of lookupp() to improve open_by_filehandle()
support
- A series of patches to improve readdir performance, particularly
with large directories
- Basic support for using NFS/RDMA with the pNFS files and flexfiles
drivers
- Micro-optimisations for RDMA
- RDMA tracing improvements
Bugfixes:
- Fix a long standing bug with xs_read_xdr_buf() when receiving
partial pages (Dan Aloni)
- Various fixes for getxattr and listxattr, when used over non-TCP
transports
- Fixes for containerised NFS from Sargun Dhillon
- switch nfsiod to be an UNBOUND workqueue (Neil Brown)
- READDIR should not ask for security label information if there is
no LSM policy (Olga Kornievskaia)
- Avoid using interval-based rebinding with TCP in lockd (Calum
Mackay)
- A series of RPC and NFS layer fixes to support the NFSv4.2
READ_PLUS code
- A couple of fixes for pnfs/flexfiles read failover
Cleanups:
- Various cleanups for the SUNRPC xdr code in conjunction with the
READ_PLUS fixes"
* tag 'nfs-for-5.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (90 commits)
NFS/pNFS: Fix a typo in ff_layout_resend_pnfs_read()
pNFS/flexfiles: Avoid spurious layout returns in ff_layout_choose_ds_for_read
NFSv4/pnfs: Add tracing for the deviceid cache
fs/lockd: convert comma to semicolon
NFSv4.2: fix error return on memory allocation failure
NFSv4.2/pnfs: Don't use READ_PLUS with pNFS yet
NFSv4.2: Deal with potential READ_PLUS data extent buffer overflow
NFSv4.2: Don't error when exiting early on a READ_PLUS buffer overflow
NFSv4.2: Handle hole lengths that exceed the READ_PLUS read buffer
NFSv4.2: decode_read_plus_hole() needs to check the extent offset
NFSv4.2: decode_read_plus_data() must skip padding after data segment
NFSv4.2: Ensure we always reset the result->count in decode_read_plus()
SUNRPC: When expanding the buffer, we may need grow the sparse pages
SUNRPC: Cleanup - constify a number of xdr_buf helpers
SUNRPC: Clean up open coded setting of the xdr_stream 'nwords' field
SUNRPC: _copy_to/from_pages() now check for zero length
SUNRPC: Cleanup xdr_shrink_bufhead()
SUNRPC: Fix xdr_expand_hole()
SUNRPC: Fixes for xdr_align_data()
SUNRPC: _shift_data_left/right_pages should check the shift length
...
|
|
Pull ceph updates from Ilya Dryomov:
"The big ticket item here is support for msgr2 on-wire protocol, which
adds the option of full in-transit encryption using AES-GCM algorithm
(myself).
On top of that we have a series to avoid intermittent errors during
recovery with recover_session=clean and some MDS request encoding work
from Jeff, a cap handling fix and assorted observability improvements
from Luis and Xiubo and a good number of cleanups.
Luis also ran into a corner case with quotas which sadly means that we
are back to denying cross-quota-realm renames"
* tag 'ceph-for-5.11-rc1' of git://github.com/ceph/ceph-client: (59 commits)
libceph: drop ceph_auth_{create,update}_authorizer()
libceph, ceph: make use of __ceph_auth_get_authorizer() in msgr1
libceph, ceph: implement msgr2.1 protocol (crc and secure modes)
libceph: introduce connection modes and ms_mode option
libceph, rbd: ignore addr->type while comparing in some cases
libceph, ceph: get and handle cluster maps with addrvecs
libceph: factor out finish_auth()
libceph: drop ac->ops->name field
libceph: amend cephx init_protocol() and build_request()
libceph, ceph: incorporate nautilus cephx changes
libceph: safer en/decoding of cephx requests and replies
libceph: more insight into ticket expiry and invalidation
libceph: move msgr1 protocol specific fields to its own struct
libceph: move msgr1 protocol implementation to its own file
libceph: separate msgr1 protocol implementation
libceph: export remaining protocol independent infrastructure
libceph: export zero_page
libceph: rename and export con->flags bits
libceph: rename and export con->state states
libceph: make con->state an int
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
- Allow unprivileged mounting in a user namespace.
For quite some time the security model of overlayfs has been that
operations on underlying layers shall be performed with the
privileges of the mounting task.
This way an unprvileged user cannot gain privileges by the act of
mounting an overlayfs instance. A full audit of all function calls
made by the overlayfs code has been performed to see whether they
conform to this model, and this branch contains some fixes in this
regard.
- Support running on copied filesystem images by optionally disabling
UUID verification.
- Bug fixes as well as documentation updates.
* tag 'ovl-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: unprivieged mounts
ovl: do not get metacopy for userxattr
ovl: do not fail because of O_NOATIME
ovl: do not fail when setting origin xattr
ovl: user xattr
ovl: simplify file splice
ovl: make ioctl() safe
ovl: check privs before decoding file handle
vfs: verify source area in vfs_dedupe_file_range_one()
vfs: move cap_convert_nscap() call into vfs_setxattr()
ovl: fix incorrect extent info in metacopy case
ovl: expand warning in ovl_d_real()
ovl: document lower modification caveats
ovl: warn about orphan metacopy
ovl: doc clarification
ovl: introduce new "uuid=off" option for inodes index feature
ovl: propagate ovl_fs to ovl_decode_real_fh and ovl_encode_real_fh
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Improve performance of virtio-fs in mixed read/write workloads
- Try to revalidate cache before returning EEXIST on exclusive create
- Add a couple of miscellaneous bug fixes as well as some code cleanups
* tag 'fuse-update-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: fix bad inode
fuse: support SB_NOSEC flag to improve write performance
fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request
fuse: don't send ATTR_MODE to kill suid/sgid for handle_killpriv_v2
fuse: setattr should set FATTR_KILL_SUIDGID
fuse: set FUSE_WRITE_KILL_SUIDGID in cached write path
fuse: rename FUSE_WRITE_KILL_PRIV to FUSE_WRITE_KILL_SUIDGID
fuse: introduce the notion of FUSE_HANDLE_KILLPRIV_V2
fuse: always revalidate if exclusive create
virtiofs: clean up error handling in virtio_fs_get_tree()
fuse: add fuse_sb_destroy() helper
fuse: simplify get_fuse_conn*()
fuse: get rid of fuse_mount refcount
virtiofs: simplify sb setup
virtiofs fix leak in setup
fuse: launder page should wait for page writeback
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've made more work into per-file compression support.
For example, F2FS_IOC_GET | SET_COMPRESS_OPTION provides a way to
change the algorithm or cluster size per file. F2FS_IOC_COMPRESS |
DECOMPRESS_FILE provides a way to compress and decompress the existing
normal files manually.
There is also a new mount option, compress_mode=fs|user, which can
control who compresses the data.
Chao also added a checksum feature with a mount option so that
we are able to detect any corrupted cluster.
In addition, Daniel contributed casefolding with encryption patch,
which will be used for Android devices.
Summary:
Enhancements:
- add ioctls and mount option to manage per-file compression feature
- support casefolding with encryption
- support checksum for compressed cluster
- avoid IO starvation by replacing mutex with rwsem
- add sysfs, max_io_bytes, to control max bio size
Bug fixes:
- fix use-after-free issue when compression and fsverity are enabled
- fix consistency corruption during fault injection test
- fix data offset for lseek
- get rid of buffer_head which has 32bits limit in fiemap
- fix some bugs in multi-partitions support
- fix nat entry count calculation in shrinker
- fix some stat information
And, we've refactored some logics and fix minor bugs as well"
* tag 'f2fs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (36 commits)
f2fs: compress: fix compression chksum
f2fs: fix shift-out-of-bounds in sanity_check_raw_super()
f2fs: fix race of pending_pages in decompression
f2fs: fix to account inline xattr correctly during recovery
f2fs: inline: fix wrong inline inode stat
f2fs: inline: correct comment in f2fs_recover_inline_data
f2fs: don't check PAGE_SIZE again in sanity_check_raw_super()
f2fs: convert to F2FS_*_INO macro
f2fs: introduce max_io_bytes, a sysfs entry, to limit bio size
f2fs: don't allow any writes on readonly mount
f2fs: avoid race condition for shrinker count
f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE
f2fs: add compress_mode mount option
f2fs: Remove unnecessary unlikely()
f2fs: init dirty_secmap incorrectly
f2fs: remove buffer_head which has 32bits limit
f2fs: fix wrong block count instead of bytes
f2fs: use new conversion functions between blks and bytes
f2fs: rename logical_to_blk and blk_to_logical
f2fs: fix kbytes written stat for multi-device case
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, reiserfs, quota and writeback updates from Jan Kara:
- a couple of quota fixes (mostly for problems found by syzbot)
- several ext2 cleanups
- one fix for reiserfs crash on corrupted image
- a fix for spurious warning in writeback code
* tag 'for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
writeback: don't warn on an unregistered BDI in __mark_inode_dirty
fs: quota: fix array-index-out-of-bounds bug by passing correct argument to vfs_cleanup_quota_inode()
reiserfs: add check for an invalid ih_entry_count
ext2: Fix fall-through warnings for Clang
fs/ext2: Use ext2_put_page
docs: filesystems: Reduce ext2.rst to one top-level heading
quota: Sanity-check quota file headers on load
quota: Don't overflow quota file offsets
ext2: Remove unnecessary blank
fs/quota: update quota state flags scheme with project quota flags
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"A few fsnotify fixes from Amir fixing fallout from big fsnotify
overhaul a few months back and an improvement of defaults limiting
maximum number of inotify watches from Waiman"
* tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: fix events reported to watching parent and child
inotify: convert to handle_inode_event() interface
fsnotify: generalize handle_inode_event()
inotify: Increase default inotify.max_user_watches limit to 1048576
|
|
Don't bump the index twice.
Fixes: 563c53e73b8b ("NFS: Fix flexfiles read failover")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The callers of ff_layout_choose_ds_for_read() should decide whether or
not they want to return the layout on error. Sometimes, we may just want
to retry from the beginning.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Add tracepoints to allow debugging of the deviceid cache.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Pull block updates from Jens Axboe:
"Another series of killing more code than what is being added, again
thanks to Christoph's relentless cleanups and tech debt tackling.
This contains:
- blk-iocost improvements (Baolin Wang)
- part0 iostat fix (Jeffle Xu)
- Disable iopoll for split bios (Jeffle Xu)
- block tracepoint cleanups (Christoph Hellwig)
- Merging of struct block_device and hd_struct (Christoph Hellwig)
- Rework/cleanup of how block device sizes are updated (Christoph
Hellwig)
- Simplification of gendisk lookup and removal of block device
aliasing (Christoph Hellwig)
- Block device ioctl cleanups (Christoph Hellwig)
- Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)
- Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)
- sbitmap improvements (Pavel Begunkov)
- Hybrid polling fix (Pavel Begunkov)
- bvec iteration improvements (Pavel Begunkov)
- Zone revalidation fixes (Damien Le Moal)
- blk-throttle limit fix (Yu Kuai)
- Various little fixes"
* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
blk-mq: fix msec comment from micro to milli seconds
blk-mq: update arg in comment of blk_mq_map_queue
blk-mq: add helper allocating tagset->tags
Revert "block: Fix a lockdep complaint triggered by request queue flushing"
nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
block: disable iopoll for split bio
block: Improve blk_revalidate_disk_zones() checks
sbitmap: simplify wrap check
sbitmap: replace CAS with atomic and
sbitmap: remove swap_lock
sbitmap: optimise sbitmap_deferred_clear()
blk-mq: skip hybrid polling if iopoll doesn't spin
blk-iocost: Factor out the base vrate change into a separate function
blk-iocost: Factor out the active iocgs' state check into a separate function
blk-iocost: Move the usage ratio calculation to the correct place
blk-iocost: Remove unnecessary advance declaration
blk-iocost: Fix some typos in comments
blktrace: fix up a kerneldoc comment
block: remove the request_queue to argument request based tracepoints
...
|
|
Pull io_uring updates from Jens Axboe:
"Fairly light set of changes this time around, and mostly some bits
that were pushed out to 5.11 instead of 5.10, fixes/cleanups, and a
few features. In particular:
- Cleanups around iovec import (David Laight, Pavel)
- Add timeout support for io_uring_enter(2), which enables us to
clean up liburing and avoid a timeout sqe submission in the
completion path.
The big win here is that it allows setups that split SQ and CQ
handling into separate threads to avoid locking, as the CQ side
will no longer submit when timeouts are needed when waiting for
events (Hao Xu)
- Add support for socket shutdown, and renameat/unlinkat.
- SQPOLL cleanups and improvements (Xiaoguang Wang)
- Allow SQPOLL setups for CAP_SYS_NICE, and enable regular
(non-fixed) files to be used.
- Cancelation improvements (Pavel)
- Fixed file reference improvements (Pavel)
- IOPOLL related race fixes (Pavel)
- Lots of other little fixes and cleanups (mostly Pavel)"
* tag 'for-5.11/io_uring-2020-12-14' of git://git.kernel.dk/linux-block: (43 commits)
io_uring: fix io_cqring_events()'s noflush
io_uring: fix racy IOPOLL flush overflow
io_uring: fix racy IOPOLL completions
io_uring: always let io_iopoll_complete() complete polled io
io_uring: add timeout update
io_uring: restructure io_timeout_cancel()
io_uring: fix files cancellation
io_uring: use bottom half safe lock for fixed file data
io_uring: fix miscounting ios_left
io_uring: change submit file state invariant
io_uring: check kthread stopped flag when sq thread is unparked
io_uring: share fixed_file_refs b/w multiple rsrcs
io_uring: replace inflight_wait with tctx->wait
io_uring: don't take fs for recvmsg/sendmsg
io_uring: only wake up sq thread while current task is in io worker context
io_uring: don't acquire uring_lock twice
io_uring: initialize 'timeout' properly in io_sq_thread()
io_uring: refactor io_sq_thread() handling
io_uring: always batch cancel in *cancel_files()
io_uring: pass files into kill timeouts/poll
...
|
|
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
"This sits on top of of the core entry/exit and x86 entry branch from
the tip tree, which contains the generic and x86 parts of this work.
Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.
With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
signal.c, and also remove a deadlock work-around in io_uring around
knowing that signal based task_work waking is invoked with the sighand
wait queue head lock.
The motivation for this work is to decouple signal notify based
task_work, of which io_uring is a heavy user of, from sighand. The
sighand lock becomes a huge contention point, particularly for
threaded workloads where it's shared between threads. Even outside of
threaded applications it's slower than it needs to be.
Roman Gershman <romger@amazon.com> reported that his networked
workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
spent hammering on the sighand lock, showing 57% of the CPU time there
[1].
There are further cleanups possible on top of this. One example is
TIF_PATCH_PENDING, where a patch already exists to use
TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
consolidation, but the work stands on its own as well"
[1] https://github.com/axboe/liburing/issues/215
* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
io_uring: remove 'twa_signal_ok' deadlock work-around
kernel: remove checking for TIF_NOTIFY_SIGNAL
signal: kill JOBCTL_TASK_WORK
io_uring: JOBCTL_TASK_WORK is no longer used by task_work
task_work: remove legacy TWA_SIGNAL path
sparc: add support for TIF_NOTIFY_SIGNAL
riscv: add support for TIF_NOTIFY_SIGNAL
nds32: add support for TIF_NOTIFY_SIGNAL
ia64: add support for TIF_NOTIFY_SIGNAL
h8300: add support for TIF_NOTIFY_SIGNAL
c6x: add support for TIF_NOTIFY_SIGNAL
alpha: add support for TIF_NOTIFY_SIGNAL
xtensa: add support for TIF_NOTIFY_SIGNAL
arm: add support for TIF_NOTIFY_SIGNAL
microblaze: add support for TIF_NOTIFY_SIGNAL
hexagon: add support for TIF_NOTIFY_SIGNAL
csky: add support for TIF_NOTIFY_SIGNAL
openrisc: add support for TIF_NOTIFY_SIGNAL
sh: add support for TIF_NOTIFY_SIGNAL
um: add support for TIF_NOTIFY_SIGNAL
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"A change to increase the default maximum stack size on parisc to 100MB
and the ability to further increase the stack hard limit size at
runtime with ulimit for newly started processes.
The other patches fix compile warnings, utilize the Kbuild logic and
cleanups the parisc arch code"
* 'parisc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: pci-dma: fix warning unused-function
parisc/uapi: Use Kbuild logic to provide <asm/types.h>
parisc: Make user stack size configurable
parisc: Use _TIF_USER_WORK_MASK in entry.S
parisc: Drop loops_per_jiffy from per_cpu struct
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"The major change here is finally gaining seccomp constant-action
bitmaps, which internally reduces the seccomp overhead for many
real-world syscall filters to O(1), as discussed at Plumbers this
year.
- Improve seccomp performance via constant-action bitmaps (YiFei Zhu
& Kees Cook)
- Fix bogus __user annotations (Jann Horn)
- Add missed CONFIG for improved selftest coverage (Mickaël Salaün)"
* tag 'seccomp-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
selftests/seccomp: Update kernel config
seccomp: Remove bogus __user annotations
seccomp/cache: Report cache data through /proc/pid/seccomp_cache
xtensa: Enable seccomp architecture tracking
sh: Enable seccomp architecture tracking
s390: Enable seccomp architecture tracking
riscv: Enable seccomp architecture tracking
powerpc: Enable seccomp architecture tracking
parisc: Enable seccomp architecture tracking
csky: Enable seccomp architecture tracking
arm: Enable seccomp architecture tracking
arm64: Enable seccomp architecture tracking
selftests/seccomp: Compare bitmap vs filter overhead
x86: Enable seccomp architecture tracking
seccomp/cache: Add "emulator" to check if filter is constant allow
seccomp/cache: Lookup syscall allowlist bitmap for fast path
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
- Clean up unused but exposed API (Christoph Hellwig)
- Provide KCONFIG for default size of kmsg buffer (Vasile-Laurentiu
Stanimir)
* tag 'pstore-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Move kmsg_bytes default into Kconfig
pstore/blk: remove {un,}register_pstore_blk
pstore/blk: update the command line example
pstore/zone: cap the maximum device size
|
|
Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Currently when an alloc_page fails the error return is not set in
variable err and a garbage initialized value is returned. Fix this
by setting err to -ENOMEM before taking the error return path.
Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: a1f26739ccdc ("NFSv4.2: improve page handling for GETXATTR")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
BDIs get unregistered during device removal, and this WARN can be
trivially triggered by hot-removing a NVMe device while running fsx
It is otherwise harmless as we still hold a BDI reference, and the
writeback has been shut down already.
Link: https://lore.kernel.org/r/20200928122613.434820-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kunit updates from Shuah Khan:
- documentation update and fix to kunit_tool to parse diagnostic
messages correctly from David Gow
- Support for Parameterized Testing and fs/ext4 test updates to use
KUnit parameterized testing feature from Arpitha Raghunandan
- Helper to derive file names depending on --build_dir argument from
Andy Shevchenko
* tag 'linux-kselftest-kunit-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
fs: ext4: Modify inode-test.c to use KUnit parameterized testing feature
kunit: Support for Parameterized Testing
kunit: kunit_tool: Correctly parse diagnostic messages
Documentation: kunit: provide guidance for testing many inputs
kunit: Introduce get_file_path() helper
|
|
Merge yet more updates from Andrew Morton:
- lots of little subsystems
- a few post-linux-next MM material. Most of the rest awaits more
merging of other trees.
Subsystems affected by this series: alpha, procfs, misc, core-kernel,
bitmap, lib, lz4, checkpatch, nilfs, kdump, rapidio, gcov, bfs, relay,
resource, ubsan, reboot, fault-injection, lzo, apparmor, and mm (swap,
memory-hotplug, pagemap, cleanups, and gup).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (86 commits)
mm: fix some spelling mistakes in comments
mm: simplify follow_pte{,pmd}
mm: unexport follow_pte_pmd
apparmor: remove duplicate macro list_entry_is_head()
lib/lzo/lzo1x_compress.c: make lzogeneric1x_1_compress() static
fault-injection: handle EI_ETYPE_TRUE
reboot: hide from sysfs not applicable settings
reboot: allow to override reboot type if quirks are found
reboot: remove cf9_safe from allowed types and rename cf9_force
reboot: allow to specify reboot mode via sysfs
reboot: refactor and comment the cpu selection code
lib/ubsan.c: mark type_check_kinds with static keyword
kcov: don't instrument with UBSAN
ubsan: expand tests and reporting
ubsan: remove UBSAN_MISC in favor of individual options
ubsan: enable for all*config builds
ubsan: disable UBSAN_TRAP for all*config
ubsan: disable object-size sanitizer under GCC
ubsan: move cc-option tests into Kconfig
ubsan: remove redundant -Wno-maybe-uninitialized
...
|
|
Merge __follow_pte_pmd, follow_pte_pmd and follow_pte into a single
follow_pte function and just pass two additional NULL arguments for the
two previous follow_pte callers.
[sfr@canb.auug.org.au: merge fix for "s390/pci: remove races against pte updates"]
Link: https://lkml.kernel.org/r/20201111221254.7f6a3658@canb.auug.org.au
Link: https://lkml.kernel.org/r/20201029101432.47011-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Make the printk() [bfs "printf" macro] seem less severe by changing
"WARNING:" to "NOTE:".
<asm-generic/bug.h> warns us about using WARNING or BUG in a format string
other than in WARN() or BUG() family macros. bfs/inode.c is doing just
that in a normal printk() call, so change the "WARNING" string to be
"NOTE".
Link: https://lkml.kernel.org/r/20201203212634.17278-1-rdunlap@infradead.org
Reported-by: syzbot+3fd34060f26e766536ff@syzkaller.appspotmail.com
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Tigran A. Aivazian" <aivazian.tigran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There some macros are unused and cause gcc warning. Remove them.
fs/nilfs2/segment.c:137:0: warning: macro "nilfs_cnt32_gt" is not used [-Wunused-macros]
fs/nilfs2/segment.c:144:0: warning: macro "nilfs_cnt32_le" is not used [-Wunused-macros]
fs/nilfs2/segment.c:143:0: warning: macro "nilfs_cnt32_lt" is not used [-Wunused-macros]
Link: https://lkml.kernel.org/r/1607552733-24292-1-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out
mathematical helpers.
At the same time convert users in header and lib folder to use new
header. Though for time being include new header back to kernel.h to
avoid twisted indirected includes for existing users.
[sfr@canb.auug.org.au: fix powerpc build]
Link: https://lkml.kernel.org/r/20201029150809.13059608@canb.auug.org.au
Link: https://lkml.kernel.org/r/20201028173212.41768-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We don't need pde_get()'s return value, so make pde_get() return nothing
Link: https://lkml.kernel.org/r/20201211061944.GA2387571@rlk
Signed-off-by: Hui Su <sh_def@163.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 1fde6f21d90f ("proc: fix /proc/net/* after setns(2)") only forced
revalidation of regular files under /proc/net/
However, /proc/net/ is unusual in the sense of /proc/net/foo handlers
take netns pointer from parent directory which is old netns.
Steps to reproduce:
(void)open("/proc/net/sctp/snmp", O_RDONLY);
unshare(CLONE_NEWNET);
int fd = open("/proc/net/sctp/snmp", O_RDONLY);
read(fd, &c, 1);
Read will read wrong data from original netns.
Patch forces lookup on every directory under /proc/net .
Link: https://lkml.kernel.org/r/20201205160916.GA109739@localhost.localdomain
Fixes: 1da4d377f943 ("proc: revalidate misc dentries")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Similar to speculation store bypass, show information about the indirect
branch speculation mode of a task in /proc/$pid/status.
For testing/benchmarking, I needed to see whether IB (Indirect Branch)
speculation (see Spectre-v2) is enabled on a task, to see whether an
IBPB instruction should be executed on an address space switch.
Unfortunately, this information isn't available anywhere else and
currently the only way to get it is to hack the kernel to expose it
(like this change). It also helped expose a bug with conditional IB
speculation on certain CPUs.
Another place this could be useful is to audit the system when using
sanboxing. With this change, I can confirm that seccomp-enabled
process have IB speculation force disabled as expected when the kernel
command line parameter `spectre_v2_user=seccomp`.
Since there's already a 'Speculation_Store_Bypass' field, I used that
as precedent for adding this one.
[amistry@google.com: remove underscores from field name to workaround documentation issue]
Link: https://lkml.kernel.org/r/20201106131015.v2.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
Link: https://lkml.kernel.org/r/20201030172731.1.I7782b0cedb705384a634cfd8898eb7523562da99@changeid
Signed-off-by: Anand K Mistry <amistry@google.com>
Cc: Anthony Steinhauser <asteinhauser@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Anand K Mistry <amistry@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Delete repeated words in fs/proc/.
{the, which}
where "which which" was changed to "with which".
Link: https://lkml.kernel.org/r/20201028191525.13413-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exec-update-lock update from Eric Biederman:
"The key point of this is to transform exec_update_mutex into a
rw_semaphore so readers can be separated from writers.
This makes it easier to understand what the holders of the lock are
doing, and makes it harder to contend or deadlock on the lock.
The real deadlock fix wound up in perf_event_open"
* 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
exec: Transform exec_update_mutex into a rw_semaphore
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman:
"This set of changes ultimately fixes the interaction of posix file
lock and exec. Fundamentally most of the change is just moving where
unshare_files is called during exec, and tweaking the users of
files_struct so that the count of files_struct is not unnecessarily
played with.
Along the way fcheck and related helpers were renamed to more
accurately reflect what they do.
There were also many other small changes that fell out, as this is the
first time in a long time much of this code has been touched.
Benchmarks haven't turned up any practical issues but Al Viro has
observed a possibility for a lot of pounding on task_lock. So I have
some changes in progress to convert put_files_struct to always rcu
free files_struct. That wasn't ready for the merge window so that will
have to wait until next time"
* 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
exec: Move io_uring_task_cancel after the point of no return
coredump: Document coredump code exclusively used by cell spufs
file: Remove get_files_struct
file: Rename __close_fd_get_file close_fd_get_file
file: Replace ksys_close with close_fd
file: Rename __close_fd to close_fd and remove the files parameter
file: Merge __alloc_fd into alloc_fd
file: In f_dupfd read RLIMIT_NOFILE once.
file: Merge __fd_install into fd_install
proc/fd: In fdinfo seq_show don't use get_files_struct
bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu
proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu
file: Implement task_lookup_next_fd_rcu
kcmp: In get_file_raw_ptr use task_lookup_fd_rcu
proc/fd: In tid_fd_mode use task_lookup_fd_rcu
file: Implement task_lookup_fd_rcu
file: Rename fcheck lookup_fd_rcu
file: Replace fcheck_files with files_lookup_fd_rcu
file: Factor files_lookup_fd_locked out of fcheck_files
file: Rename __fcheck_files to files_lookup_fd_raw
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range/openat2 updates from Christian Brauner:
"This contains a fix for openat2() to make RESOLVE_BENEATH and
RESOLVE_IN_ROOT mutually exclusive. It doesn't make sense to specify
both at the same time. The openat2() selftests have been extended to
verify that these two flags can't be specified together.
This also adds the CLOSE_RANGE_CLOEXEC flag to close_range() which
allows to mark a range of file descriptors as close-on-exec without
actually closing them.
This is useful in general but the use-case that triggered the patch is
installing a seccomp profile in the calling task before exec. If the
seccomp profile wants to block the close_range() syscall it obviously
can't use it to close all fds before exec. If it calls close_range()
before installing the seccomp profile it needs to take care not to
close fds that it will still need before the exec meaning it would
have to call close_range() multiple times on different ranges and then
still fall back to closing fds one by one right before the exec.
CLOSE_RANGE_CLOEXEC allows to solve this problem relying on the exec
codepath to get rid of the unwanted fds. The close_range() tests have
been expanded to verify that CLOSE_RANGE_CLOEXEC works"
* tag 'close-range-openat2-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: core: add tests for CLOSE_RANGE_CLOEXEC
fs, close_range: add flag CLOSE_RANGE_CLOEXEC
selftests: openat2: add RESOLVE_ conflict test
openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll updates from Al Viro:
"Deal with epoll loop check/removal races sanely (among other things).
The solution merged last cycle (pinning a bunch of struct file
instances) had been forced by the wrong data structures; untangling
that takes a bunch of preparations, but it's worth doing - control
flow in there is ridiculously overcomplicated. Memory footprint has
also gone down, while we are at it.
This is not all I want to do in the area, but since I didn't get
around to posting the followups they'll have to wait for the next
cycle"
* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
epoll: take epitem list out of struct file
epoll: massage the check list insertion
lift rcu_read_lock() into reverse_path_check()
convert ->f_ep_links/->fllink to hlist
ep_insert(): move creation of wakeup source past the fl_ep_links insertion
fold ep_read_events_proc() into the only caller
take the common part of ep_eventpoll_poll() and ep_item_poll() into helper
ep_insert(): we only need tep->mtx around the insertion itself
ep_insert(): don't open-code ep_remove() on failure exits
lift locking/unlocking ep->mtx out of ep_{start,done}_scan()
ep_send_events_proc(): fold into the caller
lift the calls of ep_send_events_proc() into the callers
lift the calls of ep_read_events_proc() into the callers
ep_scan_ready_list(): prepare to splitup
ep_loop_check_proc(): saner calling conventions
get rid of ep_push_nested()
ep_loop_check_proc(): lift pushing the cookie into callers
clean reverse_path_check_proc() a bit
reverse_path_check_proc(): don't bother with cookies
reverse_path_check_proc(): sane arguments
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"This cycle we got rid of magical page->mapping type marks for
temporary pages which had some concern before, now such usage is
replaced with specific page->private.
Also switch to inplace I/O instead of allocating extra cached pages to
avoid direct reclaim under low memory scenario.
There are some bmap bugfix and minor cleanups as well"
* tag 'erofs-for-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: avoid using generic_block_bmap
erofs: force inplace I/O under low memory scenario
erofs: simplify try_to_claim_pcluster()
erofs: insert to managed cache after adding to pcl
erofs: get rid of magical Z_EROFS_MAPPING_STAGING
erofs: remove a void EROFS_VERSION macro set in Makefile
|
|
Pull nfsd updates from Chuck Lever:
"Several substantial changes this time around:
- Previously, exporting an NFS mount via NFSD was considered to be an
unsupported feature. With v5.11, the community has attempted to
make re-exporting a first-class feature of NFSD.
This would enable the Linux in-kernel NFS server to be used as an
intermediate cache for a remotely-located primary NFS server, for
example, even with other NFS server implementations, like a NetApp
filer, as the primary.
- A short series of patches brings support for multiple RPC/RDMA data
chunks per RPC transaction to the Linux NFS server's RPC/RDMA
transport implementation.
This is a part of the RPC/RDMA spec that the other premiere
NFS/RDMA implementation (Solaris) has had for a very long time, and
completes the implementation of RPC/RDMA version 1 in the Linux
kernel's NFS server.
- Long ago, NFSv4 support was introduced to NFSD using a series of C
macros that hid dprintk's and goto's. Over time, the kernel's XDR
implementation has been greatly improved, but these C macros have
remained and become fallow. A series of patches in this pull
request completely replaces those macros with the use of current
kernel XDR infrastructure. Benefits include:
- More robust input sanitization in NFSD's NFSv4 XDR decoders.
- Make it easier to use common kernel library functions that use
XDR stream APIs (for example, GSS-API).
- Align the structure of the source code with the RFCs so it is
easier to learn, verify, and maintain our XDR implementation.
- Removal of more than a hundred hidden dprintk() call sites.
- Removal of some explicit manipulation of pages to help make the
eventual transition to xdr->bvec smoother.
- On top of several related fixes in 5.10-rc, there are a few more
fixes to get the Linux NFSD implementation of NFSv4.2 inter-server
copy up to speed.
And as usual, there is a pinch of seasoning in the form of a
collection of unrelated minor bug fixes and clean-ups.
Many thanks to all who contributed this time around!"
* tag 'nfsd-5.11' of git://git.linux-nfs.org/projects/cel/cel-2.6: (131 commits)
nfsd: Record NFSv4 pre/post-op attributes as non-atomic
nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
exportfs: Add a function to return the raw output from fh_to_dentry()
nfsd: close cached files prior to a REMOVE or RENAME that would replace target
nfsd: allow filesystems to opt out of subtree checking
nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
Revert "nfsd4: support change_attr_type attribute"
nfsd4: don't query change attribute in v2/v3 case
nfsd: minor nfsd4_change_attribute cleanup
nfsd: simplify nfsd4_change_info
nfsd: only call inode_query_iversion in the I_VERSION case
nfs_common: need lock during iterate through the list
NFSD: Fix 5 seconds delay when doing inter server copy
NFSD: Fix sparse warning in nfs4proc.c
SUNRPC: Remove XDRBUF_SPARSE_PAGES flag in gss_proxy upcall
sunrpc: clean-up cache downcall
nfsd: Fix message level for normal termination
NFSD: Remove macros that are no longer used
NFSD: Replace READ* macros in nfsd4_decode_compound()
...
|
|
Pull jfs updates from David Kleikamp:
"A few jfs fixes"
* tag 'jfs-5.11' of git://github.com/kleikamp/linux-shaggy:
jfs: Fix array index bounds check in dbAdjTree
jfs: Fix memleak in dbAdjCtl
jfs: delete duplicated words + other fixes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set includes more low level communication layer cleanups.
The main change is the listening socket is no longer handled as a
special case of node connection sockets. There is one small fix for
checking the number of local connections"
* tag 'dlm-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: check on existing node address
fs: dlm: constify addr_compare
fs: dlm: fix check for multi-homed hosts
fs: dlm: listen socket out of connection hash
fs: dlm: refactor sctp sock parameter
fs: dlm: move shutdown action to node creation
fs: dlm: move connect callback in node creation
fs: dlm: add helper for init connection
fs: dlm: handle non blocked connect event
fs: dlm: flush othercon at close
fs: dlm: add get buffer error handling
fs: dlm: define max send buffer
fs: dlm: fix proper srcu api call
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"We have a mix of all kinds of changes, feature updates, core stuff,
performance improvements and lots of cleanups and preparatory changes.
User visible:
- export filesystem generation in sysfs
- new features for mount option 'rescue':
- what's currently supported is exported in sysfs
- 'ignorebadroots'/'ibadroots' - continue even if some essential
tree roots are not usable (extent, uuid, data reloc, device,
csum, free space)
- 'ignoredatacsums'/'idatacsums' - skip checksum verification on
data
- 'all' - now enables 'ignorebadroots' + 'ignoredatacsums' +
'nologreplay'
- export read mirror policy settings to sysfs, new policies will be
added in the future
- remove inode number cache feature (mount -o inode_cache), obsoleted
in 5.9
User visible fixes:
- async discard scheduling fixes on high loads
- update inode byte counter atomically so stat() does not report
wrong value in some cases
- free space tree fixes:
- correctly report status of v2 after remount
- clear v1 cache inodes when v2 is newly enabled after remount
Core:
- switch own tree lock implementation to standard rw semaphore:
- one-level lock nesting is not required anymore, the last use of
this was in free space that's now loaded asynchronously
- own implementation of adaptive spinning before taking mutex has
been part of rwsem
- performance seems to be better in general, much better (+tens
of percents) for some workloads
- lockdep does not complain
- finish direct IO conversion to iomap infrastructure, remove
temporary workaround for DSYNC after iomap API updates
- preparatory work to support data and metadata blocks smaller than
page:
- generalize code that assumes sectorsize == PAGE_SIZE, lots of
refactoring
- planned namely for 64K pages (eg. arm64, ppc64)
- scrub read-only support
- preparatory work for zoned allocation mode (SMR/ZBC/ZNS friendly):
- disable incompatible features
- round-robin superblock write
- free space cache (v1) is loaded asynchronously, remove tree path
recursion
- slightly improved time tacking for transaction kthread wake ups
Performance improvements (note that the numbers depend on load type or
other features and weren't run on the same machine):
- skip unnecessary work:
- do not start readahead for csum tree when scrubbing non-data
block groups
- do not start and wait for delalloc on snapshot roots on
transaction commit
- fix race when defragmenting leads to unnecessary IO
- dbench speedups (+throughput%/-max latency%):
- skip unnecessary searches for xattrs when logging an inode
(+10.8/-8.2)
- stop incrementing log batch when joining log transaction (1-2)
- unlock path before checking if extent is shared during nocow
writeback (+5.0/-20.5), on fio load +9.7% throughput/-9.8%
runtime
- several tree log improvements, eg. removing unnecessary
operations, fixing races that lead to additional work
(+12.7/-8.2)
- tree-checker error branches annotated with unlikely() (+3%
throughput)
Other:
- cleanups
- lockdep fixes
- more btrfs_inode conversions
- error variable cleanups"
* tag 'for-5.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (198 commits)
btrfs: scrub: allow scrub to work with subpage sectorsize
btrfs: scrub: support subpage data scrub
btrfs: scrub: support subpage tree block scrub
btrfs: scrub: always allocate one full page for one sector for RAID56
btrfs: scrub: reduce width of extent_len/stripe_len from 64 to 32 bits
btrfs: refactor btrfs_lookup_bio_sums to handle out-of-order bvecs
btrfs: remove btrfs_find_ordered_sum call from btrfs_lookup_bio_sums
btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
btrfs: update num_extent_pages to support subpage sized extent buffer
btrfs: don't allow tree block to cross page boundary for subpage support
btrfs: calculate inline extent buffer page size based on page size
btrfs: factor out btree page submission code to a helper
btrfs: make btrfs_verify_data_csum follow sector size
btrfs: pass bio_offset to check_data_csum() directly
btrfs: rename bio_offset of extent_submit_bio_start_t to dio_file_offset
btrfs: fix lockdep warning when creating free space tree
btrfs: skip space_cache v1 setup when not using it
btrfs: remove free space items when disabling space cache v1
btrfs: warn when remount will not change the free space tree
btrfs: use superblock state to print space_cache mount option
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking fixes from Jeff Layton:
"A fix for some undefined integer overflow behavior, a typo in a
comment header, and a fix for a potential deadlock involving internal
senders of SIGIO/SIGURG"
* tag 'locks-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fcntl: Fix potential deadlock in send_sig{io, urg}()
locks: fix a typo at a kernel-doc markup
locks: Fix UBSAN undefined behaviour in flock64_to_posix_lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core updates for 5.11-rc1
This time there was a lot of different work happening here for some
reason:
- redo of the fwnode link logic, speeding it up greatly
- auxiliary bus added (this was a tag that will be pulled in from
other trees/maintainers this merge window as well, as driver
subsystems started to rely on it)
- platform driver core cleanups on the way to fixing some long-time
api updates in future releases
- minor fixes and tweaks.
All have been in linux-next with no (finally) reported issues. Testing
there did helped in shaking issues out a lot :)"
* tag 'driver-core-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (39 commits)
driver core: platform: don't oops in platform_shutdown() on unbound devices
ACPI: Use fwnode_init() to set up fwnode
misc: pvpanic: Replace OF headers by mod_devicetable.h
misc: pvpanic: Combine ACPI and platform drivers
usb: host: sl811: Switch to use platform_get_mem_or_io()
vfio: platform: Switch to use platform_get_mem_or_io()
driver core: platform: Introduce platform_get_mem_or_io()
dyndbg: fix use before null check
soc: fix comment for freeing soc_dev_attr
driver core: platform: use bus_type functions
driver core: platform: change logic implementing platform_driver_probe
driver core: platform: reorder functions
driver core: make driver_probe_device() static
driver core: Fix a couple of typos
driver core: Reorder devices on successful probe
driver core: Delete pointless parameter in fwnode_operations.add_links
driver core: Refactor fw_devlink feature
efi: Update implementation of add_links() to create fwnode links
of: property: Update implementation of add_links() to create fwnode links
driver core: Use device's fwnode to check if it is waiting for suppliers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- support "prefer busy polling" NAPI operation mode, where we defer
softirq for some time expecting applications to periodically busy
poll
- AF_XDP: improve efficiency by more batching and hindering the
adjacency cache prefetcher
- af_packet: make packet_fanout.arr size configurable up to 64K
- tcp: optimize TCP zero copy receive in presence of partial or
unaligned reads making zero copy a performance win for much smaller
messages
- XDP: add bulk APIs for returning / freeing frames
- sched: support fragmenting IP packets as they come out of conntrack
- net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs
BPF:
- BPF switch from crude rlimit-based to memcg-based memory accounting
- BPF type format information for kernel modules and related tracing
enhancements
- BPF implement task local storage for BPF LSM
- allow the FENTRY/FEXIT/RAW_TP tracing programs to use
bpf_sk_storage
Protocols:
- mptcp: improve multiple xmit streams support, memory accounting and
many smaller improvements
- TLS: support CHACHA20-POLY1305 cipher
- seg6: add support for SRv6 End.DT4/DT6 behavior
- sctp: Implement RFC 6951: UDP Encapsulation of SCTP
- ppp_generic: add ability to bridge channels directly
- bridge: Connectivity Fault Management (CFM) support as is defined
in IEEE 802.1Q section 12.14.
Drivers:
- mlx5: make use of the new auxiliary bus to organize the driver
internals
- mlx5: more accurate port TX timestamping support
- mlxsw:
- improve the efficiency of offloaded next hop updates by using
the new nexthop object API
- support blackhole nexthops
- support IEEE 802.1ad (Q-in-Q) bridging
- rtw88: major bluetooth co-existance improvements
- iwlwifi: support new 6 GHz frequency band
- ath11k: Fast Initial Link Setup (FILS)
- mt7915: dual band concurrent (DBDC) support
- net: ipa: add basic support for IPA v4.5
Refactor:
- a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
Siewior
- phy: add support for shared interrupts; get rid of multiple driver
APIs and have the drivers write a full IRQ handler, slight growth
of driver code should be compensated by the simpler API which also
allows shared IRQs
- add common code for handling netdev per-cpu counters
- move TX packet re-allocation from Ethernet switch tag drivers to a
central place
- improve efficiency and rename nla_strlcpy
- number of W=1 warning cleanups as we now catch those in a patchwork
build bot
Old code removal:
- wan: delete the DLCI / SDLA drivers
- wimax: move to staging
- wifi: remove old WDS wifi bridging support"
* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
net: hns3: fix expression that is currently always true
net: fix proc_fs init handling in af_packet and tls
nfc: pn533: convert comma to semicolon
af_vsock: Assign the vsock transport considering the vsock address flags
af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
vsock_addr: Check for supported flag values
vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
vm_sockets: Add flags field in the vsock address data structure
net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
nfc: s3fwrn5: Release the nfc firmware
net: vxget: clean up sparse warnings
mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
mlxsw: reg: Add Router LPM Cache Enable Register
mlxsw: reg: Add Router LPM Cache ML Delete Register
mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
mlxsw: reg: Add XM Router M Table Register
...
|
|
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
|
|
With this change, when the knob is set to 0, it allows unprivileged users
to call userfaultfd, like when it is set to 1, but with the restriction
that page faults from only user-mode can be handled. In this mode, an
unprivileged user (without SYS_CAP_PTRACE capability) must pass
UFFD_USER_MODE_ONLY to userfaultd or the API will fail with EPERM.
This enables administrators to reduce the likelihood that an attacker with
access to userfaultfd can delay faulting kernel code to widen timing
windows for other exploits.
The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy live
migration, which will be unnoticeable to the VM guests. To avoid this,
set 'vm.userfault = 1' in /sys/sysctl.conf.
The main reason this change is desirable as in the short term is that the
Android userland will behave as with the sysctl set to zero. So without
this commit, any Linux binary using userfaultfd to manage its memory would
behave differently if run within the Android userland. For more details,
refer to Andrea's reply [1].
[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/
Link: https://lkml.kernel.org/r/20201120030411.2690816-3-lokeshgidra@google.com
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: <calin@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "Control over userfaultfd kernel-fault handling", v6.
This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and movement
can be controlled.
It has been demonstrated on various occasions that suspending kernel code
execution for an arbitrary amount of time at any access to userspace
memory (copy_from_user()/copy_to_user()/...) can be exploited to change
the intended behavior of the kernel. For instance, handling page faults
in kernel-mode using userfaultfd has been exploited in [2, 3]. Likewise,
FUSE, which is similar to userfaultfd in this respect, has been exploited
in [4, 5] for similar outcome.
This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to the
unprivileged_userfaultfd sysctl knob to require unprivileged callers to
use this new flag.
The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to enhance
security vulnerabilities by lengthening the race window in kernel code.
[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808
This patch (of 2):
userfaultfd handles page faults from both user and kernel code. Add a new
UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the resulting
userfaultfd object refuse to handle faults from kernel mode, treating
these faults as if SIGBUS were always raised, causing the kernel code to
fail with EFAULT.
A future patch adds a knob allowing administrators to give some processes
the ability to create userfaultfd file objects only if they pass
UFFD_USER_MODE_ONLY, reducing the likelihood that these processes will
exploit userfaultfd's ability to delay kernel page faults to open timing
windows for future exploits.
Link: https://lkml.kernel.org/r/20201120030411.2690816-1-lokeshgidra@google.com
Link: https://lkml.kernel.org/r/20201120030411.2690816-2-lokeshgidra@google.com
Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <calin@google.com>
Cc: Daniel Colascione <dancol@dancol.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nitin Gupta <nigupta@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shaohua Li <shli@fb.com>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ARM is the only architecture that defines CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
which in turn enables memmap_valid_within() function that is intended to
verify existence of struct page associated with a pfn when there are holes
in the memory map.
However, the ARCH_HAS_HOLES_MEMORYMODEL also enables HAVE_ARCH_PFN_VALID
and arch-specific pfn_valid() implementation that also deals with the holes
in the memory map.
The only two users of memmap_valid_within() call this function after
a call to pfn_valid() so the memmap_valid_within() check becomes redundant.
Remove CONFIG_ARCH_HAS_HOLES_MEMORYMODEL and memmap_valid_within() and rely
entirely on ARM's implementation of pfn_valid() that is now enabled
unconditionally.
Link: https://lkml.kernel.org/r/20201101170454.9567-9-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Michael Schmitz <schmitzmic@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As kernel expect to see only one of such mappings, any further operations
on the VMA-copy may be unexpected by the kernel. Maybe it's being on the
safe side, but there doesn't seem to be any expected use-case for this, so
restrict it now.
Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com
Fixes: commit e346b3813067 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
For many workloads, pagetable consumption is significant and it makes
sense to expose it in the memory.stat for the memory cgroups. However at
the moment, the pagetables are accounted per-zone. Converting them to
per-node and using the right interface will correctly account for the
memory cgroups as well.
[akpm@linux-foundation.org: export __mod_lruvec_page_state to modules for arch/mips/kvm/]
Link: https://lkml.kernel.org/r/20201130212541.2781790-3-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Running stress-ng on ocfs2 completely fills the kernel log with 'max
lookup times reached, filesystem may have nested directories.'
Let's ratelimit this message as done with others in the code.
Test-case:
# mkfs.ocfs2 --mount local $DEV
# mount $DEV $MNT
# cd $MNT
# dmesg -C
# stress-ng --dirdeep 1 --dirdeep-ops 1000
# dmesg | grep -c 'max lookup times reached'
Before:
# dmesg -C
# stress-ng --dirdeep 1 --dirdeep-ops 1000
...
stress-ng: info: [11116] successful run completed in 3.03s
# dmesg | grep -c 'max lookup times reached'
967
After:
# dmesg -C
# stress-ng --dirdeep 1 --dirdeep-ops 1000
...
stress-ng: info: [739] successful run completed in 0.96s
# dmesg | grep -c 'max lookup times reached'
10
# dmesg
[ 259.086086] ocfs2_check_if_ancestor: 1990 callbacks suppressed
[ 259.086092] (stress-ng-dirde,740,1):ocfs2_check_if_ancestor:1091 max lookup times reached, filesystem may have nested directories, src inode: 18007, dest inode: 17940.
...
Link: https://lkml.kernel.org/r/20201001224417.478263-1-mfo@canonical.com
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A break is not needed if it is preceded by a goto
Link: https://lkml.kernel.org/r/20201019175216.2329-1-trix@redhat.com
Signed-off-by: Tom Rix <trix@redhat.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|