Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
"Tracing fixes and clean ups:
- Replace strlcpy() with strscpy()
- Initialize the pipe cpumask to zero on allocation
- Use within_module() instead of open coding it
- Remove extra space in hwlat_detectory/mode output
- Use LIST_HEAD() instead of open coding it
- A bunch of clean ups and fixes for the cpumask filter
- Set local da_mon_##name to static
- Fix race in snapshot buffer between cpu write and swap"
* tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/filters: Fix coding style issues
tracing/filters: Change parse_pred() cpulist ternary into an if block
tracing/filters: Fix double-free of struct filter_pred.mask
tracing/filters: Fix error-handling of cpulist parsing buffer
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
ftrace: Use LIST_HEAD to initialize clear_hash
ftrace: Use within_module to check rec->ip within specified module.
tracing: Replace strlcpy with strscpy in trace/events/task.h
tracing: Fix race issue between cpu buffer write and swap
tracing: Remove extra space at the end of hwlat_detector/mode
rv: Set variable 'da_mon_##name' to static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix false positive 'softirq work is pending' messages on -rt kernels,
caused by a buggy factoring-out of existing code"
* tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/rcu: Fix false positive "softirq work is pending" messages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
"Fix a CPU hotplug related deadlock between the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer"
* tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Miscellaneous scheduler fixes: a reporting fix, a static symbol fix,
and a kernel-doc fix"
* tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE
sched/fair: Make update_entity_lag() static
sched/core: Add kernel-doc for set_cpus_allowed_ptr()
|
|
Sudip Mukherjee reports that the mips sb1250_swarm_defconfig build fails
with the current kernel. It isn't actually MIPS-specific, it's just
that that defconfig does not have CGROUP_SCHED enabled like most configs
do, and as such shows this error:
kernel/cgroup/cgroup.c: In function 'cgroup_local_stat_show':
kernel/cgroup/cgroup.c:3699:15: error: implicit declaration of function 'cgroup_tryget_css'; did you mean 'cgroup_tryget'? [-Werror=implicit-function-declaration]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^~~~~~~~~~~~~~~~~
| cgroup_tryget
kernel/cgroup/cgroup.c:3699:13: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^
because cgroup_tryget_css() only exists when CGROUP_SCHED is enabled,
and the cgroup_local_stat_show() function should similarly be guarded by
that config option.
Move things around a bit to fix this all.
Fixes: d1d4ff5d11a5 ("cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED")
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Recent commits have introduced some coding style issues, fix those up.
Link: https://lkml.kernel.org/r/20230901151039.125186-5-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Review comments noted that an if block would be clearer than a ternary, so
swap it out.
No change in behaviour intended
Link: https://lkml.kernel.org/r/20230901151039.125186-4-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When a cpulist filter is found to contain a single CPU, that CPU is saved
as a scalar and the backing cpumask storage is freed.
Also NULL the mask to avoid a double-free once we get down to
free_predicate().
Link: https://lkml.kernel.org/r/20230901151039.125186-3-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
parse_pred() allocates a string buffer to parse the user-provided cpulist,
but doesn't check the allocation result nor does it free the buffer once it
is no longer needed.
Add an allocation check, and free the buffer as soon as it is no longer
needed.
Link: https://lkml.kernel.org/r/20230901151039.125186-2-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Use LIST_HEAD() to initialize clear_hash instead of open-coding it.
Link: https://lore.kernel.org/linux-trace-kernel/20230809071551.913041-1-ruanjinjie@huawei.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
within_module_core && within_module_init condition is same to
within module but it's more readable.
Use within_module instead of former condition to check rec->ip
within specified module area or not.
Link: https://lore.kernel.org/linux-trace-kernel/20230803205236.32201-1-ppbuk5246@gmail.com
Signed-off-by: Levi Yun <ppbuk5246@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes: f1affcaaa861 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Space is printed after each mode value including the last one:
$ echo \"$(sudo cat /sys/kernel/tracing/hwlat_detector/mode)\"
"none [round-robin] per-cpu "
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230825103432.7750-1-m.kobuk@ispras.ru
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 8fa826b7344d ("trace/hwlat: Implement the mode config option")
Signed-off-by: Mikhail Kobuk <m.kobuk@ispras.ru>
Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"User visible changes:
- Added a way to easier filter with cpumasks:
# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter
- Show actual size of ring buffer after modifying the ring buffer
size via buffer_size_kb.
Currently it just returns what was written, but the actual size
rounds up to the sub buffer size. Show that real size instead.
Major changes:
- Added "eventfs". This is the code that handles the inodes and
dentries of tracefs/events directory. As there are thousands of
events, and each event has several inodes and dentries that
currently exist even when tracing is never used, they take up
precious memory. Instead, eventfs will allocate the inodes and
dentries in a JIT way (similar to what procfs does). There is now
metadata that handles the events and subdirectories, and will
create the inodes and dentries when they are used.
Note, I also have patches that remove the subdirectory meta data,
but will wait till the next merge window before applying them. It's
a little more complex, and I want to make sure the dynamic code
works properly before adding more complexity, making it easier to
revert if need be.
Minor changes:
- Optimization to user event list traversal
- Remove intermediate permission of tracefs files (note the
intermediate permission removes all access to the files so it is
not a security concern, but just a clean up)
- Add the complex fix to FORTIFY_SOURCE to the kernel stack event
logic
- Other minor cleanups"
* tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
tracefs: Remove kerneldoc from struct eventfs_file
tracefs: Avoid changing i_mode to a temp value
tracing/user_events: Optimize safe list traversals
ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon()
tracing: Remove unused function declarations
tracing/filters: Document cpumask filtering
tracing/filters: Further optimise scalar vs cpumask comparison
tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU
tracing/filters: Enable filtering the CPU common field by a cpumask
tracing/filters: Enable filtering a scalar field by a cpumask
tracing/filters: Enable filtering a cpumask field by another cpumask
tracing/filters: Dynamically allocate filter_pred.regex
test: ftrace: Fix kprobe test for eventfs
eventfs: Move tracing/events to eventfs
eventfs: Implement removal of meta data from eventfs
eventfs: Implement functions to create files and dirs when accessed
eventfs: Implement eventfs lookup, read, open functions
eventfs: Implement eventfs file add functions
...
|
|
Pull workqueue updates from Tejun Heo:
- Unbound workqueues now support more flexible affinity scopes.
The default behavior is to soft-affine according to last level cache
boundaries. A work item queued from a given LLC is executed by a
worker running on the same LLC but the worker may be moved across
cache boundaries as the scheduler sees fit. On machines which
multiple L3 caches, which are becoming more popular along with
chiplet designs, this improves cache locality while not harming work
conservation too much.
Unbound workqueues are now also a lot more flexible in terms of
execution affinity. Differeing levels of affinity scopes are
supported and both the default and per-workqueue affinity settings
can be modified dynamically. This should help working around amny of
sub-optimal behaviors observed recently with asymmetric ARM CPUs.
This involved signficant restructuring of workqueue code. Nothing was
reported yet but there's some risk of subtle regressions. Should keep
an eye out.
- Rescuer workers now has more identifiable comms.
- workqueue.unbound_cpus added so that CPUs which can be used by
workqueue can be constrained early during boot.
- Now that all the in-tree users have been flushed out, trigger warning
if system-wide workqueues are flushed.
* tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (31 commits)
workqueue: fix data race with the pwq->stats[] increment
workqueue: Rename rescuer kworker
workqueue: Make default affinity_scope dynamically updatable
workqueue: Add "Affinity Scopes and Performance" section to documentation
workqueue: Implement non-strict affinity scope for unbound workqueues
workqueue: Add workqueue_attrs->__pod_cpumask
workqueue: Factor out need_more_worker() check and worker wake-up
workqueue: Factor out work to worker assignment and collision handling
workqueue: Add multiple affinity scopes and interface to select them
workqueue: Modularize wq_pod_type initialization
workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration
workqueue: Generalize unbound CPU pods
workqueue: Factor out clearing of workqueue-only attrs fields
workqueue: Factor out actual cpumask calculation to reduce subtlety in wq_update_pod()
workqueue: Initialize unbound CPU pods later in the boot
workqueue: Move wq_pod_init() below workqueue_init()
workqueue: Rename NUMA related names to use pod instead
workqueue: Rename workqueue_attrs->no_numa to ->ordered
workqueue: Make unbound workqueues to use per-cpu pool_workqueues
workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Per-cpu cpu usage stats are now tracked
This currently isn't printed out in the cgroupfs interface and can
only be accessed through e.g. BPF. Should decide on a not-too-ugly
way to show per-cpu stats in cgroupfs
- cpuset received some cleanups and prepatory patches for the pending
cpus.exclusive patchset which will allow cpuset partitions to be
created below non-partition parents, which should ease the management
of partition cpusets
- A lot of code and documentation cleanup patches
- tools/testing/selftests/cgroup/test_cpuset.c added
* tag 'cgroup-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (32 commits)
cgroup: Avoid -Wstringop-overflow warnings
cgroup:namespace: Remove unused cgroup_namespaces_init()
cgroup/rstat: Record the cumulative per-cpu time of cgroup and its descendants
cgroup: clean up if condition in cgroup_pidlist_start()
cgroup: fix obsolete function name in cgroup_destroy_locked()
Documentation: cgroup-v2.rst: Correct number of stats entries
cgroup: fix obsolete function name above css_free_rwork_fn()
cgroup/cpuset: fix kernel-doc
cgroup: clean up printk()
cgroup: fix obsolete comment above cgroup_create()
docs: cgroup-v1: fix typo
docs: cgroup-v1: correct the term of Page Cache organization in inode
cgroup/misc: Store atomic64_t reads to u64
cgroup/misc: Change counters to be explicit 64bit types
cgroup/misc: update struct members descriptions
cgroup: remove cgrp->kn check in css_populate_dir()
cgroup: fix obsolete function name
cgroup: use cached local variable parent in for loop
cgroup: remove obsolete comment above struct cgroupstats
cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu
Pull percpu updates from Dennis Zhou:
"One bigger change to percpu_counter's api allowing for init and
destroy of multiple counters via percpu_counter_init_many() and
percpu_counter_destroy_many(). This is used to help begin remediating
a performance regression with percpu rss stats.
Additionally, it seems larger core count machines are feeling the
burden of the single threaded allocation of percpu. Mateusz is
thinking about it and I will spend some time on it too.
percpu:
- A couple cleanups by Baoquan He and Bibo Mao. The only behavior
change is to start printing messages if we're under the warn limit
for failed atomic allocations.
percpu_counter:
- Shakeel introduced percpu counters into mm_struct which caused
percpu allocations be on the hot path [1]. Originally I spent some
time trying to improve the percpu allocator, but instead preferred
what Mateusz Guzik proposed grouping at the allocation site,
percpu_counter_init_many(). This allows a single percpu allocation
to be shared by the counters. I like this approach because it
creates a shared lifetime by the allocations. Additionally, I
believe many inits have higher level synchronization requirements,
like percpu_counter does against HOTPLUG_CPU. Therefore we can
group these optimizations together"
Link: https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/ [1]
* tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
kernel/fork: group allocation/free of per-cpu counters for mm struct
pcpcntr: add group allocation/free
mm/percpu.c: print error message too if atomic alloc failed
mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit
mm/percpu.c: remove redundant check
mm/percpu: Remove some local variables in pcpu_populate_pte
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.6-rc1.
Lots of cleanups in here this cycle, and some driver updates. Short
summary is:
- Jiri's continued work to make the tty code and apis be a bit more
sane with regards to modern kernel coding style and types
- cpm_uart driver updates
- n_gsm updates and fixes
- meson driver updates
- sc16is7xx driver updates
- 8250 driver updates for different hardware types
- qcom-geni driver fixes
- tegra serial driver change
- stm32 driver updates
- synclink_gt driver cleanups
- tty structure size reduction
All of these have been in linux-next this week with no reported
issues. The last bit of cleanups from Jiri and the tty structure size
reduction came in last week, a bit late but as they were just style
changes and size reductions, I figured they should get into this merge
cycle so that others can work on top of them with no merge conflicts"
* tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
tty: shrink the size of struct tty_struct by 40 bytes
tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw()
tty: n_tty: extract ECHO_OP processing to a separate function
tty: n_tty: unify counts to size_t
tty: n_tty: use u8 for chars and flags
tty: n_tty: simplify chars_in_buffer()
tty: n_tty: remove unsigned char casts from character constants
tty: n_tty: move newline handling to a separate function
tty: n_tty: move canon handling to a separate function
tty: n_tty: use MASK() for masking out size bits
tty: n_tty: make n_tty_data::num_overrun unsigned
tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun()
tty: n_tty: use 'num' for writes' counts
tty: n_tty: use output character directly
tty: n_tty: make flow of n_tty_receive_buf_common() a bool
Revert "tty: serial: meson: Add a earlycon for the T7 SoC"
Documentation: devices.txt: Fix minors for ttyCPM*
Documentation: devices.txt: Remove ttySIOC*
Documentation: devices.txt: Remove ttyIOC*
serial: 8250_bcm7271: improve bcm7271 8250 port
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the
configured SMT state when hotplugging CPUs into the system
- Combine final TLB flush and lazy TLB mm shootdown IPIs when using the
Radix MMU to avoid a broadcast TLBIE flush on exit
- Drop the exclusion between ptrace/perf watchpoints, and drop the now
unused associated arch hooks
- Add support for the "nohlt" command line option to disable CPU idle
- Add support for -fpatchable-function-entry for ftrace, with GCC >=
13.1
- Rework memory block size determination, and support 256MB size on
systems with GPUs that have hotpluggable memory
- Various other small features and fixes
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam
Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel
Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof
Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar,
Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor,
Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar
Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh
Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain,
Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai.
* tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits)
macintosh/ams: linux/platform_device.h is needed
powerpc/xmon: Reapply "Relax frame size for clang"
powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached
powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled
powerpc/iommu: Fix notifiers being shared by PCI and VIO buses
powerpc/mpc5xxx: Add missing fwnode_handle_put()
powerpc/config: Disable SLAB_DEBUG_ON in skiroot
powerpc/pseries: Remove unused hcall tracing instruction
powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n
powerpc: dts: add missing space before {
powerpc/eeh: Use pci_dev_id() to simplify the code
powerpc/64s: Move CPU -mtune options into Kconfig
powerpc/powermac: Fix unused function warning
powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT
powerpc: Don't include lppaca.h in paca.h
powerpc/pseries: Move hcall_vphn() prototype into vphn.h
powerpc/pseries: Move VPHN constants into vphn.h
cxl: Drop unused detach_spa()
powerpc: Drop zalloc_maybe_bootmem()
powerpc/powernv: Use struct opal_prd_msg in more places
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 shadow stack support from Dave Hansen:
"This is the long awaited x86 shadow stack support, part of Intel's
Control-flow Enforcement Technology (CET).
CET consists of two related security features: shadow stacks and
indirect branch tracking. This series implements just the shadow stack
part of this feature, and just for userspace.
The main use case for shadow stack is providing protection against
return oriented programming attacks. It works by maintaining a
secondary (shadow) stack using a special memory type that has
protections against modification. When executing a CALL instruction,
the processor pushes the return address to both the normal stack and
to the special permission shadow stack. Upon RET, the processor pops
the shadow stack copy and compares it to the normal stack copy.
For more information, refer to the links below for the earlier
versions of this patch set"
Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/
* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
x86/shstk: Change order of __user in type
x86/ibt: Convert IBT selftest to asm
x86/shstk: Don't retry vm_munmap() on -EINTR
x86/kbuild: Fix Documentation/ reference
x86/shstk: Move arch detail comment out of core mm
x86/shstk: Add ARCH_SHSTK_STATUS
x86/shstk: Add ARCH_SHSTK_UNLOCK
x86: Add PTRACE interface for shadow stack
selftests/x86: Add shadow stack test
x86/cpufeatures: Enable CET CR4 bit for shadow stack
x86/shstk: Wire in shadow stack interface
x86: Expose thread features in /proc/$PID/status
x86/shstk: Support WRSS for userspace
x86/shstk: Introduce map_shadow_stack syscall
x86/shstk: Check that signal frame is shadow stack mem
x86/shstk: Check that SSP is aligned on sigreturn
x86/shstk: Handle signals for shadow stack
x86/shstk: Introduce routines modifying shstk
x86/shstk: Handle thread shadow stack
x86/shstk: Add user-mode shadow stack support
...
|
|
Move the #endif a line so that free_page label is only seen by the
compile pass when actually used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
Reviewed-by: Robin Murphy <roin.murphy@arm.com>
|
|
Pull documentation updates from Jonathan Corbet:
"Documentation work keeps chugging along; this includes:
- Work from Carlos Bilbao to integrate rustdoc output into the
generated HTML documentation. This took some work to figure out how
to do it without slowing the docs build and without creating people
who don't have Rust installed, but Carlos got there
- Move the loongarch and mips architecture documentation under
Documentation/arch/
- Some more maintainer documentation from Jakub
... plus the usual assortment of updates, translations, and fixes"
* tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits)
Docu: genericirq.rst: fix irq-example
input: docs: pxrc: remove reference to phoenix-sim
Documentation: serial-console: Fix literal block marker
docs/mm: remove references to hmm_mirror ops and clean typos
docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add()
Documentation: Fix typos
Documentation/ABI: Fix typos
scripts: kernel-doc: fix macro handling in enums
scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN]
Documentation: riscv: Update boot image header since EFI stub is supported
Documentation: riscv: Add early boot document
Documentation: arm: Add bootargs to the table of added DT parameters
docs: kernel-parameters: Refer to the correct bitmap function
doc: update params of memhp_default_state=
docs: Add book to process/kernel-docs.rst
docs: sparse: fix invalid link addresses
docs: vfs: clean up after the iterate() removal
docs: Add a section on surveys to the researcher guidelines
docs: move mips under arch
docs: move loongarch under arch
...
|
|
Resetting rules' stateful data happens outside of the transaction logic,
so 'get' and 'dump' handlers have to emit audit log entries themselves.
Fixes: 8daa8fde3fc3f ("netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since set element reset is not integrated into nf_tables' transaction
logic, an explicit log call is needed, similar to NFT_MSG_GETOBJ_RESET
handling.
For the sake of simplicity, catchall element reset will always generate
a dedicated log entry. This relieves nf_tables_dump_set() from having to
adjust the logged element count depending on whether a catchall element
was found or not.
Fixes: 079cd633219d7 ("netfilter: nf_tables: Introduce NFT_MSG_GETSETELEM_RESET")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity subsystem updates from Mimi Zohar:
- With commit 099f26f22f58 ("integrity: machine keyring CA
configuration") certificates may be loaded onto the IMA keyring,
directly or indirectly signed by keys on either the "builtin" or the
"machine" keyrings.
With the ability for the system/machine owner to sign the IMA policy
itself without needing to recompile the kernel, update the IMA
architecture specific policy rules to require the IMA policy itself
be signed.
[ As commit 099f26f22f58 was upstreamed in linux-6.4, updating the
IMA architecture specific policy now to require signed IMA policies
may break userspace expectations. ]
- IMA only checked the file data hash was not on the system blacklist
keyring for files with an appended signature (e.g. kernel modules,
Power kernel image).
Check all file data hashes regardless of how it was signed
- Code cleanup, and a kernel-doc update
* tag 'integrity-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
kexec_lock: Replace kexec_mutex() by kexec_lock() in two comments
ima: require signed IMA policy when UEFI secure boot is enabled
integrity: Always reference the blacklist keyring with appraisal
ima: Remove deprecated IMA_TRUSTED_KEYRING Kconfig
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Add proper multi-LSM support for xattrs in the
security_inode_init_security() hook
Historically the LSM layer has only allowed a single LSM to add an
xattr to an inode, with IMA/EVM measuring that and adding its own as
well. As we work towards promoting IMA/EVM to a "proper LSM" instead
of the special case that it is now, we need to better support the
case of multiple LSMs each adding xattrs to an inode and after
several attempts we now appear to have something that is working
well. It is worth noting that in the process of making this change we
uncovered a problem with Smack's SMACK64TRANSMUTE xattr which is also
fixed in this pull request.
- Additional LSM hook constification
Two patches to constify parameters to security_capget() and
security_binder_transfer_file(). While I generally don't make a
special note of who submitted these patches, these were the work of
an Outreachy intern, Khadija Kamran, and that makes me happy;
hopefully it does the same for all of you reading this.
- LSM hook comment header fixes
One patch to add a missing hook comment header, one to fix a minor
typo.
- Remove an old, unused credential function declaration
It wasn't clear to me who should pick this up, but it was trivial,
obviously correct, and arguably the LSM layer has a vested interest
in credentials so I merged it. Sadly I'm now noticing that despite my
subject line cleanup I didn't cleanup the "unsued" misspelling, sigh
* tag 'lsm-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: constify the 'file' parameter in security_binder_transfer_file()
lsm: constify the 'target' parameter in security_capget()
lsm: add comment block for security_sk_classify_flow LSM hook
security: Fix ret values doc for security_inode_init_security()
cred: remove unsued extern declaration change_create_files_as()
evm: Support multiple LSMs providing an xattr
evm: Align evm_inode_init_security() definition with LSM infrastructure
smack: Set the SMACK64TRANSMUTE xattr in smack_inode_init_security()
security: Allow all LSMs to provide xattrs for inode_init_security hook
lsm: fix typo in security_file_lock() comment header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Six audit patches, the highlights are:
- Add an explicit cond_resched() call when generating PATH records
Certain tracefs/debugfs operations can generate a *lot* of audit
PATH entries and if one has an aggressive system configuration (not
the default) this can cause a soft lockup in the audit code as it
works to process all of these new entries.
This is in sharp contrast to the common case where only one or two
PATH entries are logged. In order to fix this corner case without
excessively impacting the common case we're adding a single
cond_rescued() call between two of the most intensive loops in the
__audit_inode_child() function.
- Various minor cleanups
We removed a conditional header file as the included header already
had the necessary logic in place, fixed a dummy function's return
value, and the usual collection of checkpatch.pl noise (whitespace,
brace, and trailing statement tweaks)"
* tag 'audit-pr-20230829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: move trailing statements to next line
audit: cleanup function braces and assignment-in-if-condition
audit: add space before parenthesis and around '=', "==", and '<'
audit: fix possible soft lockup in __audit_inode_child()
audit: correct audit_filter_inodes() definition
audit: include security.h unconditionally
|
|
It makes no sense to expose CONFIG_DMA_NUMA_CMA if CONFIG_NUMA is not
enabled, and random config options shouldn't be default unless there
is a good reason. Replace the default NUMA with a depends on to fix both
issues.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <roin.murphy@arm.com>
|
|
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
CPU1 CPU2
T1 sets cfs_quota
starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2
T1 initiates offlining of CPU1
Hotplug operation starts
...
'period_timer' expires and is re-enqueued on CPU1
...
take_cpu_down()
CPU1 shuts down and does not handle timers
anymore. They have to be migrated in the
post dead hotplug steps by the control task.
T1 runs the post dead offline operation
T1 is scheduled out
T1 waits for 'period_timer' to expire
T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().
Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
|
|
In commit 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the
new function report_idle_softirq() was created by breaking code out of the
existing can_stop_idle_tick() for kernels v5.18 and newer.
In doing so, the code essentially went from a one conditional:
if (a && b && c)
warn();
to a three conditional:
if (!a)
return;
if (!b)
return;
if (!c)
return;
warn();
But that conversion got the condition for the RT specific
local_bh_blocked() wrong. The original condition was:
!local_bh_blocked()
but the conversion failed to negate it so it ended up as:
if (!local_bh_blocked())
return false;
This issue lay dormant until another fixup for the same commit was added
in commit a7e282c77785 ("tick/rcu: Fix bogus ratelimit condition").
This commit realized the ratelimit was essentially set to zero instead
of ten, and hence *no* softirq pending messages would ever be issued.
Once this commit was backported via linux-stable, both the v6.1 and v6.4
preempt-rt kernels started printing out 10 instances of this at boot:
NOHZ tick-stop error: local softirq work is pending, handler #80!!!
Remove the negation and return when local_bh_blocked() evaluates to true to
bring the correct behaviour back.
Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Wen Yang <wenyang.linux@foxmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230818200757.1808398-1-paul.gortmaker@windriver.com
|
|
__dma_entry_alloc_check_leak() calls into printk -> serial console
output (qcom geni) and grabs port->lock under free_entries_lock
spin lock, which is a reverse locking dependency chain as qcom_geni
IRQ handler can call into dma-debug code and grab free_entries_lock
under port->lock.
Move __dma_entry_alloc_check_leak() call out of free_entries_lock
scope so that we don't acquire serial console's port->lock under it.
Trimmed-down lockdep splat:
The existing dependency chain (in reverse order) is:
-> #2 (free_entries_lock){-.-.}-{2:2}:
_raw_spin_lock_irqsave+0x60/0x80
dma_entry_alloc+0x38/0x110
debug_dma_map_page+0x60/0xf8
dma_map_page_attrs+0x1e0/0x230
dma_map_single_attrs.constprop.0+0x6c/0xc8
geni_se_rx_dma_prep+0x40/0xcc
qcom_geni_serial_isr+0x310/0x510
__handle_irq_event_percpu+0x110/0x244
handle_irq_event_percpu+0x20/0x54
handle_irq_event+0x50/0x88
handle_fasteoi_irq+0xa4/0xcc
handle_irq_desc+0x28/0x40
generic_handle_domain_irq+0x24/0x30
gic_handle_irq+0xc4/0x148
do_interrupt_handler+0xa4/0xb0
el1_interrupt+0x34/0x64
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x64/0x68
arch_local_irq_enable+0x4/0x8
____do_softirq+0x18/0x24
...
-> #1 (&port_lock_key){-.-.}-{2:2}:
_raw_spin_lock_irqsave+0x60/0x80
qcom_geni_serial_console_write+0x184/0x1dc
console_flush_all+0x344/0x454
console_unlock+0x94/0xf0
vprintk_emit+0x238/0x24c
vprintk_default+0x3c/0x48
vprintk+0xb4/0xbc
_printk+0x68/0x90
register_console+0x230/0x38c
uart_add_one_port+0x338/0x494
qcom_geni_serial_probe+0x390/0x424
platform_probe+0x70/0xc0
really_probe+0x148/0x280
__driver_probe_device+0xfc/0x114
driver_probe_device+0x44/0x100
__device_attach_driver+0x64/0xdc
bus_for_each_drv+0xb0/0xd8
__device_attach+0xe4/0x140
device_initial_probe+0x1c/0x28
bus_probe_device+0x44/0xb0
device_add+0x538/0x668
of_device_add+0x44/0x50
of_platform_device_create_pdata+0x94/0xc8
of_platform_bus_create+0x270/0x304
of_platform_populate+0xac/0xc4
devm_of_platform_populate+0x60/0xac
geni_se_probe+0x154/0x160
platform_probe+0x70/0xc0
...
-> #0 (console_owner){-...}-{0:0}:
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
console_flush_all+0x330/0x454
console_unlock+0x94/0xf0
vprintk_emit+0x238/0x24c
vprintk_default+0x3c/0x48
vprintk+0xb4/0xbc
_printk+0x68/0x90
dma_entry_alloc+0xb4/0x110
debug_dma_map_sg+0xdc/0x2f8
__dma_map_sg_attrs+0xac/0xe4
dma_map_sgtable+0x30/0x4c
get_pages+0x1d4/0x1e4 [msm]
msm_gem_pin_pages_locked+0x38/0xac [msm]
msm_gem_pin_vma_locked+0x58/0x88 [msm]
msm_ioctl_gem_submit+0xde4/0x13ac [msm]
drm_ioctl_kernel+0xe0/0x15c
drm_ioctl+0x2e8/0x3f4
vfs_ioctl+0x30/0x50
...
Chain exists of:
console_owner --> &port_lock_key --> free_entries_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(free_entries_lock);
lock(&port_lock_key);
lock(free_entries_lock);
lock(console_owner);
*** DEADLOCK ***
Call trace:
dump_backtrace+0xb4/0xf0
show_stack+0x20/0x30
dump_stack_lvl+0x60/0x84
dump_stack+0x18/0x24
print_circular_bug+0x1cc/0x234
check_noncircular+0x78/0xac
__lock_acquire+0xdf8/0x109c
lock_acquire+0x234/0x284
console_flush_all+0x330/0x454
console_unlock+0x94/0xf0
vprintk_emit+0x238/0x24c
vprintk_default+0x3c/0x48
vprintk+0xb4/0xbc
_printk+0x68/0x90
dma_entry_alloc+0xb4/0x110
debug_dma_map_sg+0xdc/0x2f8
__dma_map_sg_attrs+0xac/0xe4
dma_map_sgtable+0x30/0x4c
get_pages+0x1d4/0x1e4 [msm]
msm_gem_pin_pages_locked+0x38/0xac [msm]
msm_gem_pin_vma_locked+0x58/0x88 [msm]
msm_ioctl_gem_submit+0xde4/0x13ac [msm]
drm_ioctl_kernel+0xe0/0x15c
drm_ioctl+0x2e8/0x3f4
vfs_ioctl+0x30/0x50
...
Reported-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-maping updates from Christoph Hellwig:
- allow dynamic sizing of the swiotlb buffer, to cater for secure
virtualization workloads that require all I/O to be bounce buffered
(Petr Tesarik)
- move a declaration to a header (Arnd Bergmann)
- check for memory region overlap in dma-contiguous (Binglei Wang)
- remove the somewhat dangerous runtime swiotlb-xen enablement and
unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)
- per-node CMA improvements (Yajun Deng)
* tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: optimize get_max_slots()
swiotlb: move slot allocation explanation comment where it belongs
swiotlb: search the software IO TLB only if the device makes use of it
swiotlb: allocate a new memory pool when existing pools are full
swiotlb: determine potential physical address limit
swiotlb: if swiotlb is full, fall back to a transient memory pool
swiotlb: add a flag whether SWIOTLB is allowed to grow
swiotlb: separate memory pool data from other allocator data
swiotlb: add documentation and rename swiotlb_do_find_slots()
swiotlb: make io_tlb_default_mem local to swiotlb.c
swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated
dma-contiguous: check for memory region overlap
dma-contiguous: support numa CMA for specified node
dma-contiguous: support per-numa CMA for all architectures
dma-mapping: move arch_dma_set_mask() declaration to header
swiotlb: unexport is_swiotlb_active
x86: always initialize xen-swiotlb when xen-pcifront is enabling
xen/pci: add flag for PCI passthrough being possible
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"Long ago we set out to remove the kitchen sink on kernel/sysctl.c
arrays and placings sysctls to their own sybsystem or file to help
avoid merge conflicts. Matthew Wilcox pointed out though that if we're
going to do that we might as well also *save* space while at it and
try to remove the extra last sysctl entry added at the end of each
array, a sentintel, instead of bloating the kernel by adding a new
sentinel with each array moved.
Doing that was not so trivial, and has required slowing down the moves
of kernel/sysctl.c arrays and measuring the impact on size by each new
move.
The complex part of the effort to help reduce the size of each sysctl
is being done by the patient work of el señor Don Joel Granados. A lot
of this is truly painful code refactoring and testing and then trying
to measure the savings of each move and removing the sentinels.
Although Joel already has code which does most of this work,
experience with sysctl moves in the past shows is we need to be
careful due to the slew of odd build failures that are possible due to
the amount of random Kconfig options sysctls use.
To that end Joel's work is split by first addressing the major
housekeeping needed to remove the sentinels, which is part of this
merge request. The rest of the work to actually remove the sentinels
will be done later in future kernel releases.
The preliminary math is showing this will all help reduce the overall
build time size of the kernel and run time memory consumed by the
kernel by about ~64 bytes per array where we are able to remove each
sentinel in the future. That also means there is no more bloating the
kernel with the extra ~64 bytes per array moved as no new sentinels
are created"
* tag 'sysctl-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: Use ctl_table_size as stopping criteria for list macro
sysctl: SIZE_MAX->ARRAY_SIZE in register_net_sysctl
vrf: Update to register_net_sysctl_sz
networking: Update to register_net_sysctl_sz
netfilter: Update to register_net_sysctl_sz
ax.25: Update to register_net_sysctl_sz
sysctl: Add size to register_net_sysctl function
sysctl: Add size arg to __register_sysctl_init
sysctl: Add size to register_sysctl
sysctl: Add a size arg to __register_sysctl_table
sysctl: Add size argument to init_header
sysctl: Add ctl_table_size to ctl_table_header
sysctl: Use ctl_table_header in list_for_each_table_entry
sysctl: Prefer ctl_table_header in proc_sysctl
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
"Summary of the changes worth highlighting from most interesting to
boring below:
- Christoph Hellwig's symbol_get() fix to Nvidia's efforts to
circumvent the protection he put in place in year 2020 to prevent
proprietary modules from using GPL only symbols, and also ensuring
proprietary modules which export symbols grandfather their taint.
That was done through year 2020 commit 262e6ae7081d ("modules:
inherit TAINT_PROPRIETARY_MODULE"). Christoph's new fix is done by
clarifing __symbol_get() was only ever intended to prevent module
reference loops by Linux kernel modules and so making it only find
symbols exported via EXPORT_SYMBOL_GPL(). The circumvention tactic
used by Nvidia was to use symbol_get() to purposely swift through
proprietary module symbols and completely bypass our traditional
EXPORT_SYMBOL*() annotations and community agreed upon
restrictions.
A small set of preamble patches fix up a few symbols which just
needed adjusting for this on two modules, the rtc ds1685 and the
networking enetc module. Two other modules just needed some build
fixing and removal of use of __symbol_get() as they can't ever be
modular, as was done by Arnd on the ARM pxa module and Christoph
did on the mmc au1xmmc driver.
This is a good reminder to us that symbol_get() is just a hack to
address things which should be fixed through Kconfig at build time
as was done in the later patches, and so ultimately it should just
go.
- Extremely late minor fix for old module layout 055f23b74b20
("module: check for exit sections in layout_sections() instead of
module_init_section()") by James Morse for arm64. Note that this
layout thing is old, it is *not* Song Liu's commit ac3b43283923
("module: replace module_layout with module_memory"). The issue
however is very odd to run into and so there was no hurry to get
this in fast.
- Although the fix did not go through the modules tree I'd like to
highlight the fix by Peter Zijlstra in commit 54097309620e
("x86/static_call: Fix __static_call_fixup()") now merged in your
tree which came out of what was originally suspected to be a
fallout of the the newer module layout changes by Song Liu commit
ac3b43283923 ("module: replace module_layout with module_memory")
instead of module_init_section()"). Thanks to the report by
Christian Bricart and the debugging by Song Liu & Peter that turned
to be noted as a kernel regression in place since v5.19 through
commit ee88d363d156 ("x86,static_call: Use alternative RET
encoding").
I highlight this to reflect and clarify that we haven't seen more
fallout from ac3b43283923 ("module: replace module_layout with
module_memory").
- RISC-V toolchain got mapping symbol support which prefix symbols
with "$" to help with alignment considerations for disassembly.
This is used to differentiate between incompatible instruction
encodings when disassembling. RISC-V just matches what ARM/AARCH64
did for alignment considerations and Palmer Dabbelt extended
is_mapping_symbol() to accept these symbols for RISC-V. We already
had support for this for all architectures but it also checked for
the second character, the RISC-V check Dabbelt added was just for
the "$". After a bit of testing and fallout on linux-next and based
on feedback from Masahiro Yamada it was decided to simplify the
check and treat the first char "$" as unique for all architectures,
and so we no make is_mapping_symbol() for all archs if the symbol
starts with "$".
The most relevant commit for this for RISC-V on binutils was:
https://sourceware.org/pipermail/binutils/2021-July/117350.html
- A late fix by Andrea Righi (today) to make module zstd
decompression use vmalloc() instead of kmalloc() to account for
large compressed modules. I suspect we'll see similar things for
other decompression algorithms soon.
- samples/hw_breakpoint minor fixes by Rong Tao, Arnd Bergmann and
Chen Jiahao"
* tag 'modules-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module/decompress: use vmalloc() for zstd decompression workspace
kallsyms: Add more debug output for selftest
ARM: module: Use module_init_layout_section() to spot init sections
arm64: module: Use module_init_layout_section() to spot init sections
module: Expose module_init_layout_section()
modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
rtc: ds1685: use EXPORT_SYMBOL_GPL for ds1685_rtc_poweroff
net: enetc: use EXPORT_SYMBOL_GPL for enetc_phc_index
mmc: au1xmmc: force non-modular build and remove symbol_get usage
ARM: pxa: remove use of symbol_get()
samples/hw_breakpoint: mark sample_hbp as static
samples/hw_breakpoint: fix building without module unloading
samples/hw_breakpoint: Fix kernel BUG 'invalid opcode: 0000'
modpost, kallsyms: Treat add '$'-prefixed symbols as mapping symbols
kernel: params: Remove unnecessary ‘0’ values from err
module: Ignore RISC-V mapping symbols too
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- An extensive rework of kexec and crash Kconfig from Eric DeVolder
("refactor Kconfig to consolidate KEXEC and CRASH options")
- kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a
couple of macros to args.h")
- gdb feature work from Kuan-Ying Lee ("Add GDB memory helper
commands")
- vsprintf inclusion rationalization from Andy Shevchenko
("lib/vsprintf: Rework header inclusions")
- Switch the handling of kdump from a udev scheme to in-kernel
handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory
hot un/plug")
- Many singleton patches to various parts of the tree
* tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits)
document while_each_thread(), change first_tid() to use for_each_thread()
drivers/char/mem.c: shrink character device's devlist[] array
x86/crash: optimize CPU changes
crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()
crash: hotplug support for kexec_load()
x86/crash: add x86 crash hotplug support
crash: memory and CPU hotplug sysfs attributes
kexec: exclude elfcorehdr from the segment digest
crash: add generic infrastructure for crash hotplug support
crash: move a few code bits to setup support of crash hotplug
kstrtox: consistently use _tolower()
kill do_each_thread()
nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
scripts/bloat-o-meter: count weak symbol sizes
treewide: drop CONFIG_EMBEDDED
lockdep: fix static memory detection even more
lib/vsprintf: declare no_hash_pointers in sprintf.h
lib/vsprintf: split out sprintf() and friends
kernel/fork: stop playing lockless games for exe_file replacement
adfs: delete unused "union adfs_dirtail" definition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Some swap cleanups from Ma Wupeng ("fix WARN_ON in
add_to_avail_list")
- Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which
reduces the special-case code for handling hugetlb pages in GUP. It
also speeds up GUP handling of transparent hugepages.
- Peng Zhang provides some maple tree speedups ("Optimize the fast path
of mas_store()").
- Sergey Senozhatsky has improved te performance of zsmalloc during
compaction (zsmalloc: small compaction improvements").
- Domenico Cerasuolo has developed additional selftest code for zswap
("selftests: cgroup: add zswap test program").
- xu xin has doe some work on KSM's handling of zero pages. These
changes are mainly to enable the user to better understand the
effectiveness of KSM's treatment of zero pages ("ksm: support
tracking KSM-placed zero-pages").
- Jeff Xu has fixes the behaviour of memfd's
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl
MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED").
- David Howells has fixed an fscache optimization ("mm, netfs, fscache:
Stop read optimisation when folio removed from pagecache").
- Axel Rasmussen has given userfaultfd the ability to simulate memory
poisoning ("add UFFDIO_POISON to simulate memory poisoning with
UFFD").
- Miaohe Lin has contributed some routine maintenance work on the
memory-failure code ("mm: memory-failure: remove unneeded PageHuge()
check").
- Peng Zhang has contributed some maintenance work on the maple tree
code ("Improve the validation for maple tree and some cleanup").
- Hugh Dickins has optimized the collapsing of shmem or file pages into
THPs ("mm: free retracted page table by RCU").
- Jiaqi Yan has a patch series which permits us to use the healthy
subpages within a hardware poisoned huge page for general purposes
("Improve hugetlbfs read on HWPOISON hugepages").
- Kemeng Shi has done some maintenance work on the pagetable-check code
("Remove unused parameters in page_table_check").
- More folioification work from Matthew Wilcox ("More filesystem folio
conversions for 6.6"), ("Followup folio conversions for zswap"). And
from ZhangPeng ("Convert several functions in page_io.c to use a
folio").
- page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext").
- Baoquan He has converted some architectures to use the
GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert
architectures to take GENERIC_IOREMAP way").
- Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support
batched/deferred tlb shootdown during page reclamation/migration").
- Better maple tree lockdep checking from Liam Howlett ("More strict
maple tree lockdep"). Liam also developed some efficiency
improvements ("Reduce preallocations for maple tree").
- Cleanup and optimization to the secondary IOMMU TLB invalidation,
from Alistair Popple ("Invalidate secondary IOMMU TLB on permission
upgrade").
- Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes
for arm64").
- Kemeng Shi provides some maintenance work on the compaction code
("Two minor cleanups for compaction").
- Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle
most file-backed faults under the VMA lock").
- Aneesh Kumar contributes code to use the vmemmap optimization for DAX
on ppc64, under some circumstances ("Add support for DAX vmemmap
optimization for ppc64").
- page-ext cleanups from Kemeng Shi ("add page_ext_data to get client
data in page_ext"), ("minor cleanups to page_ext header").
- Some zswap cleanups from Johannes Weiner ("mm: zswap: three
cleanups").
- kmsan cleanups from ZhangPeng ("minor cleanups for kmsan").
- VMA handling cleanups from Kefeng Wang ("mm: convert to
vma_is_initial_heap/stack()").
- DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes:
implement DAMOS tried total bytes file"), ("Extend DAMOS filters for
address ranges and DAMON monitoring targets").
- Compaction work from Kemeng Shi ("Fixes and cleanups to compaction").
- Liam Howlett has improved the maple tree node replacement code
("maple_tree: Change replacement strategy").
- ZhangPeng has a general code cleanup - use the K() macro more widely
("cleanup with helper macro K()").
- Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for
memmap on memory feature on ppc64").
- pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list
in page_alloc"), ("Two minor cleanups for get pageblock
migratetype").
- Vishal Moola introduces a memory descriptor for page table tracking,
"struct ptdesc" ("Split ptdesc from struct page").
- memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups
for vm.memfd_noexec").
- MM include file rationalization from Hugh Dickins ("arch: include
asm/cacheflush.h in asm/hugetlb.h").
- THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text
output").
- kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use
object_cache instead of kmemleak_initialized").
- More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor
and _folio_order").
- A VMA locking scalability improvement from Suren Baghdasaryan
("Per-VMA lock support for swap and userfaults").
- pagetable handling cleanups from Matthew Wilcox ("New page table
range API").
- A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop
using page->private on tail pages for THP_SWAP + cleanups").
- Cleanups and speedups to the hugetlb fault handling from Matthew
Wilcox ("Change calling convention for ->huge_fault").
- Matthew Wilcox has also done some maintenance work on the MM
subsystem documentation ("Improve mm documentation").
* tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits)
maple_tree: shrink struct maple_tree
maple_tree: clean up mas_wr_append()
secretmem: convert page_is_secretmem() to folio_is_secretmem()
nios2: fix flush_dcache_page() for usage from irq context
hugetlb: add documentation for vma_kernel_pagesize()
mm: add orphaned kernel-doc to the rst files.
mm: fix clean_record_shared_mapping_range kernel-doc
mm: fix get_mctgt_type() kernel-doc
mm: fix kernel-doc warning from tlb_flush_rmaps()
mm: remove enum page_entry_size
mm: allow ->huge_fault() to be called without the mmap_lock held
mm: move PMD_ORDER to pgtable.h
mm: remove checks for pte_index
memcg: remove duplication detection for mem_cgroup_uncharge_swap
mm/huge_memory: work on folio->swap instead of page->private when splitting folio
mm/swap: inline folio_set_swap_entry() and folio_swap_entry()
mm/swap: use dedicated entry for swap in folio
mm/swap: stop using page->private on tail pages for THP_SWAP
selftests/mm: fix WARNING comparing pointer to 0
selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check
...
|
|
KCSAN has discovered a data race in kernel/workqueue.c:2598:
[ 1863.554079] ==================================================================
[ 1863.554118] BUG: KCSAN: data-race in process_one_work / process_one_work
[ 1863.554142] write to 0xffff963d99d79998 of 8 bytes by task 5394 on cpu 27:
[ 1863.554154] process_one_work (kernel/workqueue.c:2598)
[ 1863.554166] worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2752)
[ 1863.554177] kthread (kernel/kthread.c:389)
[ 1863.554186] ret_from_fork (arch/x86/kernel/process.c:145)
[ 1863.554197] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
[ 1863.554213] read to 0xffff963d99d79998 of 8 bytes by task 5450 on cpu 12:
[ 1863.554224] process_one_work (kernel/workqueue.c:2598)
[ 1863.554235] worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2752)
[ 1863.554247] kthread (kernel/kthread.c:389)
[ 1863.554255] ret_from_fork (arch/x86/kernel/process.c:145)
[ 1863.554266] ret_from_fork_asm (arch/x86/entry/entry_64.S:312)
[ 1863.554280] value changed: 0x0000000000001766 -> 0x000000000000176a
[ 1863.554295] Reported by Kernel Concurrency Sanitizer on:
[ 1863.554303] CPU: 12 PID: 5450 Comm: kworker/u64:1 Tainted: G L 6.5.0-rc6+ #44
[ 1863.554314] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 1863.554322] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[ 1863.554941] ==================================================================
lockdep_invariant_state(true);
→ pwq->stats[PWQ_STAT_STARTED]++;
trace_workqueue_execute_start(work);
worker->current_func(work);
Moving pwq->stats[PWQ_STAT_STARTED]++; before the line
raw_spin_unlock_irq(&pool->lock);
resolves the data race without performance penalty.
KCSAN detected at least one additional data race:
[ 157.834751] ==================================================================
[ 157.834770] BUG: KCSAN: data-race in process_one_work / process_one_work
[ 157.834793] write to 0xffff9934453f77a0 of 8 bytes by task 468 on cpu 29:
[ 157.834804] process_one_work (/home/marvin/linux/kernel/linux_torvalds/kernel/workqueue.c:2606)
[ 157.834815] worker_thread (/home/marvin/linux/kernel/linux_torvalds/./include/linux/list.h:292 /home/marvin/linux/kernel/linux_torvalds/kernel/workqueue.c:2752)
[ 157.834826] kthread (/home/marvin/linux/kernel/linux_torvalds/kernel/kthread.c:389)
[ 157.834834] ret_from_fork (/home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/process.c:145)
[ 157.834845] ret_from_fork_asm (/home/marvin/linux/kernel/linux_torvalds/arch/x86/entry/entry_64.S:312)
[ 157.834859] read to 0xffff9934453f77a0 of 8 bytes by task 214 on cpu 7:
[ 157.834868] process_one_work (/home/marvin/linux/kernel/linux_torvalds/kernel/workqueue.c:2606)
[ 157.834879] worker_thread (/home/marvin/linux/kernel/linux_torvalds/./include/linux/list.h:292 /home/marvin/linux/kernel/linux_torvalds/kernel/workqueue.c:2752)
[ 157.834890] kthread (/home/marvin/linux/kernel/linux_torvalds/kernel/kthread.c:389)
[ 157.834897] ret_from_fork (/home/marvin/linux/kernel/linux_torvalds/arch/x86/kernel/process.c:145)
[ 157.834907] ret_from_fork_asm (/home/marvin/linux/kernel/linux_torvalds/arch/x86/entry/entry_64.S:312)
[ 157.834920] value changed: 0x000000000000052a -> 0x0000000000000532
[ 157.834933] Reported by Kernel Concurrency Sanitizer on:
[ 157.834941] CPU: 7 PID: 214 Comm: kworker/u64:2 Tainted: G L 6.5.0-rc7-kcsan-00169-g81eaf55a60fc #4
[ 157.834951] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 157.834958] Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
[ 157.835567] ==================================================================
in code:
trace_workqueue_execute_end(work, worker->current_func);
→ pwq->stats[PWQ_STAT_COMPLETED]++;
lock_map_release(&lockdep_map);
lock_map_release(&pwq->wq->lockdep_map);
which needs to be resolved separately.
Fixes: 725e8ec59c56c ("workqueue: Add pwq->stats[] and a monitoring script")
Cc: Tejun Heo <tj@kernel.org>
Suggested-by: Lai Jiangshan <jiangshanlai@gmail.com>
Link: https://lore.kernel.org/lkml/20230818194448.29672-1-mirsad.todorovac@alu.unizg.hr/
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The function update_entity_lag() is only used inside the kernel/sched/fair.c file.
Make it static.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230829030325.69128-1-jiahao.os@bytedance.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Increase size limits for to-be-sent skb frag allocations. This
allows tun, tap devices and packet sockets to better cope with
large writes operations
- Store netdevs in an xarray, to simplify iterating over netdevs
- Refactor nexthop selection for multipath routes
- Improve sched class lifetime handling
- Add backup nexthop ID support for bridge
- Implement drop reasons support in openvswitch
- Several data races annotations and fixes
- Constify the sk parameter of routing functions
- Prepend kernel version to netconsole message
Protocols:
- Implement support for TCP probing the peer being under memory
pressure
- Remove hard coded limitation on IPv6 specific info placement inside
the socket struct
- Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per
socket scaling factor
- Scaling-up the IPv6 expired route GC via a separated list of
expiring routes
- In-kernel support for the TLS alert protocol
- Better support for UDP reuseport with connected sockets
- Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR
header size
- Get rid of additional ancillary per MPTCP connection struct socket
- Implement support for BPF-based MPTCP packet schedulers
- Format MPTCP subtests selftests results in TAP
- Several new SMC 2.1 features including unique experimental options,
max connections per lgr negotiation, max links per lgr negotiation
BPF:
- Multi-buffer support in AF_XDP
- Add multi uprobe BPF links for attaching multiple uprobes and usdt
probes, which is significantly faster and saves extra fds
- Implement an fd-based tc BPF attach API (TCX) and BPF link support
on top of it
- Add SO_REUSEPORT support for TC bpf_sk_assign
- Support new instructions from cpu v4 to simplify the generated code
and feature completeness, for x86, arm64, riscv64
- Support defragmenting IPv(4|6) packets in BPF
- Teach verifier actual bounds of bpf_get_smp_processor_id() and fix
perf+libbpf issue related to custom section handling
- Introduce bpf map element count and enable it for all program types
- Add a BPF hook in sys_socket() to change the protocol ID from
IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy
- Introduce bpf_me_mcache_free_rcu() and fix OOM under stress
- Add uprobe support for the bpf_get_func_ip helper
- Check skb ownership against full socket
- Support for up to 12 arguments in BPF trampoline
- Extend link_info for kprobe_multi and perf_event links
Netfilter:
- Speed-up process exit by aborting ruleset validation if a fatal
signal is pending
- Allow NLA_POLICY_MASK to be used with BE16/BE32 types
Driver API:
- Page pool optimizations, to improve data locality and cache usage
- Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the
need for raw ioctl() handling in drivers
- Simplify genetlink dump operations (doit/dumpit) providing them the
common information already populated in struct genl_info
- Extend and use the yaml devlink specs to [re]generate the split ops
- Introduce devlink selective dumps, to allow SF filtering SF based
on handle and other attributes
- Add yaml netlink spec for netlink-raw families, allow route, link
and address related queries via the ynl tool
- Remove phylink legacy mode support
- Support offload LED blinking to phy
- Add devlink port function attributes for IPsec
New hardware / drivers:
- Ethernet:
- Broadcom ASP 2.0 (72165) ethernet controller
- MediaTek MT7988 SoC
- Texas Instruments AM654 SoC
- Texas Instruments IEP driver
- Atheros qca8081 phy
- Marvell 88Q2110 phy
- NXP TJA1120 phy
- WiFi:
- MediaTek mt7981 support
- Can:
- Kvaser SmartFusion2 PCI Express devices
- Allwinner T113 controllers
- Texas Instruments tcan4552/4553 chips
- Bluetooth:
- Intel Gale Peak
- Qualcomm WCN3988 and WCN7850
- NXP AW693 and IW624
- Mediatek MT2925
Drivers:
- Ethernet NICs:
- nVidia/Mellanox:
- mlx5:
- support UDP encapsulation in packet offload mode
- IPsec packet offload support in eswitch mode
- improve aRFS observability by adding new set of counters
- extends MACsec offload support to cover RoCE traffic
- dynamic completion EQs
- mlx4:
- convert to use auxiliary bus instead of custom interface
logic
- Intel
- ice:
- implement switchdev bridge offload, even for LAG
interfaces
- implement SRIOV support for LAG interfaces
- igc:
- add support for multiple in-flight TX timestamps
- Broadcom:
- bnxt:
- use the unified RX page pool buffers for XDP and non-XDP
- use the NAPI skb allocation cache
- OcteonTX2:
- support Round Robin scheduling HTB offload
- TC flower offload support for SPI field
- Freescale:
- add XDP_TX feature support
- AMD:
- ionic: add support for PCI FLR event
- sfc:
- basic conntrack offload
- introduce eth, ipv4 and ipv6 pedit offloads
- ST Microelectronics:
- stmmac: maximze PTP timestamping resolution
- Virtual NICs:
- Microsoft vNIC:
- batch ringing RX queue doorbell on receiving packets
- add page pool for RX buffers
- Virtio vNIC:
- add per queue interrupt coalescing support
- Google vNIC:
- add queue-page-list mode support
- Ethernet high-speed switches:
- nVidia/Mellanox (mlxsw):
- add port range matching tc-flower offload
- permit enslavement to netdevices with uppers
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- convert to phylink_pcs
- Renesas:
- r8A779fx: add speed change support
- rzn1: enables vlan support
- Ethernet PHYs:
- convert mv88e6xxx to phylink_pcs
- WiFi:
- Qualcomm Wi-Fi 7 (ath12k):
- extremely High Throughput (EHT) PHY support
- RealTek (rtl8xxxu):
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
- RealTek (rtw89):
- Introduce Time Averaged SAR (TAS) support
- Connector:
- support for event filtering"
* tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits)
net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show
net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler
net: stmmac: clarify difference between "interface" and "phy_interface"
r8152: add vendor/device ID pair for D-Link DUB-E250
devlink: move devlink_notify_register/unregister() to dev.c
devlink: move small_ops definition into netlink.c
devlink: move tracepoint definitions into core.c
devlink: push linecard related code into separate file
devlink: push rate related code into separate file
devlink: push trap related code into separate file
devlink: use tracepoint_enabled() helper
devlink: push region related code into separate file
devlink: push param related code into separate file
devlink: push resource related code into separate file
devlink: push dpipe related code into separate file
devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper
devlink: push shared buffer related code into separate file
devlink: push port related code into separate file
devlink: push object register/unregister notifications into separate helpers
inet: fix IP_TRANSPARENT error handling
...
|
|
Using kmalloc() to allocate the decompression workspace for zstd may
trigger the following warning when large modules are loaded (i.e., xfs):
[ 2.961884] WARNING: CPU: 1 PID: 254 at mm/page_alloc.c:4453 __alloc_pages+0x2c3/0x350
...
[ 2.989033] Call Trace:
[ 2.989841] <TASK>
[ 2.990614] ? show_regs+0x6d/0x80
[ 2.991573] ? __warn+0x89/0x160
[ 2.992485] ? __alloc_pages+0x2c3/0x350
[ 2.993520] ? report_bug+0x17e/0x1b0
[ 2.994506] ? handle_bug+0x51/0xa0
[ 2.995474] ? exc_invalid_op+0x18/0x80
[ 2.996469] ? asm_exc_invalid_op+0x1b/0x20
[ 2.997530] ? module_zstd_decompress+0xdc/0x2a0
[ 2.998665] ? __alloc_pages+0x2c3/0x350
[ 2.999695] ? module_zstd_decompress+0xdc/0x2a0
[ 3.000821] __kmalloc_large_node+0x7a/0x150
[ 3.001920] __kmalloc+0xdb/0x170
[ 3.002824] module_zstd_decompress+0xdc/0x2a0
[ 3.003857] module_decompress+0x37/0xc0
[ 3.004688] init_module_from_file+0xd0/0x100
[ 3.005668] idempotent_init_module+0x11c/0x2b0
[ 3.006632] __x64_sys_finit_module+0x64/0xd0
[ 3.007568] do_syscall_64+0x59/0x90
[ 3.008373] ? ksys_read+0x73/0x100
[ 3.009395] ? exit_to_user_mode_prepare+0x30/0xb0
[ 3.010531] ? syscall_exit_to_user_mode+0x37/0x60
[ 3.011662] ? do_syscall_64+0x68/0x90
[ 3.012511] ? do_syscall_64+0x68/0x90
[ 3.013364] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
However, continuous physical memory does not seem to be required in
module_zstd_decompress(), so use vmalloc() instead, to prevent the
warning and avoid potential failures at loading compressed modules.
Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- add support for running Rust documentation tests as KUnit tests
- make init, str, sync, types doctests compilable/testable
- add support for attributes API which include speed, modules
attributes, ability to filter and report attributes
- add support for marking tests slow using attributes API
- add attributes API documentation
- fix a wild-memory-access bug in kunit_filter_suites() and a possible
memory leak in kunit_filter_suites()
- add support for counting number of test suites in a module, list
action to kunit test modules, and test filtering on module tests
* tag 'linux-kselftest-kunit-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
kunit: fix struct kunit_attr header
kunit: replace KUNIT_TRIGGER_STATIC_STUB maro with KUNIT_STATIC_STUB_REDIRECT
kunit: Allow kunit test modules to use test filtering
kunit: Make 'list' action available to kunit test modules
kunit: Report the count of test suites in a module
kunit: fix uninitialized variables bug in attributes filtering
kunit: fix possible memory leak in kunit_filter_suites()
kunit: fix wild-memory-access bug in kunit_filter_suites()
kunit: Add documentation of KUnit test attributes
kunit: add tests for filtering attributes
kunit: time: Mark test as slow using test attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: tool: Add command line interface to filter and report attributes
kunit: Add ability to filter attributes
kunit: Add module attribute
kunit: Add speed attribute
kunit: Add test attributes API structure
MAINTAINERS: add Rust KUnit files to the KUnit entry
rust: support running Rust documentation tests as KUnit ones
rust: types: make doctests compilable/testable
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These rework cpuidle governors to call tick_nohz_get_sleep_length()
less often and fix one of them, rework hibernation to avoid storing
pages filled with zeros in hibernation images, switch over some
cpufreq drivers to use void remove callbacks, fix and clean up
multiple cpufreq drivers, fix the devfreq core, update the cpupower
utility and make other assorted improvements.
Specifics:
- Rework the menu and teo cpuidle governors to avoid calling
tick_nohz_get_sleep_length(), which is likely to become quite
expensive going forward, too often and improve making decisions
regarding whether or not to stop the scheduler tick in the teo
governor (Rafael Wysocki)
- Improve the performance of cpufreq_stats_create_table() in some
cases (Liao Chang)
- Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal)
- Use clamp() helper macro to improve the code readability in
cpufreq_verify_within_limits() (Liao Chang)
- Set stale CPU frequency to minimum in intel_pstate (Doug Smythies)
- Migrate cpufreq drivers for various platforms to use void remove
callback (Yangtao Li)
- Add online/offline/exit hooks for Tegra driver (Sumit Gupta)
- Explicitly include correct DT includes in cpufreq (Rob Herring)
- Frequency domain updates for qcom-hw driver (Neil Armstrong)
- Modify AMD pstate driver return the highest_perf value (Meng Li)
- Generic cleanups for cppc, mediatek and powernow driver (Liao
Chang, Konrad Dybcio)
- Add more platforms to cpufreq-arm driver's blocklist
(AngeloGioacchino Del Regno and Konrad Dybcio)
- brcmstb-avs-cpufreq: Fix -Warray-bounds bug (Gustavo A. R. Silva)
- Add device PM helpers to allow a device to remain powered-on during
system-wide transitions (Ulf Hansson)
- Rework hibernation memory snapshotting to avoid storing pages
filled with zeros in hibernation image files (Brian Geffon)
- Add check to make sure that CPU latency QoS constraints do not use
negative values (Clive Lin)
- Optimize rp->domains memory allocation in the Intel RAPL power
capping driver (xiongxin)
- Remove recursion while parsing zones in the arm_scmi power capping
driver (Cristian Marussi)
- Fix memory leak in devfreq_dev_release() (Boris Brezillon)
- Rewrite devfreq_monitor_start() kerneldoc comment (Manivannan
Sadhasivam)
- Explicitly include correct DT includes in devfreq (Rob Herring)
- Remove unsued pm_runtime_update_max_time_suspended() extern
declaration (YueHaibing)
- Add turbo-boost support to cpupower (Wyes Karny)
- Add support for amd_pstate mode change to cpupower (Wyes Karny)
- Fix 'cpupower idle_set' command to accept only numeric values of
arguments (Likhitha Korrapati)
- Clean up OPP code and add new frequency related APIs to it (Viresh
Kumar, Manivannan Sadhasivam)
- Convert ti cpufreq/opp bindings to json schema (Nishanth Menon)"
* tag 'pm-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
cpufreq: tegra194: remove opp table in exit hook
cpufreq: powernow-k8: Use related_cpus instead of cpus in driver.exit()
cpufreq: tegra194: add online/offline hooks
cpuidle: teo: Avoid unnecessary variable assignments
cpufreq: qcom-cpufreq-hw: add support for 4 freq domains
dt-bindings: cpufreq: qcom-hw: add a 4th frequency domain
cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver
cpufreq: amd-pstate-ut: Remove module parameter access
cpufreq: Use clamp() helper macro to improve the code readability
PM: sleep: Add helpers to allow a device to remain powered-on
PM: QoS: Add check to make sure CPU latency is non-negative
PM: runtime: Remove unsued extern declaration of pm_runtime_update_max_time_suspended()
cpufreq: intel_pstate: set stale CPU frequency to minimum
cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
dt-bindings: cpufreq: Convert ti-cpufreq to json schema
dt-bindings: opp: Convert ti-omap5-opp-supply to json schema
OPP: Fix argument name in doc comment
cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases
cpufreq: cppc: Set fie_disabled to FIE_DISABLED if fails to create kworker_fie
cpufreq: cppc: cppc_cpufreq_get_rate() returns zero in all error cases.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 cleanups from Ingo Molnar:
"The following commit deserves special mention:
22dc02f81cddd Revert "sched/fair: Move unused stub functions to header"
This is in x86/cleanups, because the revert is a re-application of a
number of cleanups that got removed inadvertedly"
[ This also effectively undoes the amd_check_microcode() microcode
declaration change I had done in my microcode loader merge in commit
42a7f6e3ffe0 ("Merge tag 'x86_microcode_for_v6.6_rc1' [...]").
I picked the declaration change by Arnd from this branch instead,
which put it in <asm/processor.h> instead of <asm/microcode.h> like I
had done in my merge resolution - Linus ]
* tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Refactor code using deprecated strncpy() interface to use strscpy()
x86/hpet: Refactor code using deprecated strncpy() interface to use strscpy()
x86/platform/uv: Refactor code using deprecated strcpy()/strncpy() interfaces to use strscpy()
x86/qspinlock-paravirt: Fix missing-prototype warning
x86/paravirt: Silence unused native_pv_lock_init() function warning
x86/alternative: Add a __alt_reloc_selftest() prototype
x86/purgatory: Include header for warn() declaration
x86/asm: Avoid unneeded __div64_32 function definition
Revert "sched/fair: Move unused stub functions to header"
x86/apic: Hide unused safe_smp_processor_id() on 32-bit UP
x86/cpu: Fix amd_check_microcode() declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- The biggest change is introduction of a new iteration of the
SCHED_FAIR interactivity code: the EEVDF ("Earliest Eligible Virtual
Deadline First") scheduler
EEVDF too is a virtual-time scheduler, with two parameters (weight
and relative deadline), compared to CFS that had weight only. It
completely reworks the base scheduler: placement, preemption, picking
-- everything
LWN.net, as usual, has a terrific writeup about EEVDF:
https://lwn.net/Articles/925371/
Preemption (both tick and wakeup) is driven by testing against a
fresh pick. Because the tree is now effectively an interval tree, and
the selection is no longer the 'leftmost' task, over-scheduling is
less of a problem. A lot of the CFS heuristics are removed or
replaced by more natural latency-space parameters & constructs
In terms of expected performance regressions: we will and can fix
everything where a 'good' workload misbehaves with the new scheduler,
but EEVDF inevitably changes workload scheduling in a binary fashion,
hopefully for the better in the overwhelming majority of cases, but
in some cases it won't, especially in adversarial loads that got
lucky with the previous code, such as some variants of hackbench. We
are trying hard to err on the side of fixing all performance
regressions, but we expect some inevitable post-release iterations of
that process
- Improve load-balancing on hybrid x86 systems: enable cluster
scheduling (again)
- Improve & fix bandwidth-scheduling on nohz systems
- Improve bandwidth-throttling
- Use lock guards to simplify and de-goto-ify control flow
- Misc improvements, cleanups and fixes
* tag 'sched-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
sched/eevdf: Curb wakeup-preemption
sched: Simplify sched_core_cpu_{starting,deactivate}()
sched: Simplify try_steal_cookie()
sched: Simplify sched_tick_remote()
sched: Simplify sched_exec()
sched: Simplify ttwu()
sched: Simplify wake_up_if_idle()
sched: Simplify: migrate_swap_stop()
sched: Simplify sysctl_sched_uclamp_handler()
sched: Simplify get_nohz_timer_target()
sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset
sched/rt: Fix sysctl_sched_rr_timeslice intial value
sched/fair: Block nohz tick_stop when cfs bandwidth in use
sched, cgroup: Restore meaning to hierarchical_quota
MAINTAINERS: Add Peter explicitly to the psi section
sched/psi: Select KERNFS as needed
sched/topology: Align group flags when removing degenerate domain
sched/fair: remove util_est boosting
sched/fair: Propagate enqueue flags into place_entity()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event updates from Ingo Molnar:
- AMD IBS improvements
- Intel PMU driver updates
- Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events
- Micro-optimize software events and the ring-buffer code
- Misc cleanups & fixes
* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
perf/x86/intel: Add Crestmont PMU
x86/cpu: Update Hybrids
x86/cpu: Fix Crestmont uarch
x86/cpu: Fix Gracemont uarch
perf: Remove unused extern declaration arch_perf_get_page_size()
perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
perf/mem: Introduce PERF_MEM_LVLNUM_UNC
perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
locking/arch: Avoid variable shadowing in local_try_cmpxchg()
perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
perf/x86: Use local64_try_cmpxchg
perf/amd: Prevent grouping of IBS events
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
"Updates for the CPU hotplug core:
- Support partial SMT enablement.
So far the sysfs SMT control only allows to toggle between SMT on
and off. That's sufficient for x86 which usually has at max two
threads except for the Xeon PHI platform which has four threads per
core
Though PowerPC has up to 16 threads per core and so far it's only
possible to control the number of enabled threads per core via a
command line option. There is some way to control this at runtime,
but that lacks enforcement and the usability is awkward
This update expands the sysfs interface and the core infrastructure
to accept numerical values so PowerPC can build SMT runtime control
for partial SMT enablement on top
The core support has also been provided to the PowerPC maintainers
who added the PowerPC related changes on top
- Minor cleanups and documentation updates"
* tag 'smp-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: core-api/cpuhotplug: Fix state names
cpu/hotplug: Remove unused function declaration cpu_set_state_online()
cpu/SMT: Fix cpu_smt_possible() comment
cpu/SMT: Allow enabling partial SMT states via sysfs
cpu/SMT: Create topology_smt_thread_allowed()
cpu/SMT: Remove topology_smt_supported()
cpu/SMT: Store the current/max number of threads
cpu/SMT: Move smt/control simple exit cases earlier
cpu/SMT: Move SMT prototypes into cpu_smt.h
cpu/hotplug: Remove dependancy against cpu_primary_thread_mask
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Boring updates for the interrupt subsystem:
Core:
- Prevent a deadlock of nested interrupt threads vs.
synchronize_hard()
- Removal of a stale extern declaration
Drivers:
- The first new driver since v6.2 for Amlogic-C3 SoCs
- The usual small fixes, cleanups and improvements all over the
place"
* tag 'irq-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: Add support for Amlogic-C3 SoCs
dt-bindings: interrupt-controller: Add support for Amlogic-C3 SoCs
irqchip/irq-mvebu-sei: Use devm_platform_get_and_ioremap_resource()
irqchip/ls-scfg-msi: Use devm_platform_get_and_ioremap_resource()
irqchip: Explicitly include correct DT includes
irqchip/orion: Use of_address_count() helper
irqchip/irq-pruss-intc: Do not check for 0 return after calling platform_get_irq()
irqchip/imx-mu-msi: Do not check for 0 return after calling platform_get_irq()
irqchipr/i8259: Mark i8259_of_init() static
irqchip/mips-gic: Mark gic_irq_domain_free() static
irqchip/xtensa-pic: Include header for xtensa_pic_init_legacy()
irqchip/loongson-eiointc: Fix return value checking of eiointc_index
genirq: Remove unused extern declaration
genirq: Prevent nested thread vs synchronize_hardirq() deadlock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry code update from Thomas Gleixner:
"A single update to the core entry code, which removes the empty user
address limit check which is a leftover of the removed TIF_FSCHECK"
* tag 'core-entry-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Remove empty addr_limit_user_check()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull clocksource watchdog updates from Paul McKenney:
- Handle negative skews in "skew is too large" messages
- Extend watchdog check exemption to 4-Socket platforms
* tag 'clocksource.2023.08.15a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
x86/tsc: Extend watchdog check exemption to 4-Sockets platform
clocksource: Handle negative skews in "skew is too large" messages
|