summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2024-07-16Merge tag 'net-next-6.11' of ↵HEADmasterLinus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
2024-07-16Merge tag 'linux_kselftest-kunit-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - add vm_mmap() allocation resource manager - convert usercopy kselftest to KUnit - disable usercopy testing on !CONFIG_MMU - add MODULE_DESCRIPTION() to core, list, and usercopy tests - add tests for assertion formatting functions - assert.c - introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros - fix KUNIT_ASSERT_STRNEQ comments to make it clear that it is an assertion - rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT * tag 'linux_kselftest-kunit-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_FAIL_AND_ABORT for readability kunit: Fix the comment of KUNIT_ASSERT_STRNEQ as assertion kunit: executor: Simplify string allocation handling kunit/usercopy: Add missing MODULE_DESCRIPTION() kunit/usercopy: Disable testing on !CONFIG_MMU usercopy: Convert test_user_copy to KUnit test kunit: test: Add vm_mmap() allocation resource manager list: test: add the missing MODULE_DESCRIPTION() macro kunit: add missing MODULE_DESCRIPTION() macros to core modules list: test: remove unused struct 'klist_test_struct' kunit: Cover 'assert.c' with tests
2024-07-16Merge tag 'perf-core-2024-07-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes * tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf/x86/intel: Add a distinct name for Granite Rapids perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR perf: Split __perf_pending_irq() out of perf_pending_irq() perf: Don't disable preemption in perf_pending_task(). perf: Move swevent_htable::recursion into task_struct. perf: Shrink the size of the recursion counter. perf: Enqueue SIGTRAP always via task_work. task_work: Add TWA_NMI_CURRENT as an additional notify mode. perf: Move irq_work_queue() where the event is prepared. perf: Fix event leak upon exec and file release perf: Fix event leak upon exit task_work: Introduce task_work_cancel() again task_work: s/task_work_cancel()/task_work_cancel_func()/ perf/x86/amd/uncore: Fix DF and UMC domain identification perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable perf/x86/intel: Support Perfmon MSRs aliasing perf/x86/intel: Support PERFEVTSEL extension perf/x86: Add config_mask to represent EVENTSEL bitmask ...
2024-07-16Merge tag 'sched-core-2024-07-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Update Daniel Bristot de Oliveira's entry in MAINTAINERS, and credit him in CREDITS - Harmonize the lock-yielding behavior on dynamically selected preemption models with static ones - Reorganize the code a bit: split out sched/syscalls.c to reduce the size of sched/core.c - Micro-optimize psi_group_change() - Fix set_load_weight() for SCHED_IDLE tasks - Misc cleanups & fixes * tag 'sched-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Update MAINTAINERS and CREDITS sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks sched/psi: Optimise psi_group_change a bit sched/core: Drop spinlocks on contention iff kernel is preemptible sched/core: Move preempt_model_*() helpers from sched.h to preempt.h sched/balance: Skip unnecessary updates to idle load balancer's flags idle: Remove stale RCU comment sched/headers: Move struct pre-declarations to the beginning of the header sched/core: Clean up kernel/sched/sched.h a bit sched/core: Simplify prefetch_curr_exec_start() sched: Fix spelling in comments sched/syscalls: Split out kernel/sched/syscalls.c from kernel/sched/core.c
2024-07-16Merge tag 'locking-core-2024-07-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Jump label fixes, including a perf events fix that originally manifested as jump label failures, but was a serialization bug at the usage site - Mark down_write*() helpers as __always_inline, to improve WCHAN debuggability - Misc cleanups and fixes * tag 'locking-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Add __always_inline annotation to __down_write_common() and inlined callers jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() jump_label: Clarify condition in static_key_fast_inc_not_disabled() jump_label: Fix concurrency issues in static_key_slow_dec() perf/x86: Serialize set_attr_rdpmc() cleanup: Standardize the header guard define's name
2024-07-16Merge tag 'sysctl-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Remove "->procname == NULL" check when iterating through sysctl table arrays Removing sentinels in ctl_table arrays reduces the build time size and runtime memory consumed by ~64 bytes per array. With all ctl_table sentinels gone, the additional check for ->procname == NULL that worked in tandem with the ARRAY_SIZE to calculate the size of the ctl_table arrays is no longer needed and has been removed. The sysctl register functions now returns an error if a sentinel is used. - Preparation patches for sysctl constification Constifying ctl_table structs prevents the modification of proc_handler function pointers as they would reside in .rodata. The ctl_table arguments in sysctl utility functions are const qualified in preparation for a future treewide proc_handler argument constification commit. - Misc fixes Increase robustness of set_ownership by providing sane default ownership values in case the callee doesn't set them. Bound check proc_dou8vec_minmax to avoid loading buggy modules and give sysctl testing module a name to avoid compiler complaints. * tag 'sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: Warn on an empty procname element sysctl: Remove ctl_table sentinel code comments sysctl: Remove "child" sysctl code comments sysctl: Remove superfluous empty allocations from sysctl internals sysctl: Replace nr_entries with ctl_table_size in new_links sysctl: Remove check for sentinel element in ctl_table arrays mm profiling: Remove superfluous sentinel element from ctl_table locking: Remove superfluous sentinel element from kern_lockdep_table sysctl: Add module description to sysctl-testing sysctl: constify ctl_table arguments of utility function utsname: constify ctl_table arguments of utility function sysctl: move the extra1/2 boundary check of u8 to sysctl_check_table_array sysctl: always initialize i_uid/i_gid
2024-07-16Merge tag 'seccomp-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - interrupt SECCOMP_IOCTL_NOTIF_RECV when all users exit (Andrei Vagin) - Update selftests to check for expected NOTIF_RECV exits (Andrei Vagin) * tag 'seccomp-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: check that a zombie leader doesn't affect others selftests/seccomp: add test for NOTIF_RECV and unused filters seccomp: release task filters when the task exits seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited
2024-07-16Merge tag 'for-linus-6.11-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - some trivial cleanups - a fix for the Xen timer - add boot time selectable debug capability to the Xen multicall handling - two fixes for the recently added Xen irqfd handling * tag 'for-linus-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove deprecated xen_nopvspin boot parameter x86/xen: eliminate some private header files x86/xen: make some functions static xen: make multicall debug boot time selectable xen/arm: Convert comma to semicolon xen: privcmd: Fix possible access to a freed kirqfd instance xen: privcmd: Switch from mutex to spinlock for irqfds xen: add missing MODULE_DESCRIPTION() macros x86/xen: Convert comma to semicolon x86/xen/time: Reduce Xen timer tick xen/manage: Constify struct shutdown_handler
2024-07-16Merge tag 'asm-generic-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Most of this is part of my ongoing work to clean up the system call tables. In this bit, all of the newer architectures are converted to use the machine readable syscall.tbl format instead in place of complex macros in include/uapi/asm-generic/unistd.h. This follows an earlier series that fixed various API mismatches and in turn is used as the base for planned simplifications. The other two patches are dead code removal and a warning fix" * tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: vmlinux.lds.h: catch .bss..L* sections into BSS") fixmap: Remove unused set_fixmap_offset_io() riscv: convert to generic syscall table openrisc: convert to generic syscall table nios2: convert to generic syscall table loongarch: convert to generic syscall table hexagon: use new system call table csky: convert to generic syscall table arm64: rework compat syscall macros arm64: generate 64-bit syscall.tbl arm64: convert unistd_32.h to syscall.tbl format arc: convert to generic syscall table clone3: drop __ARCH_WANT_SYS_CLONE3 macro kbuild: add syscall table generation to scripts/Makefile.asm-headers kbuild: verify asm-generic header list loongarch: avoid generating extra header files um: don't generate asm/bpf_perf_event.h csky: drop asm/gpio.h wrapper syscalls: add generic scripts/syscall.tbl
2024-07-15Merge tag 'x86_cc_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 confidential computing updates from Borislav Petkov: "Unrelated x86/cc changes queued here to avoid ugly cross-merges and conflicts: - Carve out CPU hotplug function declarations into a separate header with the goal to be able to use the lockdep assertions in a more flexible manner - As a result, refactor cacheinfo code after carving out a function to return the cache ID associated with a given cache level - Cleanups Add support to be able to kexec TDX guests: - Expand ACPI MADT CPU offlining support - Add machinery to prepare CoCo guests memory before kexec-ing into a new kernel - Cleanup, readjust and massage related code" * tag 'x86_cc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method x86/mm: Introduce kernel_ident_mapping_free() x86/smp: Add smp_ops.stop_this_cpu() callback x86/acpi: Do not attempt to bring up secondary CPUs in the kexec case x86/acpi: Rename fields in the acpi_madt_multiproc_wakeup structure x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump x86/mm: Make e820__end_ram_pfn() cover E820_TYPE_ACPI ranges x86/tdx: Convert shared memory back to private on kexec x86/mm: Add callbacks to prepare encrypted memory for kexec x86/tdx: Account shared memory x86/mm: Return correct level from lookup_address() if pte is none x86/mm: Make x86_platform.guest.enc_status_change_*() return an error x86/kexec: Keep CR4.MCE set during kexec for TDX guest x86/relocate_kernel: Use named labels for less confusion cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup cpu/hotplug: Add support for declaring CPU offlining not supported x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init x86/acpi: Extract ACPI MADT wakeup code into a separate file x86/kexec: Remove spurious unconditional JMP from from identity_mapped() ...
2024-07-15Merge tag 'gpio-updates-for-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "The majority of added lines are two new modules: the GPIO virtual consumer module that improves our ability to add automated tests for the kernel API and the "sloppy" logic analyzer module that uses the GPIO API to implement a coarse-grained debugging tool for useful for remote development. Other than that we have the usual assortment of various driver extensions, improvements to the core GPIO code, DT-bindings and other documentation updates as well as an extension to the interrupt simulator: GPIOLIB core: - rework kfifo handling rework in the character device code - improve the labeling of GPIOs requested as interrupts and show more info on interrupt-only GPIOs in debugfs - remove unused APIs - unexport interfaces that are only used from the core GPIO code - drop the return value from gpiochip_set_desc_names() as it cannot fail - move a string array definition out of a header and into a specific compilation unit - convert the last user of gpiochip_get_desc() other than GPIO core to using a safer alternative - use array_index_nospec() where applicable New drivers: - add a "virtual GPIO consumer" module that allows requesting GPIOs from actual hardware and driving tests of the in-kernel GPIO API from user-space over debugfs - add a GPIO-based "sloppy" logic analyzer module useful for "first glance" debugging on remote boards Driver improvements: - add support for a new model to gpio-pca953x - lock GPIOs as interrupts in gpio-sim when the lines are requested as irqs via the simulator domain + some other minor improvements - improve error reporting in gpio-syscon - convert gpio-ath79 to using dynamic GPIO base and range - use pcibios_err_to_errno() for converting PCIBIOS error codes to errno vaues in gpio-amd8111 and gpio-rdc321x - allow building gpio-brcmstb for the BCM2835 architecture DT bindings: - convert DT bindings for lsi,zevio, mpc8xxx, and atmel to DT schema - document new properties for aspeed,gpio, fsl,qoriq-gpio and gpio-vf610 - document new compatibles for pca953x and fsl,qoriq-gpio Documentation: - document stricter behavior of the GPIO character device uAPI with regards to reconfiguring requested line without direction set - clarify the effect of the active-low flag on line values and edges - remove documentation for the legacy GPIO API in order to stop tempting people to use it - document the preference for using pread() for reading edge events in the sysfs API Other: - add an extended initializer to the interrupt simulator allowing to specify a number of callbacks callers can use to be notified about irqs being requested and released" * tag 'gpio-updates-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (41 commits) gpio: mc33880: Convert comma to semicolon gpio: virtuser: actually use the "trimmed" local variable dt-bindings: gpio: convert Atmel GPIO to json-schema gpio: virtuser: new virtual testing driver for the GPIO API dt-bindings: gpio: vf610: Allow gpio-line-names to be set gpio: sim: lock GPIOs as interrupts when they are requested genirq/irq_sim: add an extended irq_sim initializer dt-bindings: gpio: fsl,qoriq-gpio: Add compatible string fsl,ls1046a-gpio gpiolib: unexport gpiochip_get_desc() gpio: add sloppy logic analyzer using polling Documentation: gpio: Reconfiguration with unset direction (uAPI v2) Documentation: gpio: Reconfiguration with unset direction (uAPI v1) dt-bindings: gpio: fsl,qoriq-gpio: add common property gpio-line-names gpio: ath79: convert to dynamic GPIO base allocation pinctrl: da9062: replace gpiochip_get_desc() with gpio_device_get_desc() gpiolib: put gpio_suffixes in a single compilation unit Documentation: gpio: Clarify effect of active low flag on line edges Documentation: gpio: Clarify effect of active low flag on line values gpiolib: Remove data-less gpiochip_add() function gpio: sim: use devm_mutex_init() ...
2024-07-15Merge tag 'wq-for-6.11-rc1-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "cpus_read_lock() was dropped from workqueue creation path but there were still remaining lockdep_assert_cpus_held() triggering spurious lockdep failures. Remove them" * tag 'wq-for-6.11-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Remove unneeded lockdep_assert_cpus_held()
2024-07-15Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "The biggest part is the virtual CPU hotplug that touches ACPI, irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has been queued via the arm64 tree. Otherwise the usual perf updates, kselftest, various small cleanups. Core: - Virtual CPU hotplug support for arm64 ACPI systems - cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits visible to guests - CPU errata: expand the speculative SSBS workaround to more CPUs - GICv3, use compile-time PMR values: optimise the way regular IRQs are masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for a static key in fast paths by using a priority value chosen dynamically at boot time ACPI: - 'acpi=nospcr' option to disable SPCR as default console for arm64 - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/ Perf updates: - Rework of the IMX PMU driver to enable support for I.MX95 - Enable support for tertiary match groups in the CMN PMU driver - Initial refactoring of the CPU PMU code to prepare for the fixed instruction counter introduced by Arm v9.4 - Add missing PMU driver MODULE_DESCRIPTION() strings - Hook up DT compatibles for recent CPU PMUs Kselftest updates: - Kernel mode NEON fp-stress - Cleanups, spelling mistakes Miscellaneous: - arm64 Documentation update with a minor clarification on TBI - Fix missing IPI statistics - Implement raw_smp_processor_id() using thread_info rather than a per-CPU variable (better code generation) - Make MTE checking of in-kernel asynchronous tag faults conditional on KASAN being enabled - Minor cleanups, typos" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits) selftests: arm64: tags: remove the result script selftests: arm64: tags_test: conform test to TAP output perf: add missing MODULE_DESCRIPTION() macros arm64: smp: Fix missing IPI statistics irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64 Documentation: arm64: Update memory.rst for TBI arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1 KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1 perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h perf: arm_v6/7_pmu: Drop non-DT probe support perf/arm: Move 32-bit PMU drivers to drivers/perf/ perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU perf: imx_perf: add support for i.MX95 platform perf: imx_perf: fix counter start and config sequence perf: imx_perf: refactor driver for imx93 perf: imx_perf: let the driver manage the counter usage rather the user perf: imx_perf: add macro definitions for parsing config attr ...
2024-07-15workqueue: Remove unneeded lockdep_assert_cpus_held()Lai Jiangshan
The commit 19af45757383 ("workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()") removes the unneed cpus_read_lock() after the pwq creations and installations have been reworked based on wq_online_cpumask rather than cpu_online_mask making cpus_read_lock() is unneeded during wqattrs changes. But it desn't remove the lockdep_assert_cpus_held() checks during wqattrs changes, which leads to complaints from lockdep reported by kernel test robot: [ 15.726567][ T131] ------------[ cut here ]------------ [ 15.728117][ T131] WARNING: CPU: 1 PID: 131 at kernel/cpu.c:525 lockdep_assert_cpus_held (kernel/cpu.c:525) [ 15.731191][ T131] Modules linked in: floppy(+) parport_pc(+) parport qemu_fw_cfg rtc_cmos [ 15.733423][ T131] CPU: 1 PID: 131 Comm: systemd-udevd Tainted: G T 6.10.0-rc2-00254-g19af45757383 #1 df6f039f42e8818bf9a534449362ebad1aad32e2 [ 15.737011][ T131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 15.739760][ T131] EIP: lockdep_assert_cpus_held (kernel/cpu.c:525) [ 15.741326][ T131] Code: 97 c2 03 72 20 83 3d f4 73 97 c2 00 74 17 55 89 e5 b8 fc bd 4d c2 ba ff ff ff ff e8 e4 57 d1 00 85 c0 74 06 5d 31 c0 31 d2 c3 <0f> 0b eb f6 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 b8 Fix it by removing the unneeded lockdep_assert_cpus_held(). Also remove the unneed cpus_read_lock() from wq_affn_dfl_set(). tj: Dropped the removal of cpus_read_lock/unlock() in wq_affn_dfl_set() to keep this patch fix only. Cc: kernel test robot <oliver.sang@intel.com> Fixes: 19af45757383("workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407141846.665c0446-lkp@intel.com Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-15Merge tag 'wq-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: - Lai fixed a bug where CPU hotplug and workqueue attribute changes race leaving some workqueues not fully updated. This involved refactoring and changing how online CPUs are tracked. The resulting code is cleaner. - Workqueue watchdog touch operation was causing too much cacheline contention on very large machines. Nicholas improved scalabililty by avoiding unnecessary global updates. - Code cleanups and minor rescuer behavior improvement. - The last commit 58629d4871e8 ("workqueue: Always queue work items to the newest PWQ for order workqueues") is a cherry-picked straggler commit from for-6.10-fixes, a fix for a bug which may not actually trigger. * tag 'wq-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (24 commits) workqueue: Always queue work items to the newest PWQ for order workqueues workqueue: Rename wq_update_pod() to unbound_wq_update_pwq() workqueue: Remove the arguments @hotplug_cpu and @online from wq_update_pod() workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask() workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask() workqueue: Remove cpus_read_lock() from apply_wqattrs_lock() workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumask workqueue: Add wq_online_cpumask workqueue: Init rescuer's affinities as the wq's effective cpumask workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S. workqueue: Move kthread_flush_worker() out of alloc_and_link_pwqs() workqueue: Make rescuer initialization as the last step of the creation of a new wq workqueue: Register sysfs after the whole creation of the new wq workqueue: Simplify goto statement workqueue: Update cpumasks after only applying it successfully workqueue: Improve scalability of workqueue watchdog touch workqueue: wq_watchdog_touch is always called with valid CPU workqueue: Remove useless pool->dying_workers workqueue: Detach workers directly in idle_cull_fn() workqueue: Don't bind the rescuer in the last working cpu ...
2024-07-15Merge tag 'cgroup-for-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Added Michal Koutný as a maintainer - Counters in pids.events were behaving inconsistently. pids.events made properly hierarchical and pids.events.local added - misc.peak and misc.events.local added - cpuset remote partition creation and cpuset.cpus.exclusive handling improved - Code cleanups, non-critical fixes, doc updates - for-6.10-fixes is merged in to receive two non-critical fixes that didn't trigger pull * tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits) cgroup: Add Michal Koutný as a maintainer cgroup/misc: Introduce misc.events.local cgroup/rstat: add force idle show helper cgroup: Protect css->cgroup write under css_set_lock cgroup/misc: Introduce misc.peak cgroup_misc: add kernel-doc comments for enum misc_res_type cgroup/cpuset: Prevent UAF in proc_cpuset_show() selftest/cgroup: Update test_cpuset_prs.sh to match changes cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot cgroup/cpuset: Fix remote root partition creation problem cgroup: avoid the unnecessary list_add(dying_tasks) in cgroup_exit() cgroup/cpuset: Optimize isolated partition only generate_sched_domains() calls cgroup/cpuset: Reduce the lock protecting CS_SCHED_LOAD_BALANCE kernel/cgroup: cleanup cgroup_base_files when fail to add cgroup_psi_files selftests: cgroup: Add basic tests for pids controller selftests: cgroup: Lexicographic order in Makefile cgroup/pids: Add pids.events.local cgroup/pids: Make event counters hierarchical ...
2024-07-15Merge tag 'kcsan.2024.07.12a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull KCSAN updates from Paul McKenney: - improve the documentation for the new __data_racy type qualifier to the data_race() macro's kernel-doc header and to the LKMM's access-marking documentation - add missing MODULE_DESCRIPTION * tag 'kcsan.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: kcsan: Add missing MODULE_DESCRIPTION() macro kcsan: Add example to data_race() kerneldoc header
2024-07-15Merge tag 'torture.2024.07.12a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull torture-test updates from Paul McKenney: "This adds MODULE_DESCRIPTION() to torture.c, locktorture.c, and scftorture.c, and also adds 'static' to a global variable that is used only in scftorture.c" * tag 'torture.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: scftorture: Make torture_type static scftorture: Add MODULE_DESCRIPTION() locktorture: Add MODULE_DESCRIPTION() torture: Add MODULE_DESCRIPTION()
2024-07-15Merge tag 'rcu.2024.07.12a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Update Tasks RCU and Tasks Rude RCU description in Requirements.rst and clarify rcu_assign_pointer() and rcu_dereference() ordering properties - Add lockdep assertions for RCU readers, limit inline wakeups for callback-bypass synchronize_rcu(), add an rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter, add Uladzislau Rezki as RCU maintainer, and fix a subtle callback-migration memory-ordering issue - Remove a number of redundant memory barriers - Remove unnecessary bypass-list lock-contention mitigation, use parking API instead of open-coded ad-hoc equivalent, and upgrade obsolete comments - Revert avoidance of a deadlock that can no longer occur and properly synchronize Tasks Trace RCU checking of runqueues - Add tests for handling of double-call_rcu() bug, add missing MODULE_DESCRIPTION, and add a script that histograms the number of calls to RCU updaters - Fill out SRCU polled-grace-period API * tag 'rcu.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (29 commits) rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation rcu: Eliminate lockless accesses to rcu_sync->gp_count MAINTAINERS: Add Uladzislau Rezki as RCU maintainer rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter rcu/exp: Remove redundant full memory barrier at the end of GP rcu: Remove full memory barrier on RCU stall printout rcu: Remove full memory barrier on boot time eqs sanity check rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot rcu: Remove superfluous full memory barrier upon first EQS snapshot rcu: Remove full ordering on second EQS snapshot srcu: Fill out polled grace-period APIs srcu: Update cleanup_srcu_struct() comment srcu: Add NUM_ACTIVE_SRCU_POLL_OLDSTATE srcu: Disable interrupts directly in srcu_gp_end() rcu: Disable interrupts directly in rcu_gp_init() rcu/tree: Reduce wake up for synchronize_rcu() common case rcu/tasks: Fix stale task snaphot for Tasks Trace tools/rcu: Add rcu-updaters.sh script rcutorture: Add missing MODULE_DESCRIPTION() macros rcutorture: Fix rcu_torture_fwd_cb_cr() data race ...
2024-07-15Merge tag 'timers-core-2024-07-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and related functionality: Core: - Make the takeover of a hrtimer based broadcast timer reliable during CPU hot-unplug. The current implementation suffers from a race which can lead to broadcast timer starvation in the worst case. - VDSO related cleanups and simplifications - Small cleanups and enhancements all over the place PTP: - Replace the architecture specific base clock to clocksource, e.g. ART to TSC, conversion function with generic functionality to avoid exposing such internals to drivers and convert all existing drivers over. This also allows to provide functionality which converts the other way round in the core code based on the same parameter set. - Provide a function to convert CLOCK_REALTIME to the base clock to support the upcoming PPS output driver on Intel platforms. Drivers: - A set of Device Tree bindings for new hardware - Cleanups and enhancements all over the place" * tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) clocksource/drivers/realtek: Add timer driver for rtl-otto platforms dt-bindings: timer: Add schema for realtek,otto-timer dt-bindings: timer: Add SOPHGO SG2002 clint dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support dt-bindings: timer: renesas,tmu: Add RZ/G1 support dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support clocksource/drivers/mips-gic-timer: Correct sched_clock width clocksource/drivers/mips-gic-timer: Refine rating computation clocksource/drivers/sh_cmt: Address race condition for clock events clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq tick/broadcast: Make takeover of broadcast hrtimer reliable tick/sched: Combine WARN_ON_ONCE and print_once x86/vdso: Remove unused include x86/vgtod: Remove unused typedef gtod_long_t x86/vdso: Fix function reference in comment vdso: Add comment about reason for vdso struct ordering vdso/gettimeofday: Clarify comment about open coded function timekeeping: Add missing kernel-doc function comments tick: Remove unnused tick_nohz_get_idle_calls() ...
2024-07-15Merge tag 'smp-core-2024-07-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug updates from Thomas Gleixner: "A small set of SMP/CPU hotplug updates: - Reverse the order of iteration when freezing secondary CPUs for hibernation. This avoids that drivers like the Intel uncore performance counter have to transfer the assignement of handling the per package uncore events for every CPU in a package, which is a considerable speedup on larger systems. - Add a missing destroy_work_on_stack() invocation in smp_call_on_cpu() to prevent debug objects to emit a false positive warning when the stack is freed. - Small cleanups in comments and a str_plural() conversion" * tag 'smp-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() cpu/hotplug: Reverse order of iteration in freeze_secondary_cpus() smp: Use str_plural() to fix Coccinelle warnings cpu/hotplug: Fix typo in comment
2024-07-15Merge tag 'for-6.11/io_uring-20240714' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: "Here are the io_uring updates queued up for 6.11. Nothing major this time around, various minor improvements and cleanups/fixes. This contains: - Add bind/listen opcodes. Main motivation is to support direct descriptors, to avoid needing a regular fd just for doing these two operations (Gabriel) - Probe fixes (Gabriel) - Treat io-wq work flags as atomics. Not fixing a real issue, but may as well and it silences a KCSAN warning (me) - Cleanup of rsrc __set_current_state() usage (me) - Add 64-bit for {m,f}advise operations (me) - Improve performance of data ring messages (me) - Fix for ring message overflow posting (Pavel) - Fix for freezer interaction with TWA_NOTIFY_SIGNAL. Not strictly an io_uring thing, but since TWA_NOTIFY_SIGNAL was originally added for faster task_work signaling for io_uring, bundling it with this pull (Pavel) - Add Pavel as a co-maintainer - Various cleanups (me, Thorsten)" * tag 'for-6.11/io_uring-20240714' of git://git.kernel.dk/linux: (28 commits) io_uring/net: check socket is valid in io_bind()/io_listen() kernel: rerun task_work while freezing in get_signal() io_uring/io-wq: limit retrying worker initialisation io_uring/napi: Remove unnecessary s64 cast io_uring/net: cleanup io_recv_finish() bundle handling io_uring/msg_ring: fix overflow posting MAINTAINERS: change Pavel Begunkov from io_uring reviewer to maintainer io_uring/msg_ring: use kmem_cache_free() to free request io_uring/msg_ring: check for dead submitter task io_uring/msg_ring: add an alloc cache for io_kiocb entries io_uring/msg_ring: improve handling of target CQE posting io_uring: add io_add_aux_cqe() helper io_uring: add remote task_work execution helper io_uring/msg_ring: tighten requirement for remote posting io_uring: Allocate only necessary memory in io_probe io_uring: Fix probe of disabled operations io_uring: Introduce IORING_OP_LISTEN io_uring: Introduce IORING_OP_BIND net: Split a __sys_listen helper for io_uring net: Split a __sys_bind helper for io_uring ...
2024-07-15Merge tag 'vfs-6.11.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Support passing NULL along AT_EMPTY_PATH for statx(). NULL paths with any flag value other than AT_EMPTY_PATH go the usual route and end up with -EFAULT to retain compatibility (Rust is abusing calls of the sort to detect availability of statx) This avoids path lookup code, lockref management, memory allocation and in case of NULL path userspace memory access (which can be quite expensive with SMAP on x86_64) - Don't block i_writecount during exec. Remove the deny_write_access() mechanism for executables - Relax open_by_handle_at() permissions in specific cases where we can prove that the caller had sufficient privileges to open a file - Switch timespec64 fields in struct inode to discrete integers freeing up 4 bytes Fixes: - Fix false positive circular locking warning in hfsplus - Initialize hfs_inode_info after hfs_alloc_inode() in hfs - Avoid accidental overflows in vfs_fallocate() - Don't interrupt fallocate with EINTR in tmpfs to avoid constantly restarting shmem_fallocate() - Add missing quote in comment in fs/readdir Cleanups: - Don't assign and test in an if statement in mqueue. Move the assignment out of the if statement - Reflow the logic in may_create_in_sticky() - Remove the usage of the deprecated ida_simple_xx() API from procfs - Reject FSCONFIG_CMD_CREATE_EXCL requets that depend on the new mount api early - Rename variables in copy_tree() to make it easier to understand - Replace WARN(down_read_trylock, ...) abuse with proper asserts in various places in the VFS - Get rid of user_path_at_empty() and drop the empty argument from getname_flags() - Check for error while copying and no path in one branch in getname_flags() - Avoid redundant smp_mb() for THP handling in do_dentry_open() - Rename parent_ino to d_parent_ino and make it use RCU - Remove unused header include in fs/readdir - Export in_group_capable() helper and switch f2fs and fuse over to it instead of open-coding the logic in both places" * tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) ipc: mqueue: remove assignment from IS_ERR argument vfs: rename parent_ino to d_parent_ino and make it use RCU vfs: support statx(..., NULL, AT_EMPTY_PATH, ...) stat: use vfs_empty_path() helper fs: new helper vfs_empty_path() fs: reflow may_create_in_sticky() vfs: remove redundant smp_mb for thp handling in do_dentry_open fuse: Use in_group_or_capable() helper f2fs: Use in_group_or_capable() helper fs: Export in_group_or_capable() vfs: reorder checks in may_create_in_sticky hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode() proc: Remove usage of the deprecated ida_simple_xx() API hfsplus: fix to avoid false alarm of circular locking Improve readability of copy_tree vfs: shave a branch in getname_flags vfs: retire user_path_at_empty and drop empty arg from getname_flags vfs: stop using user_path_at_empty in do_readlinkat tmpfs: don't interrupt fallocate with EINTR fs: don't block i_writecount during exec ...
2024-07-14workqueue: Always queue work items to the newest PWQ for order workqueuesLai Jiangshan
To ensure non-reentrancy, __queue_work() attempts to enqueue a work item to the pool of the currently executing worker. This is not only unnecessary for an ordered workqueue, where order inherently suggests non-reentrancy, but it could also disrupt the sequence if the item is not enqueued on the newest PWQ. Just queue it to the newest PWQ and let order management guarantees non-reentrancy. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues") Cc: stable@vger.kernel.org # v6.9+ Signed-off-by: Tejun Heo <tj@kernel.org> (cherry picked from commit 74347be3edfd11277799242766edf844c43dd5d3)
2024-07-14Merge branch 'for-6.10-fixes' into for-6.11Tejun Heo
2024-07-14Merge tag 'sched_urgent_for_v6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Fix a performance regression when measuring the CPU time of a thread (clock_gettime(CLOCK_THREAD_CPUTIME_ID,...)) due to the addition of PSI IRQ time accounting in the hotpath - Fix a task_struct leak due to missing to decrement the refcount when the task is enqueued before the timer which is supposed to do that, expires - Revert an attempt to expedite detaching of movable tasks, as finding those could become very costly. Turns out the original issue wasn't even hit by anyone * tag 'sched_urgent_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath sched/deadline: Fix task_struct reference leak Revert "sched/fair: Make sure to try to detach at least one movable task"
2024-07-13Merge tag 'timers-v6.11-rc1' of ↵Thomas Gleixner
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clocksource/event driver updates from Daniel Lezcano: - Remove unnecessary local variables initialization as they will be initialized in the code path anyway right after on the ARM arch timer and the ARM global timer (Li kunyu) - Fix a race condition in the interrupt leading to a deadlock on the SH CMT driver. Note that this fix was not tested on the platform using this timer but the fix seems reasonable enough to be picked confidently (Niklas Söderlund) - Increase the rating of the gic-timer and use the configured width clocksource register on the MIPS architecture (Jiaxun Yang) - Add the DT bindings for the TMU on the Renesas platforms (Geert Uytterhoeven) - Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas Bonnefille) - Add the rtl-otto timer driver along with the DT bindings for the Realtek platform (Chris Packham) Link: https://lore.kernel.org/all/91cd05de-4c5d-4242-a381-3b8a4fe6a2a2@linaro.org
2024-07-12Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-12 We've added 23 non-merge commits during the last 3 day(s) which contain a total of 18 files changed, 234 insertions(+), 243 deletions(-). The main changes are: 1) Improve BPF verifier by utilizing overflow.h helpers to check for overflows, from Shung-Hsi Yu. 2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT when attr->attach_prog_fd was not specified, from Tengda Wu. 3) Fix arm64 BPF JIT when generating code for BPF trampolines with BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits, from Puranjay Mohan. 4) Remove test_run callback from lwt_seg6local_prog_ops which never worked in the first place and caused syzbot reports, from Sebastian Andrzej Siewior. 5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/ /KF_RCU-typed BPF kfuncs, from Matt Bobrowski. 6) Fix a long standing bug in libbpf with regards to handling of BPF skeleton's forward and backward compatibility, from Andrii Nakryiko. 7) Annotate btf_{seq,snprintf}_show functions with __printf, from Alan Maguire. 8) BPF selftest improvements to reuse common network helpers in sk_lookup test and dropping the open-coded inetaddr_len() and make_socket() ones, from Geliang Tang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type() bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again bpf: use check_sub_overflow() to check for subtraction overflows bpf: use check_add_overflow() to check for addition overflows bpf: fix overflow check in adjust_jmp_off() bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o bpf: annotate BTF show functions with __printf bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG selftests/bpf: Close obj in error path in xdp_adjust_tail selftests/bpf: Null checks for links in bpf_tcp_ca selftests/bpf: Use connect_fd_to_fd in sk_lookup selftests/bpf: Use start_server_addr in sk_lookup selftests/bpf: Use start_server_str in sk_lookup selftests/bpf: Close fd in error path in drop_on_reuseport selftests/bpf: Add ASSERT_OK_FD macro selftests/bpf: Add backlog for network_helper_opts selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m bpf: Remove tst_run from lwt_seg6local_prog_ops. bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU ... ==================== Link: https://patch.msgid.link/20240712212448.5378-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12cgroup/misc: Introduce misc.events.localXiu Jianfeng
Currently the event counting provided by misc.events is hierarchical, it's not practical if user is only concerned with events of a specified cgroup. Therefore, introduce misc.events.local collect events specific to the given cgroup. This is analogous to memory.events.local and pids.events.local. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-12bpf: use check_sub_overflow() to check for subtraction overflowsShung-Hsi Yu
Similar to previous patch that drops signed_add*_overflows() and uses (compiler) builtin-based check_add_overflow(), do the same for signed_sub*_overflows() and replace them with the generic check_sub_overflow() to make future refactoring easier and have the checks implemented more efficiently. Unsigned overflow check for subtraction does not use helpers and are simple enough already, so they're left untouched. After the change GCC 13.3.0 generates cleaner assembly on x86_64: if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) || 139bf: mov 0x28(%r12),%rax 139c4: mov %edx,0x54(%r12) 139c9: sub %r11,%rax 139cc: mov %rax,0x28(%r12) 139d1: jo 14627 <adjust_reg_min_max_vals+0x1237> check_sub_overflow(*dst_smax, src_reg->smin_value, dst_smax)) { 139d7: mov 0x30(%r12),%rax 139dc: sub %r9,%rax 139df: mov %rax,0x30(%r12) if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) || 139e4: jo 14627 <adjust_reg_min_max_vals+0x1237> ... *dst_smin = S64_MIN; 14627: movabs $0x8000000000000000,%rax 14631: mov %rax,0x28(%r12) *dst_smax = S64_MAX; 14636: sub $0x1,%rax 1463a: mov %rax,0x30(%r12) Before the change it gives: if (signed_sub_overflows(dst_reg->smin_value, smax_val) || 13a50: mov 0x28(%r12),%rdi 13a55: mov %edx,0x54(%r12) dst_reg->smax_value = S64_MAX; 13a5a: movabs $0x7fffffffffffffff,%rdx 13a64: mov %eax,0x50(%r12) dst_reg->smin_value = S64_MIN; 13a69: movabs $0x8000000000000000,%rax s64 res = (s64)((u64)a - (u64)b); 13a73: mov %rdi,%rsi 13a76: sub %rcx,%rsi if (b < 0) 13a79: test %rcx,%rcx 13a7c: js 145ea <adjust_reg_min_max_vals+0x119a> if (signed_sub_overflows(dst_reg->smin_value, smax_val) || 13a82: cmp %rsi,%rdi 13a85: jl 13ac7 <adjust_reg_min_max_vals+0x677> signed_sub_overflows(dst_reg->smax_value, smin_val)) { 13a87: mov 0x30(%r12),%r8 s64 res = (s64)((u64)a - (u64)b); 13a8c: mov %r8,%rax 13a8f: sub %r9,%rax return res > a; 13a92: cmp %rax,%r8 13a95: setl %sil if (b < 0) 13a99: test %r9,%r9 13a9c: js 147d1 <adjust_reg_min_max_vals+0x1381> dst_reg->smax_value = S64_MAX; 13aa2: movabs $0x7fffffffffffffff,%rdx dst_reg->smin_value = S64_MIN; 13aac: movabs $0x8000000000000000,%rax if (signed_sub_overflows(dst_reg->smin_value, smax_val) || 13ab6: test %sil,%sil 13ab9: jne 13ac7 <adjust_reg_min_max_vals+0x677> dst_reg->smin_value -= smax_val; 13abb: mov %rdi,%rax dst_reg->smax_value -= smin_val; 13abe: mov %r8,%rdx dst_reg->smin_value -= smax_val; 13ac1: sub %rcx,%rax dst_reg->smax_value -= smin_val; 13ac4: sub %r9,%rdx 13ac7: mov %rax,0x28(%r12) ... 13ad1: mov %rdx,0x30(%r12) ... if (signed_sub_overflows(dst_reg->smin_value, smax_val) || 145ea: cmp %rsi,%rdi 145ed: jg 13ac7 <adjust_reg_min_max_vals+0x677> 145f3: jmp 13a87 <adjust_reg_min_max_vals+0x637> Suggested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20240712080127.136608-4-shung-hsi.yu@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-12bpf: use check_add_overflow() to check for addition overflowsShung-Hsi Yu
signed_add*_overflows() was added back when there was no overflow-check helper. With the introduction of such helpers in commit f0907827a8a91 ("compiler.h: enable builtin overflow checkers and add fallback code"), we can drop signed_add*_overflows() in kernel/bpf/verifier.c and use the generic check_add_overflow() instead. This will make future refactoring easier, and takes advantage of compiler-emitted hardware instructions that efficiently implement these checks. After the change GCC 13.3.0 generates cleaner assembly on x86_64: err = adjust_scalar_min_max_vals(env, insn, dst_reg, *src_reg); 13625: mov 0x28(%rbx),%r9 /* r9 = src_reg->smin_value */ 13629: mov 0x30(%rbx),%rcx /* rcx = src_reg->smax_value */ ... if (check_add_overflow(*dst_smin, src_reg->smin_value, dst_smin) || 141c1: mov %r9,%rax 141c4: add 0x28(%r12),%rax 141c9: mov %rax,0x28(%r12) 141ce: jo 146e4 <adjust_reg_min_max_vals+0x1294> check_add_overflow(*dst_smax, src_reg->smax_value, dst_smax)) { 141d4: add 0x30(%r12),%rcx 141d9: mov %rcx,0x30(%r12) if (check_add_overflow(*dst_smin, src_reg->smin_value, dst_smin) || 141de: jo 146e4 <adjust_reg_min_max_vals+0x1294> ... *dst_smin = S64_MIN; 146e4: movabs $0x8000000000000000,%rax 146ee: mov %rax,0x28(%r12) *dst_smax = S64_MAX; 146f3: sub $0x1,%rax 146f7: mov %rax,0x30(%r12) Before the change it gives: s64 smin_val = src_reg->smin_value; 675: mov 0x28(%rsi),%r8 s64 smax_val = src_reg->smax_value; u64 umin_val = src_reg->umin_value; u64 umax_val = src_reg->umax_value; 679: mov %rdi,%rax /* rax = dst_reg */ if (signed_add_overflows(dst_reg->smin_value, smin_val) || 67c: mov 0x28(%rdi),%rdi /* rdi = dst_reg->smin_value */ u64 umin_val = src_reg->umin_value; 680: mov 0x38(%rsi),%rdx u64 umax_val = src_reg->umax_value; 684: mov 0x40(%rsi),%rcx s64 res = (s64)((u64)a + (u64)b); 688: lea (%r8,%rdi,1),%r9 /* r9 = dst_reg->smin_value + src_reg->smin_value */ return res < a; 68c: cmp %r9,%rdi 68f: setg %r10b /* r10b = (dst_reg->smin_value + src_reg->smin_value) > dst_reg->smin_value */ if (b < 0) 693: test %r8,%r8 696: js 72b <scalar_min_max_add+0xbb> signed_add_overflows(dst_reg->smax_value, smax_val)) { dst_reg->smin_value = S64_MIN; dst_reg->smax_value = S64_MAX; 69c: movabs $0x7fffffffffffffff,%rdi s64 smax_val = src_reg->smax_value; 6a6: mov 0x30(%rsi),%r8 dst_reg->smin_value = S64_MIN; 6aa: 00 00 00 movabs $0x8000000000000000,%rsi if (signed_add_overflows(dst_reg->smin_value, smin_val) || 6b4: test %r10b,%r10b /* (dst_reg->smin_value + src_reg->smin_value) > dst_reg->smin_value ? goto 6cb */ 6b7: jne 6cb <scalar_min_max_add+0x5b> signed_add_overflows(dst_reg->smax_value, smax_val)) { 6b9: mov 0x30(%rax),%r10 /* r10 = dst_reg->smax_value */ s64 res = (s64)((u64)a + (u64)b); 6bd: lea (%r10,%r8,1),%r11 /* r11 = dst_reg->smax_value + src_reg->smax_value */ if (b < 0) 6c1: test %r8,%r8 6c4: js 71e <scalar_min_max_add+0xae> if (signed_add_overflows(dst_reg->smin_value, smin_val) || 6c6: cmp %r11,%r10 /* (dst_reg->smax_value + src_reg->smax_value) <= dst_reg->smax_value ? goto 723 */ 6c9: jle 723 <scalar_min_max_add+0xb3> } else { dst_reg->smin_value += smin_val; dst_reg->smax_value += smax_val; } 6cb: mov %rsi,0x28(%rax) ... 6d5: mov %rdi,0x30(%rax) ... if (signed_add_overflows(dst_reg->smin_value, smin_val) || 71e: cmp %r11,%r10 721: jl 6cb <scalar_min_max_add+0x5b> dst_reg->smin_value += smin_val; 723: mov %r9,%rsi dst_reg->smax_value += smax_val; 726: mov %r11,%rdi 729: jmp 6cb <scalar_min_max_add+0x5b> return res > a; 72b: cmp %r9,%rdi 72e: setl %r10b 732: jmp 69c <scalar_min_max_add+0x2c> 737: nopw 0x0(%rax,%rax,1) Note: unlike adjust_ptr_min_max_vals() and scalar*_min_max_add(), it is necessary to introduce intermediate variable in adjust_jmp_off() to keep the functional behavior unchanged. Without an intermediate variable imm/off will be altered even on overflow. Suggested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20240712080127.136608-3-shung-hsi.yu@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-12bpf: fix overflow check in adjust_jmp_off()Shung-Hsi Yu
adjust_jmp_off() incorrectly used the insn->imm field for all overflow check, which is incorrect as that should only be done or the BPF_JMP32 | BPF_JA case, not the general jump instruction case. Fix it by using insn->off for overflow check in the general case. Fixes: 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and jump to the 1st insn.") Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20240712080127.136608-2-shung-hsi.yu@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-12bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.oAlan Maguire
As reported by Mirsad [1] we still see format warnings in kernel/bpf/btf.o at W=1 warning level: CC kernel/bpf/btf.o ./kernel/bpf/btf.c: In function ‘btf_type_seq_show_flags’: ./kernel/bpf/btf.c:7553:21: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format] 7553 | sseq.showfn = btf_seq_show; | ^ ./kernel/bpf/btf.c: In function ‘btf_type_snprintf_show’: ./kernel/bpf/btf.c:7604:31: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format] 7604 | ssnprintf.show.showfn = btf_snprintf_show; | ^ Combined with CONFIG_WERROR=y these can halt the build. The fix (annotating the structure field with __printf()) suggested by Mirsad resolves these. Apologies I missed this last time. No other W=1 warnings were observed in kernel/bpf after this fix. [1] https://lore.kernel.org/bpf/92c9d047-f058-400c-9c7d-81d4dc1ef71b@gmail.com/ Fixes: b3470da314fd ("bpf: annotate BTF show functions with __printf") Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com> Suggested-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240712092859.1390960-1-alan.maguire@oracle.com
2024-07-11workqueue: Rename wq_update_pod() to unbound_wq_update_pwq()Lai Jiangshan
What wq_update_pod() does is just to update the pwq of the specific cpu. Rename it and update the comments. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove the arguments @hotplug_cpu and @online from wq_update_pod()Lai Jiangshan
The arguments @hotplug_cpu and @online are not used in wq_update_pod() since the functions called by wq_update_pod() don't need them. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask()Lai Jiangshan
wq_calc_pod_cpumask() uses wq_online_cpumask, which excludes the cpu going down, so the argument cpu_going_down is unused and can be removed. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask()Lai Jiangshan
The cpumask empty check in wq_calc_pod_cpumask() has long been useless. It just works purely as documents which states that the cpumask is not possible empty after the function returns. Now the code above is even more explicit that the cpumask is not empty, so the document-only empty check can be removed. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()Lai Jiangshan
1726a1713590 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.") led to the following possible deadlock: WARNING: possible recursive locking detected 6.10.0-rc5-00004-g1d4c6111406c #1 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) but task is already holding lock: c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: padata_alloc (kernel/padata.c:1007) ... stack backtrace: ... cpus_read_lock (include/linux/percpu-rwsem.h:53 kernel/cpu.c:488) alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730) padata_alloc (kernel/padata.c:1007 (discriminator 1)) pcrypt_init_padata (crypto/pcrypt.c:327 (discriminator 1)) pcrypt_init (crypto/pcrypt.c:353) do_one_initcall (init/main.c:1267) do_initcalls (init/main.c:1328 (discriminator 1) init/main.c:1345 (discriminator 1)) kernel_init_freeable (init/main.c:1364) kernel_init (init/main.c:1469) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_32.S:737) entry_INT80_32 (arch/x86/entry/entry_32.S:944) This is caused by pcrypt allocating a workqueue while holding cpus_read_lock(), so workqueue code can't do it again as that can lead to deadlocks if down_write starts after the first down_read. The pwq creations and installations have been reworked based on wq_online_cpumask rather than cpu_online_mask making cpus_read_lock() is unneeded during wqattrs changes. Fix the deadlock by removing cpus_read_lock() from apply_wqattrs_lock(). tj: Updated changelog. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Fixes: 1726a1713590 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.") Link: http://lkml.kernel.org/r/202407081521.83b627c1-lkp@intel.com Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumaskLai Jiangshan
Avoid relying on cpu_online_mask for wqattrs changes so that cpus_read_lock() can be removed from apply_wqattrs_lock(). And with wq_online_cpumask, attrs->__pod_cpumask doesn't need to be reused as a temporary storage to calculate if the pod have any online CPUs @attrs wants since @cpu_going_down is not in the wq_online_cpumask. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11workqueue: Add wq_online_cpumaskLai Jiangshan
The new wq_online_mask mirrors the cpu_online_mask except during hotplugging; specifically, it differs between the hotplugging stages of workqueue_offline_cpu() and workqueue_online_cpu(), during which the transitioning CPU is not represented in the mask. Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-11bpf: annotate BTF show functions with __printfAlan Maguire
-Werror=suggest-attribute=format warns about two functions in kernel/bpf/btf.c [1]; add __printf() annotations to silence these warnings since for CONFIG_WERROR=y they will trigger build failures. [1] https://lore.kernel.org/bpf/a8b20c72-6631-4404-9e1f-0410642d7d20@gmail.com/ Fixes: 31d0bc81637d ("bpf: Move to generic BTF show support, apply it to seq files/strings") Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Mirsad Todorovac <mtodorovac69@yahoo.com> Link: https://lore.kernel.org/r/20240711182321.963667-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/sched/act_ct.c 26488172b029 ("net/sched: Fix UAF when resolving a clash") 3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-11tick/broadcast: Make takeover of broadcast hrtimer reliableYu Liao
Running the LTP hotplug stress test on a aarch64 machine results in rcu_sched stall warnings when the broadcast hrtimer was owned by the un-plugged CPU. The issue is the following: CPU1 (owns the broadcast hrtimer) CPU2 tick_broadcast_enter() // shutdown local timer device broadcast_shutdown_local() ... tick_broadcast_exit() clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT) // timer device is not programmed cpumask_set_cpu(cpu, tick_broadcast_force_mask) initiates offlining of CPU1 take_cpu_down() /* * CPU1 shuts down and does not * send broadcast IPI anymore */ takedown_cpu() hotplug_cpu__broadcast_tick_pull() // move broadcast hrtimer to this CPU clockevents_program_event() bc_set_next() hrtimer_start() /* * timer device is not programmed * because only the first expiring * timer will trigger clockevent * device reprogramming */ What happens is that CPU2 exits broadcast mode with force bit set, then the local timer device is not reprogrammed and CPU2 expects to receive the expired event by the broadcast IPI. But this does not happen because CPU1 is offlined by CPU2. CPU switches the clockevent device to ONESHOT state, but does not reprogram the device. The subsequent reprogramming of the hrtimer broadcast device does not program the clockevent device of CPU2 either because the pending expiry time is already in the past and the CPU expects the event to be delivered. As a consequence all CPUs which wait for a broadcast event to be delivered are stuck forever. Fix this issue by reprogramming the local timer device if the broadcast force bit of the CPU is set so that the broadcast hrtimer is delivered. [ tglx: Massage comment and change log. Add Fixes tag ] Fixes: 989dcb645ca7 ("tick: Handle broadcast wakeup of multiple cpus") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240711124843.64167-1-liaoyu15@huawei.com
2024-07-11x86/xen: remove deprecated xen_nopvspin boot parameterJuergen Gross
The xen_nopvspin boot parameter is deprecated since 2019. nopvspin can be used instead. Remove the xen_nopvspin boot parameter and replace the xen_pvspin variable use cases with nopvspin. This requires to move the nopvspin variable out of the .initdata section, as it needs to be accessed for cpuhotplug, too. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710110139.22300-1-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11Merge branch 'sched/urgent' into sched/core, to pick up fixes and refresh ↵Ingo Molnar
the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-07-11kernel: rerun task_work while freezing in get_signal()Pavel Begunkov
io_uring can asynchronously add a task_work while the task is getting freezed. TIF_NOTIFY_SIGNAL will prevent the task from sleeping in do_freezer_trap(), and since the get_signal()'s relock loop doesn't retry task_work, the task will spin there not being able to sleep until the freezing is cancelled / the task is killed / etc. Run task_works in the freezer path. Keep the patch small and simple so it can be easily back ported, but we might need to do some cleaning after and look if there are other places with similar problems. Cc: stable@vger.kernel.org Link: https://github.com/systemd/systemd/issues/33626 Fixes: 12db8b690010c ("entry: Add support for TIF_NOTIFY_SIGNAL") Reported-by: Julian Orth <ju.orth@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89ed3a52933370deaaf61a0a620a6ac91f1e754d.1720634146.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-10bpf: Defer work in bpf_timer_cancel_and_freeKumar Kartikeya Dwivedi
Currently, the same case as previous patch (two timer callbacks trying to cancel each other) can be invoked through bpf_map_update_elem as well, or more precisely, freeing map elements containing timers. Since this relies on hrtimer_cancel as well, it is prone to the same deadlock situation as the previous patch. It would be sufficient to use hrtimer_try_to_cancel to fix this problem, as the timer cannot be enqueued after async_cancel_and_free. Once async_cancel_and_free has been done, the timer must be reinitialized before it can be armed again. The callback running in parallel trying to arm the timer will fail, and freeing bpf_hrtimer without waiting is sufficient (given kfree_rcu), and bpf_timer_cb will return HRTIMER_NORESTART, preventing the timer from being rearmed again. However, there exists a UAF scenario where the callback arms the timer before entering this function, such that if cancellation fails (due to timer callback invoking this routine, or the target timer callback running concurrently). In such a case, if the timer expiration is significantly far in the future, the RCU grace period expiration happening before it will free the bpf_hrtimer state and along with it the struct hrtimer, that is enqueued. Hence, it is clear cancellation needs to occur after async_cancel_and_free, and yet it cannot be done inline due to deadlock issues. We thus modify bpf_timer_cancel_and_free to defer work to the global workqueue, adding a work_struct alongside rcu_head (both used at _different_ points of time, so can share space). Update existing code comments to reflect the new state of affairs. Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240709185440.1104957-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10bpf: Fail bpf_timer_cancel when callback is being cancelledKumar Kartikeya Dwivedi
Given a schedule: timer1 cb timer2 cb bpf_timer_cancel(timer2); bpf_timer_cancel(timer1); Both bpf_timer_cancel calls would wait for the other callback to finish executing, introducing a lockup. Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps track of all in-flight cancellation requests for a given BPF timer. Whenever cancelling a BPF timer, we must check if we have outstanding cancellation requests, and if so, we must fail the operation with an error (-EDEADLK) since cancellation is synchronous and waits for the callback to finish executing. This implies that we can enter a deadlock situation involving two or more timer callbacks executing in parallel and attempting to cancel one another. Note that we avoid incrementing the cancelling counter for the target timer (the one being cancelled) if bpf_timer_cancel is not invoked from a callback, to avoid spurious errors. The whole point of detecting cur->cancelling and returning -EDEADLK is to not enter a busy wait loop (which may or may not lead to a lockup). This does not apply in case the caller is in a non-callback context, the other side can continue to cancel as it sees fit without running into errors. Background on prior attempts: Earlier versions of this patch used a bool 'cancelling' bit and used the following pattern under timer->lock to publish cancellation status. lock(t->lock); t->cancelling = true; mb(); if (cur->cancelling) return -EDEADLK; unlock(t->lock); hrtimer_cancel(t->timer); t->cancelling = false; The store outside the critical section could overwrite a parallel requests t->cancelling assignment to true, to ensure the parallely executing callback observes its cancellation status. It would be necessary to clear this cancelling bit once hrtimer_cancel is done, but lack of serialization introduced races. Another option was explored where bpf_timer_start would clear the bit when (re)starting the timer under timer->lock. This would ensure serialized access to the cancelling bit, but may allow it to be cleared before in-flight hrtimer_cancel has finished executing, such that lockups can occur again. Thus, we choose an atomic counter to keep track of all outstanding cancellation requests and use it to prevent lockups in case callbacks attempt to cancel each other while executing in parallel. Reported-by: Dohyun Kim <dohyunkim@google.com> Reported-by: Neel Natu <neelnatu@google.com> Fixes: b00628b1c7d5 ("bpf: Introduce bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240709185440.1104957-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10bpf: fix order of args in call to bpf_map_kvcallocMohammad Shehar Yaar Tausif
The original function call passed size of smap->bucket before the number of buckets which raises the error 'calloc-transposed-args' on compilation. Vlastimil Babka added: The order of parameters can be traced back all the way to 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") accross several refactorings, and that's why the commit is used as a Fixes: tag. In v6.10-rc1, a different commit 2c321f3f70bc ("mm: change inlined allocation helpers to account at the call site") however exposed the order of args in a way that gcc-14 has enough visibility to start warning about it, because (in !CONFIG_MEMCG case) bpf_map_kvcalloc is then a macro alias for kvcalloc instead of a static inline wrapper. To sum up the warning happens when the following conditions are all met: - gcc-14 is used (didn't see it with gcc-13) - commit 2c321f3f70bc is present - CONFIG_MEMCG is not enabled in .config - CONFIG_WERROR turns this from a compiler warning to error Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Reviewed-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Christian Kujau <lists@nerdbynature.de> Signed-off-by: Mohammad Shehar Yaar Tausif <sheharyaar48@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20240710100521.15061-2-vbabka@suse.cz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-07-10smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()Zqiang
For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for the debug check in __init_work() to work correctly. But this lacks the counterpart to remove the tracked object from debug objects again, which will cause a debug object warning once the stack is freed. Add the missing destroy_work_on_stack() invocation to cure that. [ tglx: Massaged changelog ] Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240704065213.13559-1-qiang.zhang1211@gmail.com