summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2024-07-16Merge tag 'net-next-6.11' of ↵HEADmasterLinus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
2024-07-16Merge tag 'perf-core-2024-07-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes * tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf/x86/intel: Add a distinct name for Granite Rapids perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR perf: Split __perf_pending_irq() out of perf_pending_irq() perf: Don't disable preemption in perf_pending_task(). perf: Move swevent_htable::recursion into task_struct. perf: Shrink the size of the recursion counter. perf: Enqueue SIGTRAP always via task_work. task_work: Add TWA_NMI_CURRENT as an additional notify mode. perf: Move irq_work_queue() where the event is prepared. perf: Fix event leak upon exec and file release perf: Fix event leak upon exit task_work: Introduce task_work_cancel() again task_work: s/task_work_cancel()/task_work_cancel_func()/ perf/x86/amd/uncore: Fix DF and UMC domain identification perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable perf/x86/intel: Support Perfmon MSRs aliasing perf/x86/intel: Support PERFEVTSEL extension perf/x86: Add config_mask to represent EVENTSEL bitmask ...
2024-07-16Merge tag 'locking-core-2024-07-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Jump label fixes, including a perf events fix that originally manifested as jump label failures, but was a serialization bug at the usage site - Mark down_write*() helpers as __always_inline, to improve WCHAN debuggability - Misc cleanups and fixes * tag 'locking-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Add __always_inline annotation to __down_write_common() and inlined callers jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() jump_label: Clarify condition in static_key_fast_inc_not_disabled() jump_label: Fix concurrency issues in static_key_slow_dec() perf/x86: Serialize set_attr_rdpmc() cleanup: Standardize the header guard define's name
2024-07-16Merge tag 'pm-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add a new cpufreq driver for Loongson-3, add support for new features in the intel_pstate (Lunar Lake and Arrow Lake platforms, OOB mode for Emerald Rapids, highest performance change interrupt), amd-pstate (fast CPPC) and sun50i (Allwinner H700 speed bin) cpufreq drivers, simplify the cpufreq driver interface, simplify the teo cpuidle governor, adjust the pm-graph utility for a new version of Python, address issues and clean up code. Specifics: - Add Loongson-3 CPUFreq driver support (Huacai Chen) - Add support for the Arrow Lake and Lunar Lake platforms and the out-of-band (OOB) mode on Emerald Rapids to the intel_pstate cpufreq driver, make it support the highest performance change interrupt and clean it up (Srinivas Pandruvada) - Switch cpufreq to new Intel CPU model defines (Tony Luck) - Simplify the cpufreq driver interface by switching the .exit() driver callback to the void return data type (Lizhe, Viresh Kumar) - Make cpufreq_boost_enabled() return bool (Dhruva Gole) - Add fast CPPC support to the amd-pstate cpufreq driver, address multiple assorted issues in it and clean it up (Perry Yuan, Mario Limonciello, Dhananjay Ugwekar, Meng Li, Xiaojian Du) - Add Allwinner H700 speed bin to the sun50i cpufreq driver (Ryan Walklin) - Fix memory leaks and of_node_put() usage in the sun50i and qcom-nvmem cpufreq drivers (Javier Carrasco) - Clean up the sti and dt-platdev cpufreq drivers (Jeff Johnson, Raphael Gallais-Pou) - Fix deferred probe handling in the TI cpufreq driver and wrong return values of ti_opp_supply_probe(), and add OPP tables for the AM62Ax and AM62Px SoCs to it (Bryan Brattlof, Primoz Fiser) - Avoid overflow of target_freq in .fast_switch() in the SCMI cpufreq driver (Jagadeesh Kona) - Use dev_err_probe() in every error path in probe in the Mediatek cpufreq driver (Nícolas Prado) - Fix kernel-doc param for longhaul_setstate in the longhaul cpufreq driver (Yang Li) - Fix system resume handling in the CPPC cpufreq driver (Riwen Lu) - Improve the teo cpuidle governor and clean up leftover comments from the menu cpuidle governor (Christian Loehle) - Clean up a comment typo in the teo cpuidle governor (Atul Kumar Pant) - Add missing MODULE_DESCRIPTION() macro to cpuidle haltpoll (Jeff Johnson) - Switch the intel_idle driver to new Intel CPU model defines (Tony Luck) - Switch the Intel RAPL driver new Intel CPU model defines (Tony Luck) - Simplify if condition in the idle_inject driver (Thorsten Blum) - Fix missing cleanup on error in _opp_attach_genpd() (Viresh Kumar) - Introduce an OF helper function to inform if required-opps is used and drop a redundant in-parameter to _set_opp_level() (Ulf Hansson) - Update pm-graph to v5.12 which includes fixes and major code revamp for python3.12 (Todd Brandt) - Address several assorted issues in the cpupower utility (Roman Storozhenko)" * tag 'pm-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (77 commits) cpufreq: sti: fix build warning cpufreq: mediatek: Use dev_err_probe in every error path in probe cpufreq: Add Loongson-3 CPUFreq driver support cpufreq: Make cpufreq_driver->exit() return void cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems cpufreq/amd-pstate-ut: Convert nominal_freq to khz during comparisons cpufreq: pcc: Remove empty exit() callback cpufreq: loongson2: Remove empty exit() callback cpufreq: nforce2: Remove empty exit() callback cpupower: fix lib default installation path cpufreq: docs: Add missing scaling_available_frequencies description cpuidle: teo: Don't count non-existent intercepts cpupower: Disable direct build of the 'bench' subproject cpuidle: teo: Remove recent intercepts metric Revert: "cpuidle: teo: Introduce util-awareness" cpufreq: make cpufreq_boost_enabled() return bool cpufreq: intel_pstate: Support highest performance change interrupt x86/cpufeatures: Add HWP highest perf change feature flag Documentation: cpufreq: amd-pstate: update doc for Per CPU boost control method cpufreq: amd-pstate: Cap the CPPC.max_perf to nominal_perf if CPB is off ...
2024-07-16Merge tag 'hardening-v6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - lkdtm/bugs: add test for hung smp_call_function_single() (Mark Rutland) - gcc-plugins: Remove duplicate included header file stringpool.h (Thorsten Blum) - ARM: Remove address checking for MMUless devices (Yanjun Yang) - randomize_kstack: Clean up per-arch entropy and codegen - KCFI: Make FineIBT mode Kconfig selectable - fortify: Do not special-case 0-sized destinations * tag 'hardening-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: randomize_kstack: Improve stack alignment codegen ARM: Remove address checking for MMUless devices gcc-plugins: Remove duplicate included header file stringpool.h randomize_kstack: Remove non-functional per-arch entropy filtering fortify: Do not special-case 0-sized destinations x86/alternatives: Make FineIBT mode Kconfig selectable lkdtm/bugs: add test for hung smp_call_function_single()
2024-07-16Merge tag 'for-linus-6.11-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - some trivial cleanups - a fix for the Xen timer - add boot time selectable debug capability to the Xen multicall handling - two fixes for the recently added Xen irqfd handling * tag 'for-linus-6.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove deprecated xen_nopvspin boot parameter x86/xen: eliminate some private header files x86/xen: make some functions static xen: make multicall debug boot time selectable xen/arm: Convert comma to semicolon xen: privcmd: Fix possible access to a freed kirqfd instance xen: privcmd: Switch from mutex to spinlock for irqfds xen: add missing MODULE_DESCRIPTION() macros x86/xen: Convert comma to semicolon x86/xen/time: Reduce Xen timer tick xen/manage: Constify struct shutdown_handler
2024-07-16Merge tag 'efi-next-for-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Note the removal of the EFI fake memory map support - this is believed to be unused and no longer worth supporting. However, we could easily bring it back if needed. With recent developments regarding confidential VMs and unaccepted memory, combined with kexec, creating a known inaccurate view of the firmware's memory map and handing it to the OS is a feature we can live without, hence the removal. Alternatively, I could imagine making this feature mutually exclusive with those confidential VM related features, but let's try simply removing it first. Summary: - Drop support for the 'fake' EFI memory map on x86 - Add an SMBIOS based tweak to the EFI stub instructing the firmware on x86 Macbook Pros to keep both GPUs enabled - Replace 0-sized array with flexible array in EFI memory attributes table handling - Drop redundant BSS clearing when booting via the native PE entrypoint on x86 - Avoid returning EFI_SUCCESS when aborting on an out-of-memory condition - Cosmetic tweak for arm64 KASLR loading logic" * tag 'efi-next-for-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: Replace efi_memory_attributes_table_t 0-sized array with flexible array efi: Rename efi_early_memdesc_ptr() to efi_memdesc_ptr() arm64/efistub: Clean up KASLR logic x86/efistub: Drop redundant clearing of BSS x86/efistub: Avoid returning EFI_SUCCESS on error x86/efistub: Call Apple set_os protocol on dual GPU Intel Macs x86/efistub: Enable SMBIOS protocol handling for x86 efistub/smbios: Simplify SMBIOS enumeration API x86/efi: Drop support for fake EFI memory maps
2024-07-16Merge tag 'asm-generic-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "Most of this is part of my ongoing work to clean up the system call tables. In this bit, all of the newer architectures are converted to use the machine readable syscall.tbl format instead in place of complex macros in include/uapi/asm-generic/unistd.h. This follows an earlier series that fixed various API mismatches and in turn is used as the base for planned simplifications. The other two patches are dead code removal and a warning fix" * tag 'asm-generic-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: vmlinux.lds.h: catch .bss..L* sections into BSS") fixmap: Remove unused set_fixmap_offset_io() riscv: convert to generic syscall table openrisc: convert to generic syscall table nios2: convert to generic syscall table loongarch: convert to generic syscall table hexagon: use new system call table csky: convert to generic syscall table arm64: rework compat syscall macros arm64: generate 64-bit syscall.tbl arm64: convert unistd_32.h to syscall.tbl format arc: convert to generic syscall table clone3: drop __ARCH_WANT_SYS_CLONE3 macro kbuild: add syscall table generation to scripts/Makefile.asm-headers kbuild: verify asm-generic header list loongarch: avoid generating extra header files um: don't generate asm/bpf_perf_event.h csky: drop asm/gpio.h wrapper syscalls: add generic scripts/syscall.tbl
2024-07-16Merge tag 'x86_sev_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Add support for running the kernel in a SEV-SNP guest, over a Secure VM Service Module (SVSM). When running over a SVSM, different services can run at different protection levels, apart from the guest OS but still within the secure SNP environment. They can provide services to the guest, like a vTPM, for example. This series adds the required facilities to interface with such a SVSM module. - The usual fixlets, refactoring and cleanups [ And as always: "SEV" is AMD's "Secure Encrypted Virtualization". I can't be the only one who gets all the newer x86 TLA's confused, can I? - Linus ] * tag 'x86_sev_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI/configfs-tsm: Fix an unexpected indentation silly x86/sev: Do RMP memory coverage check after max_pfn has been set x86/sev: Move SEV compilation units virt: sev-guest: Mark driver struct with __refdata to prevent section mismatch x86/sev: Allow non-VMPL0 execution when an SVSM is present x86/sev: Extend the config-fs attestation support for an SVSM x86/sev: Take advantage of configfs visibility support in TSM fs/configfs: Add a callback to determine attribute visibility sev-guest: configfs-tsm: Allow the privlevel_floor attribute to be updated virt: sev-guest: Choose the VMPCK key based on executing VMPL x86/sev: Provide guest VMPL level to userspace x86/sev: Provide SVSM discovery support x86/sev: Use the SVSM to create a vCPU when not in VMPL0 x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0 x86/sev: Use kernel provided SVSM Calling Areas x86/sev: Check for the presence of an SVSM in the SNP secrets page x86/irqflags: Provide native versions of the local_irq_save()/restore()
2024-07-16Merge tag 'x86_cache_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Enable Sub-NUMA clustering to work with resource control on Intel by teaching resctrl to handle scopes due to the clustering which partitions the L3 cache into sets. Modify and extend the subsystem to handle such scopes properly * tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Update documentation with Sub-NUMA cluster changes x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems x86/resctrl: Make __mon_event_count() handle sum domains x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files x86/resctrl: Allocate a new field in union mon_data_bits x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function x86/resctrl: Initialize on-stack struct rmid_read instances x86/resctrl: Add a new field to struct rmid_read for summation of domains x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems x86/resctrl: Introduce snc_nodes_per_l3_cache x86/resctrl: Add node-scope to the options for feature scope x86/resctrl: Split the rdt_domain and rdt_hw_domain structures x86/resctrl: Prepare for different scope for control/monitor operations x86/resctrl: Prepare to split rdt_domain structure x86/resctrl: Prepare for new domain scope
2024-07-15Merge tag 'x86_cpu_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu model updates from Borislav Petkov: - Flip the logic to add feature names to /proc/cpuinfo to having to explicitly specify the flag if there's a valid reason to show it in /proc/cpuinfo - Switch a bunch of Intel x86 model checking code to the new CPU model defines - Fixes and cleanups * tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Drop stray FAM6 check with new Intel CPU model defines x86/cpufeatures: Flip the /proc/cpuinfo appearance logic x86/CPU/AMD: Always inline amd_clear_divider() x86/mce/inject: Add missing MODULE_DESCRIPTION() line perf/x86/rapl: Switch to new Intel CPU model defines x86/boot: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines perf/x86/intel: Switch to new Intel CPU model defines x86/virt/tdx: Switch to new Intel CPU model defines x86/PCI: Switch to new Intel CPU model defines x86/cpu/intel: Switch to new Intel CPU model defines x86/platform/intel-mid: Switch to new Intel CPU model defines x86/pconfig: Remove unused MKTME pconfig code x86/cpu: Remove useless work in detect_tme_early()
2024-07-15Merge tag 'x86_bugs_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu mitigation updates from Borislav Petkov: - Add a spectre_bhi=vmexit mitigation option aimed at cloud environments - Remove duplicated Spectre cmdline option documentation - Add separate macro definitions for syscall handlers which do not return in order to address objtool warnings * tag 'x86_bugs_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Add 'spectre_bhi=vmexit' cmdline option x86/bugs: Remove duplicate Spectre cmdline option descriptions x86/syscall: Mark exit[_group] syscall handlers __noreturn
2024-07-15Merge tag 'x86_vmware_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vmware updates from Borislav Petkov: - Add a unified VMware hypercall API layer which should be used by all callers instead of them doing homegrown solutions. This will provide for adding API support for confidential computing solutions like TDX * tag 'x86_vmware_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vmware: Add TDX hypercall support x86/vmware: Remove legacy VMWARE_HYPERCALL* macros x86/vmware: Correct macro names x86/vmware: Use VMware hypercall API drm/vmwgfx: Use VMware hypercall API input/vmmouse: Use VMware hypercall API ptp/vmware: Use VMware hypercall API x86/vmware: Introduce VMware hypercall API
2024-07-15Merge tag 'x86_misc_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - Make error checking of AMD SMN accesses more robust in the callers as they're the only ones who can interpret the results properly - The usual cleanups and fixes, left and right * tag 'x86_misc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kmsan: Fix hook for unaligned accesses x86/platform/iosf_mbi: Convert PCIBIOS_* return codes to errnos x86/pci/xen: Fix PCIBIOS_* return code handling x86/pci/intel_mid_pci: Fix PCIBIOS_* return code handling x86/of: Return consistent error type from x86_of_pci_irq_enable() hwmon: (k10temp) Rename _data variable hwmon: (k10temp) Remove unused HAVE_TDIE() macro hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters hwmon: (k10temp) Define a helper function to read CCD temperature x86/amd_nb: Enhance SMN access error checking hwmon: (k10temp) Check return value of amd_smn_read() EDAC/amd64: Check return value of amd_smn_read() EDAC/amd64: Remove unused register accesses tools/x86/kcpuid: Add missing dir via Makefile x86, arm: Add missing license tag to syscall tables files
2024-07-15Merge tag 'x86_build_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build update from Borislav Petkov: - Make sure insn support detection uses the proper compiler flag in bi-arch builds * tag 'x86_build_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kconfig: Add as-instr64 macro to properly evaluate AS_WRUSS
2024-07-15Merge tag 'x86_core_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 uaccess update from Borislav Petkov: - Cleanup the 8-byte getuser() asm case * tag 'x86_core_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Improve the 8-byte getuser() case
2024-07-15Merge tag 'x86_cc_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 confidential computing updates from Borislav Petkov: "Unrelated x86/cc changes queued here to avoid ugly cross-merges and conflicts: - Carve out CPU hotplug function declarations into a separate header with the goal to be able to use the lockdep assertions in a more flexible manner - As a result, refactor cacheinfo code after carving out a function to return the cache ID associated with a given cache level - Cleanups Add support to be able to kexec TDX guests: - Expand ACPI MADT CPU offlining support - Add machinery to prepare CoCo guests memory before kexec-ing into a new kernel - Cleanup, readjust and massage related code" * tag 'x86_cc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method x86/mm: Introduce kernel_ident_mapping_free() x86/smp: Add smp_ops.stop_this_cpu() callback x86/acpi: Do not attempt to bring up secondary CPUs in the kexec case x86/acpi: Rename fields in the acpi_madt_multiproc_wakeup structure x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump x86/mm: Make e820__end_ram_pfn() cover E820_TYPE_ACPI ranges x86/tdx: Convert shared memory back to private on kexec x86/mm: Add callbacks to prepare encrypted memory for kexec x86/tdx: Account shared memory x86/mm: Return correct level from lookup_address() if pte is none x86/mm: Make x86_platform.guest.enc_status_change_*() return an error x86/kexec: Keep CR4.MCE set during kexec for TDX guest x86/relocate_kernel: Use named labels for less confusion cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup cpu/hotplug: Add support for declaring CPU offlining not supported x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init x86/acpi: Extract ACPI MADT wakeup code into a separate file x86/kexec: Remove spurious unconditional JMP from from identity_mapped() ...
2024-07-15Merge tag 'x86_cleanups_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Remove an unused function and the documentation of an already removed cmdline parameter * tag 'x86_cleanups_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Remove unused function __fortify_panic() Documentation: Remove "mfgpt_irq=" from the kernel-parameters.txt file
2024-07-15Merge tag 'x86_boot_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Borislav Petkov: - Add a check to warn when cmdline parsing happens before the final cmdline string has been built and thus arguments can get lost - Code cleanups and simplifications * tag 'x86_boot_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/setup: Warn when option parsing is done too early x86/boot: Clean up the arch/x86/boot/main.c code a bit x86/boot: Use current_stack_pointer to avoid asm() in init_heap()
2024-07-15Merge tag 'x86_alternatives_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 alternatives updates from Borislav Petkov: "This is basically PeterZ's idea to nest the alternative macros to avoid the need to "spell out" the number of alternates in an ALTERNATIVE_n() macro and thus have an ever-increasing complexity in those definitions. For ease of bisection, the old macros are converted to the new, nested variants in a step-by-step manner so that in case an issue is encountered during testing, one can pinpoint the place where it fails easier. Because debugging alternatives is a serious pain" * tag 'x86_alternatives_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives, kvm: Fix a couple of CALLs without a frame pointer x86/alternative: Replace the old macros x86/alternative: Convert the asm ALTERNATIVE_3() macro x86/alternative: Convert the asm ALTERNATIVE_2() macro x86/alternative: Convert the asm ALTERNATIVE() macro x86/alternative: Convert ALTERNATIVE_3() x86/alternative: Convert ALTERNATIVE_TERNARY() x86/alternative: Convert alternative_call_2() x86/alternative: Convert alternative_call() x86/alternative: Convert alternative_io() x86/alternative: Convert alternative_input() x86/alternative: Convert alternative_2() x86/alternative: Convert alternative() x86/alternatives: Add nested alternatives macros x86/alternative: Zap alternative_ternary()
2024-07-15Merge tag 'ras_core_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - A cleanup and a correction to the error injection driver to inject a MCA_MISC value only when one has actually been supplied by the user * tag 'ras_core_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Remove unused variable and return value in machine_check_poll() x86/mce/inject: Only write MCA_MISC when a value has been supplied
2024-07-15Merge tag 'timers-core-2024-07-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Updates for timers, timekeeping and related functionality: Core: - Make the takeover of a hrtimer based broadcast timer reliable during CPU hot-unplug. The current implementation suffers from a race which can lead to broadcast timer starvation in the worst case. - VDSO related cleanups and simplifications - Small cleanups and enhancements all over the place PTP: - Replace the architecture specific base clock to clocksource, e.g. ART to TSC, conversion function with generic functionality to avoid exposing such internals to drivers and convert all existing drivers over. This also allows to provide functionality which converts the other way round in the core code based on the same parameter set. - Provide a function to convert CLOCK_REALTIME to the base clock to support the upcoming PPS output driver on Intel platforms. Drivers: - A set of Device Tree bindings for new hardware - Cleanups and enhancements all over the place" * tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) clocksource/drivers/realtek: Add timer driver for rtl-otto platforms dt-bindings: timer: Add schema for realtek,otto-timer dt-bindings: timer: Add SOPHGO SG2002 clint dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support dt-bindings: timer: renesas,tmu: Add RZ/G1 support dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support clocksource/drivers/mips-gic-timer: Correct sched_clock width clocksource/drivers/mips-gic-timer: Refine rating computation clocksource/drivers/sh_cmt: Address race condition for clock events clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq tick/broadcast: Make takeover of broadcast hrtimer reliable tick/sched: Combine WARN_ON_ONCE and print_once x86/vdso: Remove unused include x86/vgtod: Remove unused typedef gtod_long_t x86/vdso: Fix function reference in comment vdso: Add comment about reason for vdso struct ordering vdso/gettimeofday: Clarify comment about open coded function timekeeping: Add missing kernel-doc function comments tick: Remove unnused tick_nohz_get_idle_calls() ...
2024-07-15Merge branch 'word-at-a-time'Linus Torvalds
Merge minor word-at-a-time instruction choice improvements for x86 and arm64. This is the second of four branches that came out of me looking at the code generation for path lookup on arm64. The word-at-a-time infrastructure is used to do string operations in chunks of one word both when copying the pathname from user space (in strncpy_from_user()), and when parsing and hashing the individual path components (in link_path_walk()). In particular, the "find the first zero byte" uses various bit tricks to figure out the end of the string or path component, and get the length without having to do things one byte at a time. Both x86-64 and arm64 had less than optimal code choices for that. The commit message for the arm64 change in particular tries to explain the exact code flow for the zero byte finding for people who care. It's made a bit more complicated by the fact that we support big-endian hardware too, and so we have some extra abstraction layers to allow different models for finding the zero byte, quite apart from the issue of picking specialized instructions. * word-at-a-time: arm64: word-at-a-time: improve byte count calculations for LE x86-64: word-at-a-time: improve byte count calculations
2024-07-15Merge branch 'runtime-constants'Linus Torvalds
Merge runtime constants infrastructure with implementations for x86 and arm64. This is one of four branches that came out of me looking at profiles of my kernel build filesystem load on my 128-core Altra arm64 system, where pathname walking and the user copies (particularly strncpy_from_user() for fetching the pathname from user space) is very hot. This is a very specialized "instruction alternatives" model where the dentry hash pointer and hash count will be constants for the lifetime of the kernel, but the allocation are not static but done early during the kernel boot. In order to avoid the pointer load and dynamic shift, we just rewrite the constants in the instructions in place. We can't use the "generic" alternative instructions infrastructure, because different architectures do it very differently, and it's actually simpler to just have very specific helpers, with a fallback to the generic ("old") model of just using variables for architectures that do not implement the runtime constant patching infrastructure. Link: https://lore.kernel.org/all/CAHk-=widPe38fUNjUOmX11ByDckaeEo9tN4Eiyke9u1SAtu9sA@mail.gmail.com/ * runtime-constants: arm64: add 'runtime constant' support runtime constants: add x86 architecture support runtime constants: add default dummy infrastructure vfs: dcache: move hashlen_hash() from callers into d_hash()
2024-07-13Merge tag 'timers-v6.11-rc1' of ↵Thomas Gleixner
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clocksource/event driver updates from Daniel Lezcano: - Remove unnecessary local variables initialization as they will be initialized in the code path anyway right after on the ARM arch timer and the ARM global timer (Li kunyu) - Fix a race condition in the interrupt leading to a deadlock on the SH CMT driver. Note that this fix was not tested on the platform using this timer but the fix seems reasonable enough to be picked confidently (Niklas Söderlund) - Increase the rating of the gic-timer and use the configured width clocksource register on the MIPS architecture (Jiaxun Yang) - Add the DT bindings for the TMU on the Renesas platforms (Geert Uytterhoeven) - Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas Bonnefille) - Add the rtl-otto timer driver along with the DT bindings for the Realtek platform (Chris Packham) Link: https://lore.kernel.org/all/91cd05de-4c5d-4242-a381-3b8a4fe6a2a2@linaro.org
2024-07-11x86/xen: remove deprecated xen_nopvspin boot parameterJuergen Gross
The xen_nopvspin boot parameter is deprecated since 2019. nopvspin can be used instead. Remove the xen_nopvspin boot parameter and replace the xen_pvspin variable use cases with nopvspin. This requires to move the nopvspin variable out of the .initdata section, as it needs to be accessed for cpuhotplug, too. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710110139.22300-1-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11x86/xen: eliminate some private header filesJuergen Gross
Under arch/x86/xen there is one large private header file xen-ops.h containing most of the Xen-private x86 related declarations, and then there are several small headers with a handful of declarations each. Merge the small headers into xen-ops.h. While doing that, move the declaration of xen_fifo_events from xen-ops.h into include/xen/events.h where it should have been from the beginning. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710093718.14552-3-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11x86/xen: make some functions staticJuergen Gross
Some functions and variables in arch/x86/xen are used locally only, make them static. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710093718.14552-2-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11xen: make multicall debug boot time selectableJuergen Gross
Today Xen multicall debugging needs to be enabled via modifying a define in a source file for getting debug data of multicall errors encountered by users. Switch multicall debugging to depend on a boot parameter "xen_mc_debug" instead, enabling affected users to boot with the new parameter set in order to get better diagnostics. With debugging enabled print the following information in case at least one of the batched calls failed: - all calls of the batch with operation, result and caller - all parameters of each call - all parameters stored in the multicall data for each call Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Message-ID: <20240710092749.13595-1-jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-11x86/sev: Do RMP memory coverage check after max_pfn has been setTom Lendacky
The RMP table is probed early in the boot process before max_pfn has been set, so the logic to check if the RMP covers all of system memory is not valid. Move the RMP memory coverage check from snp_probe_rmptable_info() into snp_rmptable_init(), which is well after max_pfn has been set. Also, fix the calculation to use PFN_UP instead of PHYS_PFN, in order to compute the required RMP size properly. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/bec4364c7e34358cc576f01bb197a7796a109169.1718984524.git.thomas.lendacky@amd.com
2024-07-11x86/sev: Move SEV compilation unitsBorislav Petkov (AMD)
A long time ago it was agreed upon that the coco stuff needs to go where it belongs: https://lore.kernel.org/all/Yg5nh1RknPRwIrb8@zn.tnic and not keep it in arch/x86/kernel. TDX did that and SEV can't find time to do so. So lemme do it. If people have trouble converting their ongoing featuritis patches, ask me for a sed script. No functional changes. Move the instrumentation exclusion bits too, as helpfully caught and reported by the 0day folks. Closes: https://lore.kernel.org/oe-kbuild-all/202406220748.hG3qlmDx-lkp@intel.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-lkp/202407091342.46d7dbb-oliver.sang@intel.com Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Ashish Kalra <ashish.kalra@amd.com> Tested-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/r/20240619093014.17962-1-bp@kernel.org
2024-07-10clone3: drop __ARCH_WANT_SYS_CLONE3 macroArnd Bergmann
When clone3() was introduced, it was not obvious how each architecture deals with setting up the stack and keeping the register contents in a fork()-like system call, so this was left for the architecture maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those that already implement it. Five years later, we still have a few architectures left that are missing clone3(), and the macro keeps getting in the way as it's fundamentally different from all the other __ARCH_WANT_SYS_* macros that are meant to provide backwards-compatibility with applications using older syscalls that are no longer provided by default. Address this by reversing the polarity of the macro, adding an __ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3 from all the other ones. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-10Merge back cpufreq material for 6.11.Rafael J. Wysocki
2024-07-09Merge tag 'for-netdev' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-07-08 The following pull-request contains BPF updates for your *net-next* tree. We've added 102 non-merge commits during the last 28 day(s) which contain a total of 127 files changed, 4606 insertions(+), 980 deletions(-). The main changes are: 1) Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman. 2) Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs, from Daniel Xu. 3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement support for BPF exceptions, from Ilya Leoshkevich. 4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui. 5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option for skbs and add coverage along with it, from Vadim Fedorenko. 6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan. 7) Extend the BPF verifier to track the delta between linked registers in order to better deal with recent LLVM code optimizations, from Alexei Starovoitov. 8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should have been a pointer to the map value, from Benjamin Tissoires. 9) Extend BPF selftests to add regular expression support for test output matching and adjust some of the selftest when compiled under gcc, from Cupertino Miranda. 10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always iterates exactly once anyway, from Dan Carpenter. 11) Add the capability to offload the netfilter flowtable in XDP layer through kfuncs, from Florian Westphal & Lorenzo Bianconi. 12) Various cleanups in networking helpers in BPF selftests to shave off a few lines of open-coded functions on client/server handling, from Geliang Tang. 13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so that x86 JIT does not need to implement detection, from Leon Hwang. 14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an out-of-bounds memory access for dynpointers, from Matt Bobrowski. 15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as it might lead to problems on 32-bit archs, from Jiri Olsa. 16) Enhance traffic validation and dynamic batch size support in xsk selftests, from Tushar Vyavahare. bpf-next-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits) selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature bpf: helpers: fix bpf_wq_set_callback_impl signature libbpf: Add NULL checks to bpf_object__{prev_map,next_map} selftests/bpf: Remove exceptions tests from DENYLIST.s390x s390/bpf: Implement exceptions s390/bpf: Change seen_reg to a mask bpf: Remove unnecessary loop in task_file_seq_get_next() riscv, bpf: Optimize stack usage of trampoline bpf, devmap: Add .map_alloc_check selftests/bpf: Remove arena tests from DENYLIST.s390x selftests/bpf: Add UAF tests for arena atomics selftests/bpf: Introduce __arena_global s390/bpf: Support arena atomics s390/bpf: Enable arena s390/bpf: Support address space cast instruction s390/bpf: Support BPF_PROBE_MEM32 s390/bpf: Land on the next JITed instruction after exception s390/bpf: Introduce pre- and post- probe functions s390/bpf: Get rid of get_probe_mem_regno() ... ==================== Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-09perf/x86/intel: Add a distinct name for Granite RapidsKan Liang
Currently, the Sapphire Rapids and Granite Rapids share the same PMU name, sapphire_rapids. Because from the kernel’s perspective, GNR is similar to SPR. The only key difference is that they support different extra MSRs. The code path and the PMU name are shared. However, from end users' perspective, they are quite different. Besides the extra MSRs, GNR has a newer PEBS format, supports Retire Latency, supports new CPUID enumeration architecture, doesn't required the load-latency AUX event, has additional TMA Level 1 Architectural Events, etc. The differences can be enumerated by CPUID or the PERF_CAPABILITIES MSR. They weren't reflected in the model-specific kernel setup. But it is worth to have a distinct PMU name for GNR. Fixes: a6742cb90b56 ("perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL") Suggested-by: Ahmad Yasin <ahmad.yasin@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708193336.1192217-3-kan.liang@linux.intel.com
2024-07-09perf/x86/intel/ds: Fix non 0 retire latency on RaptorlakeKan Liang
A non-0 retire latency can be observed on a Raptorlake which doesn't support the retire latency feature. By design, the retire latency shares the PERF_SAMPLE_WEIGHT_STRUCT sample type with other types of latency. That could avoid adding too many different sample types to support all kinds of latency. For the machine which doesn't support some kind of latency, 0 should be returned. Perf doesn’t clear/init all the fields of a sample data for the sake of performance. It expects the later perf_{prepare,output}_sample() to update the uninitialized field. However, the current implementation doesn't touch the field of the retire latency if the feature is not supported. The memory garbage is dumped into the perf data. Clear the retire latency if the feature is not supported. Fixes: c87a31093c70 ("perf/x86: Support Retire Latency") Reported-by: "Bayduraev, Alexey V" <alexey.v.bayduraev@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: "Bayduraev, Alexey V" <alexey.v.bayduraev@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708193336.1192217-4-kan.liang@linux.intel.com
2024-07-09perf/x86/intel: Hide Topdown metrics events if the feature is not enumeratedKan Liang
The below error is observed on Ice Lake VM. $ perf stat Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (slots). /bin/dmesg | grep -i perf may provide additional information. In a virtualization env, the Topdown metrics and the slots event haven't been supported yet. The guest CPUID doesn't enumerate them. However, the current kernel unconditionally exposes the slots event and the Topdown metrics events to sysfs, which misleads the perf tool and triggers the error. Hide the perf-metrics topdown events and the slots event if the perf-metrics feature is not enumerated. The big core of a hybrid platform can also supports the perf-metrics feature. Fix the hybrid platform as well. Closes: https://lore.kernel.org/lkml/CAM9d7cj8z+ryyzUHR+P1Dcpot2jjW+Qcc4CPQpfafTXN=LEU0Q@mail.gmail.com/ Reported-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Dongli Zhang <dongli.zhang@oracle.com> Link: https://lkml.kernel.org/r/20240708193336.1192217-2-kan.liang@linux.intel.com
2024-07-09perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPRKan Liang
The perf stat errors out with UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC_LOCAL event. $perf stat -e uncore_cha_55/event=0x35,umask=0x10c0008101/ -a -- ls event syntax error: '..0x35,umask=0x10c0008101/' \___ Bad event or PMU The definition of the CHA umask is config:8-15,32-55, which is 32bit. However, the umask of the event is bigger than 32bit. This is an error in the original uncore spec. Add a new umask_ext5 for the new CHA umask range. Fixes: 949b11381f81 ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support") Closes: https://lore.kernel.org/linux-perf-users/alpine.LRH.2.20.2401300733310.11354@Diego/ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708185524.1185505-1-kan.liang@linux.intel.com
2024-07-08x86/efistub: Enable SMBIOS protocol handling for x86Ard Biesheuvel
The smbios.c source file is not currently included in the x86 build, and before we can do so, it needs some tweaks to build correctly in combination with the EFI mixed mode support. Reviewed-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-07-04perf/x86/amd/uncore: Fix DF and UMC domain identificationSandipan Das
For uncore PMUs, a single context is shared across all CPUs in a domain. The domain can be a CCX, like in the case of the L3 PMU, or a socket, like in the case of DF and UMC PMUs. This information is available via the PMU's cpumask. For contexts shared across a socket, the domain is currently determined from topology_die_id() which is incorrect after the introduction of commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") as it now returns a CCX identifier on Zen 4 and later systems which support CPUID leaf 0x80000026. Use topology_logical_package_id() instead as it always returns a socket identifier irrespective of the availability of CPUID leaf 0x80000026. Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240626074942.1044818-1-sandipan.das@amd.com
2024-07-04perf/x86/amd/uncore: Avoid PMU registration if counters are unavailableSandipan Das
X86_FEATURE_PERFCTR_NB and X86_FEATURE_PERFCTR_LLC are derived from CPUID leaf 0x80000001 ECX bits 24 and 28 respectively and denote the availability of DF and L3 counters. When these bits are not set, the corresponding PMUs have no counters and hence, should not be registered. Fixes: 07888daa056e ("perf/x86/amd/uncore: Move discovery and registration") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240626074404.1044230-1-sandipan.das@amd.com
2024-07-04perf/x86/intel: Support Perfmon MSRs aliasingKan Liang
The architectural performance monitoring V6 supports a new range of counters' MSRs in the 19xxH address range. They include all the GP counter MSRs, the GP control MSRs, and the fixed counter MSRs. The step between each sibling counter is 4. Add intel_pmu_addr_offset() to calculate the correct offset. Add fixedctr in struct x86_pmu to store the address of the fixed counter 0. It can be used to calculate the rest of the fixed counters. The MSR address of the fixed counter control is not changed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Support PERFEVTSEL extensionKan Liang
Two new fields (the unit mask2, and the equal flag) are added in the IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX. Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout of the PERFEVTSEL. Expose the new formats into sysfs if they are available. The umask extension reuses the same format attr name "umask" as the previous umask. Add umask2_show to determine/display the correct format for the current machine. Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-8-kan.liang@linux.intel.com
2024-07-04perf/x86: Add config_mask to represent EVENTSEL bitmaskKan Liang
Different vendors may support different fields in EVENTSEL MSR, such as Intel would introduce new fields umask2 and eq bits in EVENTSEL MSR since Perfmon version 6. However, a fixed mask X86_RAW_EVENT_MASK is used to filter the attr.config. Introduce a new config_mask to record the real supported EVENTSEL bitmask. Only apply it to the existing code now. No functional change. Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-7-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Support new data source for Lunar LakeKan Liang
A new PEBS data source format is introduced for the p-core of Lunar Lake. The data source field is extended to 8 bits with new encodings. A new layout is introduced into the union intel_x86_pebs_dse. Introduce the lnl_latency_data() to parse the new format. Enlarge the pebs_data_source[] accordingly to include new encodings. Only the mem load and the mem store events can generate the data source. Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and INTEL_HYBRID_STLAT_CONSTRAINT to mark them. Add two new bits for the new cache-related data src, L2_MHB and MSC. The L2_MHB is short for L2 Miss Handling Buffer, which is similar to LFB (Line Fill Buffer), but to track the L2 Cache misses. The MSC stands for the memory-side cache. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-6-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Rename model-specific pebs_latency_data functionsKan Liang
The model-specific pebs_latency_data functions of ADL and MTL use the "small" as a postfix to indicate the e-core. The postfix is too generic for a model-specific function. It cannot provide useful information that can directly map it to a specific uarch, which can facilitate the development and maintenance. Use the abbr of the uarch to rename the model-specific functions. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-5-kan.liang@linux.intel.com
2024-07-04perf/x86: Add Lunar Lake and Arrow Lake supportKan Liang
From PMU's perspective, Lunar Lake and Arrow Lake are similar to the previous generation Meteor Lake. Both are hybrid platforms, with e-core and p-core. The key differences include: - The e-core supports 3 new fixed counters - The p-core supports an updated PEBS Data Source format - More GP counters (Updated event constraint table) - New Architectural performance monitoring V6 (New Perfmon MSRs aliasing, umask2, eq). - New PEBS format V6 (Counters Snapshotting group) - New RDPMC metrics clear mode The legacy features, the 3 new fixed counters and updated event constraint table are enabled in this patch. The new PEBS data source format, the architectural performance monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode are supported in the following patches. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-4-kan.liang@linux.intel.com
2024-07-04perf/x86: Support counter maskKan Liang
The current perf assumes that both GP and fixed counters are contiguous. But it's not guaranteed on newer Intel platforms or in a virtualization environment. Use the counter mask to replace the number of counters for both GP and the fixed counters. For the other ARCHs or old platforms which don't support a counter mask, using GENMASK_ULL(num_counter - 1, 0) to replace. There is no functional change for them. The interface to KVM is not changed. The number of counters still be passed to KVM. It can be updated later separately. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Support the PEBS event maskKan Liang
The current perf assumes that the counters that support PEBS are contiguous. But it's not guaranteed with the new leaf 0x23 introduced. The counters are enumerated with a counter mask. There may be holes in the counter mask for future platforms or in a virtualization environment. Store the PEBS event mask rather than the maximum number of PEBS counters in the x86 PMU structures. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-2-kan.liang@linux.intel.com
2024-07-04perf/x86/intel/cstate: Add Lunarlake supportZhang Rui
Compared with previous client platforms, PC8 is removed from Lunarlake. It supports CC1/CC6/CC7 and PC2/PC3/PC6/PC10 residency counters. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20240628031758.43103-4-rui.zhang@intel.com