summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2024-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov
Cross-merge bpf fixes after downstream PR including important fixes (from bpf-next point of view): commit 41c24102af7b ("selftests/bpf: Filter out _GNU_SOURCE when compiling test_cpp") commit fdad456cbcca ("bpf: Fix updating attached freplace prog in prog_array map") No conflicts. Adjacent changes in: include/linux/bpf_verifier.h kernel/bpf/verifier.c tools/testing/selftests/bpf/Makefile Link: https://lore.kernel.org/bpf/20240813234307.82773-1-alexei.starovoitov@gmail.com/ Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-08-14KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)Sean Christopherson
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't directly emulate instructions for ES/SNP, and instead the guest must explicitly request emulation. Unless the guest explicitly requests emulation without accessing memory, ES/SNP relies on KVM creating an MMIO SPTE, with the subsequent #NPF being reflected into the guest as a #VC. But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs, because except for ES/SNP, doing so requires setting reserved bits in the SPTE, i.e. the SPTE can't be readable while also generating a #VC on writes. Because KVM never creates MMIO SPTEs and jumps directly to emulation, the guest never gets a #VC. And since KVM simply resumes the guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU into an infinite #NPF loop if the vCPU attempts to write read-only memory. Disallow read-only memory for all VMs with protected state, i.e. for upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible to support read-only memory, as TDX uses EPT Violation #VE to reflect the fault into the guest, e.g. KVM could configure read-only SPTEs with RX protections and SUPPRESS_VE=0. But there is no strong use case for supporting read-only memslots on TDX, e.g. the main historical usage is to emulate option ROMs, but TDX disallows executing from shared memory. And if someone comes along with a legitimate, strong use case, the restriction can always be lifted for TDX. Don't bother trying to retroactively apply the restriction to SEV-ES VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't possibly work for SEV-ES, i.e. disallowing such memslots is really just means reporting an error to userspace instead of silently hanging vCPUs. Trying to deal with the ordering between KVM_SEV_INIT and memslot creation isn't worth the marginal benefit it would provide userspace. Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES") Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support") Cc: Peter Gonda <pgonda@google.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Ackerly Tng <ackerleytng@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240809190319.1710470-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: x86: Make x2APIC ID 100% readonlySean Christopherson
Ignore the userspace provided x2APIC ID when fixing up APIC state for KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM. Commit a92e2543d6a8 ("KVM: x86: use hardware-compatible format for APIC ID register"), which added the fixup, didn't intend to allow userspace to modify the x2APIC ID. In fact, that commit is when KVM first started treating the x2APIC ID as readonly, apparently to fix some race: static inline u32 kvm_apic_id(struct kvm_lapic *apic) { - return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff; + /* To avoid a race between apic_base and following APIC_ID update when + * switching to x2apic_mode, the x2apic mode returns initial x2apic id. + */ + if (apic_x2apic_mode(apic)) + return apic->vcpu->vcpu_id; + + return kvm_lapic_get_reg(apic, APIC_ID) >> 24; } Furthermore, KVM doesn't support delivering interrupts to vCPUs with a modified x2APIC ID, but KVM *does* return the modified value on a guest RDMSR and for KVM_GET_LAPIC. I.e. no remotely sane setup can actually work with a modified x2APIC ID. Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map calculation, which expects the LDR to align with the x2APIC ID. WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm] CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014 RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm] Call Trace: <TASK> kvm_apic_set_state+0x1cf/0x5b0 [kvm] kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm] kvm_vcpu_ioctl+0x663/0x8a0 [kvm] __x64_sys_ioctl+0xb8/0xf0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fade8b9dd6f Unfortunately, the WARN can still trigger for other CPUs than the current one by racing against KVM_SET_LAPIC, so remove it completely. Reported-by: Michal Luczaj <mhal@rbox.co> Closes: https://lore.kernel.org/all/814baa0c-1eaa-4503-129f-059917365e80@rbox.co Reported-by: Haoyu Wu <haoyuwu254@gmail.com> Closes: https://lore.kernel.org/all/20240126161633.62529-1-haoyuwu254@gmail.com Reported-by: syzbot+545f1326f405db4e1c3e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000c2a6b9061cbca3c3@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240802202941.344889-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())Isaku Yamahata
Use this_cpu_ptr() instead of open coding the equivalent in various user return MSR helpers. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Message-ID: <20240802201630.339306-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()Yue Haibing
There is no caller in tree since introduction in commit b4f69df0f65e ("KVM: x86: Make Hyper-V emulation optional") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Message-ID: <20240803113233.128185-1-yuehaibing@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: SVM: Fix an error code in sev_gmem_post_populate()Dan Carpenter
The copy_from_user() function returns the number of bytes which it was not able to copy. Return -EFAULT instead. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Message-ID: <20240612115040.2423290-4-dan.carpenter@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-13KVM: SVM: Fix uninitialized variable bugDan Carpenter
If snp_lookup_rmpentry() fails then "assigned" is printed in the error message but it was never initialized. Initialize it to false. Fixes: dee5a47cc7a4 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Message-ID: <20240612115040.2423290-3-dan.carpenter@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-08x86/mtrr: Check if fixed MTRRs exist before saving themAndi Kleen
MTRRs have an obsolete fixed variant for fine grained caching control of the 640K-1MB region that uses separate MSRs. This fixed variant has a separate capability bit in the MTRR capability MSR. So far all x86 CPUs which support MTRR have this separate bit set, so it went unnoticed that mtrr_save_state() does not check the capability bit before accessing the fixed MTRR MSRs. Though on a CPU that does not support the fixed MTRR capability this results in a #GP. The #GP itself is harmless because the RDMSR fault is handled gracefully, but results in a WARN_ON(). Add the missing capability check to prevent this. Fixes: 2b1f6278d77c ("[PATCH] x86: Save the MTRRs of the BSP before booting an AP") Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240808000244.946864-1-ak@linux.intel.com
2024-08-07x86/paravirt: Fix incorrect virt spinlock setting on bare metalChen Yu
The kernel can change spinlock behavior when running as a guest. But this guest-friendly behavior causes performance problems on bare metal. The kernel uses a static key to switch between the two modes. In theory, the static key is enabled by default (run in guest mode) and should be disabled for bare metal (and in some guests that want native behavior or paravirt spinlock). A performance drop is reported when running encode/decode workload and BenchSEE cache sub-workload. Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS is disabled the virt_spin_lock_key is incorrectly set to true on bare metal. The qspinlock degenerates to test-and-set spinlock, which decreases the performance on bare metal. Set the default value of virt_spin_lock_key to false. If booting in a VM, enable this key. Later during the VM initialization, if other high-efficient spinlock is preferred (e.g. paravirt-spinlock), or the user wants the native qspinlock (via nopvspin boot commandline), the virt_spin_lock_key is disabled accordingly. This results in the following decision matrix: X86_FEATURE_HYPERVISOR Y Y Y N CONFIG_PARAVIRT_SPINLOCKS Y Y N Y/N PV spinlock Y N N Y/N virt_spin_lock_key N Y/N Y N Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function warning") Reported-by: Prem Nath Dey <prem.nath.dey@intel.com> Reported-by: Xiaoping Zhou <xiaoping.zhou@intel.com> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Suggested-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240806112207.29792-1-yu.c.chen@intel.com
2024-08-07x86/acpi: Remove __ro_after_init from acpi_mp_wake_mailboxZhiquan Li
On a platform using the "Multiprocessor Wakeup Structure"[1] to startup secondary CPUs the control processor needs to memremap() the physical address of the MP Wakeup Structure mailbox to the variable acpi_mp_wake_mailbox, which holds the virtual address of mailbox. To wake up the AP the control processor writes the APIC ID of AP, the wakeup vector and the ACPI_MP_WAKE_COMMAND_WAKEUP command into the mailbox. Current implementation doesn't consider the case which restricts boot time CPU bringup to 1 with the kernel parameter "maxcpus=1" and brings other CPUs online later from user space as it sets acpi_mp_wake_mailbox to read-only after init. So when the first AP is tried to brought online after init, the attempt to update the variable results in a kernel panic. The memremap() call that initializes the variable cannot be moved into acpi_parse_mp_wake() because memremap() is not functional at that point in the boot process. Also as the APs might never be brought up, keep the memremap() call in acpi_wakeup_cpu() so that the operation only takes place when needed. Fixes: 24dd05da8c79 ("x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init") Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/all/20240805103531.1230635-1-zhiquan1.li@intel.com
2024-08-07x86/mm: Fix PTI for i386 some moreThomas Gleixner
So it turns out that we have to do two passes of pti_clone_entry_text(), once before initcalls, such that device and late initcalls can use user-mode-helper / modprobe and once after free_initmem() / mark_readonly(). Now obviously mark_readonly() can cause PMD splits, and pti_clone_pgtable() doesn't like that much. Allow the late clone to split PMDs so that pagetables stay in sync. [peterz: Changelog and comments] Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lkml.kernel.org/r/20240806184843.GX37996@noisy.programming.kicks-ass.net
2024-08-04Merge tag 'x86-urgent-2024-08-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver. A recent change in the ACPI code which consolidated code pathes moved the invocation of init_freq_invariance_cppc() to be moved to a CPU hotplug handler. The first invocation on AMD CPUs ends up enabling a static branch which dead locks because the static branch enable tries to acquire cpu_hotplug_lock but that lock is already held write by the hotplug machinery. Use static_branch_enable_cpuslocked() instead and take the hotplug lock read for the Intel code path which is invoked from the architecture code outside of the CPU hotplug operations. - Fix the number of reserved bits in the sev_config structure bit field so that the bitfield does not exceed 64 bit. - Add missing Zen5 model numbers - Fix the alignment assumptions of pti_clone_pgtable() and clone_entry_text() on 32-bit: The code assumes PMD aligned code sections, but on 32-bit the kernel entry text is not PMD aligned. So depending on the code size and location, which is configuration and compiler dependent, entry text can cross a PMD boundary. As the start is not PMD aligned adding PMD size to the start address is larger than the end address which results in partially mapped entry code for user space. That causes endless recursion on the first entry from userspace (usually #PF). Cure this by aligning the start address in the addition so it ends up at the next PMD start address. clone_entry_text() enforces PMD mapping, but on 32-bit the tail might eventually be PTE mapped, which causes a map fail because the PMD for the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for the clone() invocation which resolves to PTE on 32-bit and PMD on 64-bit. - Zero the 8-byte case for get_user() on range check failure on 32-bit The recend consolidation of the 8-byte get_user() case broke the zeroing in the failure case again. Establish it by clearing ECX before the range check and not afterwards as that obvioulsy can't be reached when the range check fails * tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit x86/mm: Fix pti_clone_entry_text() for i386 x86/mm: Fix pti_clone_pgtable() alignment assumption x86/setup: Parse the builtin command line before merging x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range x86/sev: Fix __reserved field in sev_config x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
2024-08-04Merge tag 'perf-urgent-2024-08-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf fixes from Thomas Gleixner: - Move the smp_processor_id() invocation back into the non-preemtible region, so that the result is valid to use - Add the missing package C2 residency counters for Sierra Forest CPUs to make the newly added support actually useful * tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix smp_processor_id()-in-preemptible warnings perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
2024-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "The bulk of the changes here is a largish change to guest_memfd, delaying the clearing and encryption of guest-private pages until they are actually added to guest page tables. This started as "let's make it impossible to misuse the API" for SEV-SNP; but then it ballooned a bit. The new logic is generally simpler and more ready for hugepage support in guest_memfd. Summary: - fix latent bug in how usage of large pages is determined for confidential VMs - fix "underline too short" in docs - eliminate log spam from limited APIC timer periods - disallow pre-faulting of memory before SEV-SNP VMs are initialized - delay clearing and encrypting private memory until it is added to guest page tables - this change also enables another small cleanup: the checks in SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can now be moved in the common kvm_gmem_populate() function - fix compilation error that the RISC-V merge introduced in selftests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: fix determination of max NPT mapping level for private pages KVM: riscv: selftests: Fix compile error KVM: guest_memfd: abstract how prepared folios are recorded KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns KVM: extend kvm_range_has_memory_attributes() to check subset of attributes KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes() KVM: guest_memfd: move check for already-populated page to common code KVM: remove kvm_arch_gmem_prepare_needed() KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_* KVM: guest_memfd: do not go through struct page KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation KVM: guest_memfd: return folio from __kvm_gmem_get_pfn() KVM: x86: disallow pre-fault for SNP VMs before initialization KVM: Documentation: Fix title underline too short warning KVM: x86: Eliminate log spam from limited APIC timer periods
2024-08-02Merge branch 'kvm-fixes' into HEADPaolo Bonzini
* fix latent bug in how usage of large pages is determined for confidential VMs * fix "underline too short" in docs * eliminate log spam from limited APIC timer periods * disallow pre-faulting of memory before SEV-SNP VMs are initialized * delay clearing and encrypting private memory until it is added to guest page tables * this change also enables another small cleanup: the checks in SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can now be moved in the common kvm_gmem_populate() function
2024-08-02uretprobe: change syscall number, againArnd Bergmann
Despite multiple attempts to get the syscall number assignment right for the newly added uretprobe syscall, we ended up with a bit of a mess: - The number is defined as 467 based on the assumption that the xattrat family of syscalls would use 463 through 466, but those did not make it into 6.11. - The include/uapi/asm-generic/unistd.h file still lists the number 463, but the new scripts/syscall.tbl that was supposed to have the same data lists 467 instead as the number for arc, arm64, csky, hexagon, loongarch, nios2, openrisc and riscv. None of these architectures actually provide a uretprobe syscall. - All the other architectures (powerpc, arm, mips, ...) don't list this syscall at all. There are two ways to make it consistent again: either list it with the same syscall number on all architectures, or only list it on x86 but not in scripts/syscall.tbl and asm-generic/unistd.h. Based on the most recent discussion, it seems like we won't need it anywhere else, so just remove the inconsistent assignment and instead move the x86 number to the next available one in the architecture specific range, which is 335. Fixes: 5c28424e9a34 ("syscalls: Fix to add sys_uretprobe to syscall.tbl") Fixes: 190fec72df4a ("uprobe: Wire up uretprobe system call") Fixes: 63ded110979b ("uprobe: Change uretprobe syscall scope and number") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-01x86/uaccess: Zero the 8-byte get_range case on failure on 32-bitDavid Gow
While zeroing the upper 32 bits of an 8-byte getuser on 32-bit x86 was fixed by commit 8c860ed825cb ("x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking") it was broken again in commit 8a2462df1547 ("x86/uaccess: Improve the 8-byte getuser() case"). This is because the register which holds the upper 32 bits (%ecx) is being cleared _after_ the check_range, so if the range check fails, %ecx is never cleared. This can be reproduced with: ./tools/testing/kunit/kunit.py run --arch i386 usercopy Instead, clear %ecx _before_ check_range in the 8-byte case. This reintroduces a bit of the ugliness we were trying to avoid by adding another #ifndef CONFIG_X86_64, but at least keeps check_range from needing a separate bad_get_user_8 jump. Fixes: 8a2462df1547 ("x86/uaccess: Improve the 8-byte getuser() case") Signed-off-by: David Gow <davidgow@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/20240731073031.4045579-1-davidgow@google.com
2024-08-01KVM: x86/mmu: fix determination of max NPT mapping level for private pagesAckerley Tng
The `if (req_max_level)` test was meant ignore req_max_level if PG_LEVEL_NONE was returned. Hence, this function should return max_level instead of the ignored req_max_level. This is only a latent issue for now, since guest_memfd does not support large pages. Signed-off-by: Ackerley Tng <ackerleytng@google.com> Message-ID: <20240801173955.1975034-1-ackerleytng@google.com> Fixes: f32fb32820b1 ("KVM: x86: Add hook for determining max NPT mapping level") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-01x86/mm: Fix pti_clone_entry_text() for i386Peter Zijlstra
While x86_64 has PMD aligned text sections, i386 does not have this luxery. Notably ALIGN_ENTRY_TEXT_END is empty and _etext has PAGE alignment. This means that text on i386 can be page granular at the tail end, which in turn means that the PTI text clones should consistently account for this. Make pti_clone_entry_text() consistent with pti_clone_kernel_text(). Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2024-08-01x86/mm: Fix pti_clone_pgtable() alignment assumptionPeter Zijlstra
Guenter reported dodgy crashes on an i386-nosmp build using GCC-11 that had the form of endless traps until entry stack exhaust and then #DF from the stack guard. It turned out that pti_clone_pgtable() had alignment assumptions on the start address, notably it hard assumes start is PMD aligned. This is true on x86_64, but very much not true on i386. These assumptions can cause the end condition to malfunction, leading to a 'short' clone. Guess what happens when the user mapping has a short copy of the entry text? Use the correct increment form for addr to avoid alignment assumptions. Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240731163105.GG33588@noisy.programming.kicks-ass.net
2024-07-31x86/setup: Parse the builtin command line before mergingBorislav Petkov (AMD)
Commit in Fixes was added as a catch-all for cases where the cmdline is parsed before being merged with the builtin one. And promptly one issue appeared, see Link below. The microcode loader really needs to parse it that early, but the merging happens later. Reshuffling the early boot nightmare^W code to handle that properly would be a painful exercise for another day so do the chicken thing and parse the builtin cmdline too before it has been merged. Fixes: 0c40b1c7a897 ("x86/setup: Warn when option parsing is done too early") Reported-by: Mike Lothian <mike@fireburn.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20240730152108.GAZqkE5Dfi9AuKllRw@fat_crate.local Link: https://lore.kernel.org/r/20240722152330.GCZp55ck8E_FT4kPnC@fat_crate.local
2024-07-31perf/x86: Fix smp_processor_id()-in-preemptible warningsLi Huafei
The following bug was triggered on a system built with CONFIG_DEBUG_PREEMPT=y: # echo p > /proc/sysrq-trigger BUG: using smp_processor_id() in preemptible [00000000] code: sh/117 caller is perf_event_print_debug+0x1a/0x4c0 CPU: 3 UID: 0 PID: 117 Comm: sh Not tainted 6.11.0-rc1 #109 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 check_preemption_disabled+0xc8/0xd0 perf_event_print_debug+0x1a/0x4c0 __handle_sysrq+0x140/0x180 write_sysrq_trigger+0x61/0x70 proc_reg_write+0x4e/0x70 vfs_write+0xd0/0x430 ? handle_mm_fault+0xc8/0x240 ksys_write+0x9c/0xd0 do_syscall_64+0x96/0x190 entry_SYSCALL_64_after_hwframe+0x4b/0x53 This is because the commit d4b294bf84db ("perf/x86: Hybrid PMU support for counters") took smp_processor_id() outside the irq critical section. If a preemption occurs in perf_event_print_debug() and the task is migrated to another cpu, we may get incorrect pmu debug information. Move smp_processor_id() back inside the irq critical section to fix this issue. Fixes: d4b294bf84db ("perf/x86: Hybrid PMU support for counters") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20240729220928.325449-1-lihuafei1@huawei.com
2024-07-30x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 rangePerry Yuan
Add some new Zen5 models for the 0x1A family. [ bp: Merge the 0x60 and 0x70 ranges. ] Signed-off-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240729064626.24297-1-bp@kernel.org
2024-07-30x86/sev: Fix __reserved field in sev_configPavan Kumar Paluri
sev_config currently has debug, ghcbs_initialized, and use_cas fields. However, __reserved count has not been updated. Fix this. Fixes: 34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas") Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240729180808.366587-1-papaluri@amd.com
2024-07-29bpf, x64: Fix tailcall hierarchyLeon Hwang
This patch fixes a tailcall issue caused by abusing the tailcall in bpf2bpf feature. As we know, tail_call_cnt propagates by rax from caller to callee when to call subprog in tailcall context. But, like the following example, MAX_TAIL_CALL_CNT won't work because of missing tail_call_cnt back-propagation from callee to caller. \#include <linux/bpf.h> \#include <bpf/bpf_helpers.h> \#include "bpf_legacy.h" struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail1(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } static __noinline int subprog_tail2(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("tc") int entry(struct __sk_buff *skb) { volatile int ret = 1; count++; subprog_tail1(skb); subprog_tail2(skb); return ret; } char __license[] SEC("license") = "GPL"; At run time, the tail_call_cnt in entry() will be propagated to subprog_tail1() and subprog_tail2(). But, when the tail_call_cnt in subprog_tail1() updates when bpf_tail_call_static(), the tail_call_cnt in entry() won't be updated at the same time. As a result, in entry(), when tail_call_cnt in entry() is less than MAX_TAIL_CALL_CNT and subprog_tail1() returns because of MAX_TAIL_CALL_CNT limit, bpf_tail_call_static() in suprog_tail2() is able to run because the tail_call_cnt in subprog_tail2() propagated from entry() is less than MAX_TAIL_CALL_CNT. So, how many tailcalls are there for this case if no error happens? From top-down view, does it look like hierarchy layer and layer? With this view, there will be 2+4+8+...+2^33 = 2^34 - 2 = 17,179,869,182 tailcalls for this case. How about there are N subprog_tail() in entry()? There will be almost N^34 tailcalls. Then, in this patch, it resolves this case on x86_64. In stead of propagating tail_call_cnt from caller to callee, it propagates its pointer, tail_call_cnt_ptr, tcc_ptr for short. However, where does it store tail_call_cnt? It stores tail_call_cnt on the stack of main prog. When tail call happens in subprog, it increments tail_call_cnt by tcc_ptr. Meanwhile, it stores tail_call_cnt_ptr on the stack of main prog, too. And, before jump to tail callee, it has to pop tail_call_cnt and tail_call_cnt_ptr. Then, at the prologue of subprog, it must not make rax as tail_call_cnt_ptr again. It has to reuse tail_call_cnt_ptr from caller. As a result, at run time, it has to recognize rax is tail_call_cnt or tail_call_cnt_ptr at prologue by: 1. rax is tail_call_cnt if rax is <= MAX_TAIL_CALL_CNT. 2. rax is tail_call_cnt_ptr if rax is > MAX_TAIL_CALL_CNT, because a pointer won't be <= MAX_TAIL_CALL_CNT. Here's an example to dump JITed. struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 1); __uint(key_size, sizeof(__u32)); __uint(value_size, sizeof(__u32)); } jmp_table SEC(".maps"); int count = 0; static __noinline int subprog_tail(struct __sk_buff *skb) { bpf_tail_call_static(skb, &jmp_table, 0); return 0; } SEC("tc") int entry(struct __sk_buff *skb) { int ret = 1; count++; subprog_tail(skb); subprog_tail(skb); return ret; } When bpftool p d j id 42: int entry(struct __sk_buff * skb): bpf_prog_0c0f4c2413ef19b1_entry: ; int entry(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax,%rax) 9: xorq %rax, %rax ;; rax = 0 (tail_call_cnt) c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: cmpq $33, %rax ;; if rax > 33, rax = tcc_ptr 18: ja 0x20 ;; if rax > 33 goto 0x20 ---+ 1a: pushq %rax ;; [rbp - 8] = rax = 0 | 1b: movq %rsp, %rax ;; rax = rbp - 8 | 1e: jmp 0x21 ;; ---------+ | 20: pushq %rax ;; <--------|---------------+ 21: pushq %rax ;; <--------+ [rbp - 16] = rax 22: pushq %rbx ;; callee saved 23: movq %rdi, %rbx ;; rbx = skb (callee saved) ; count++; 26: movabsq $-82417199407104, %rdi 30: movl (%rdi), %esi 33: addl $1, %esi 36: movl %esi, (%rdi) ; subprog_tail(skb); 39: movq %rbx, %rdi ;; rdi = skb 3c: movq -16(%rbp), %rax ;; rax = tcc_ptr 43: callq 0x80 ;; call subprog_tail() ; subprog_tail(skb); 48: movq %rbx, %rdi ;; rdi = skb 4b: movq -16(%rbp), %rax ;; rax = tcc_ptr 52: callq 0x80 ;; call subprog_tail() ; return ret; 57: movl $1, %eax 5c: popq %rbx 5d: leave 5e: retq int subprog_tail(struct __sk_buff * skb): bpf_prog_3a140cef239a4b4f_subprog_tail: ; int subprog_tail(struct __sk_buff *skb) 0: endbr64 4: nopl (%rax,%rax) 9: nopl (%rax) ;; do not touch tail_call_cnt c: pushq %rbp d: movq %rsp, %rbp 10: endbr64 14: pushq %rax ;; [rbp - 8] = rax (tcc_ptr) 15: pushq %rax ;; [rbp - 16] = rax (tcc_ptr) 16: pushq %rbx ;; callee saved 17: pushq %r13 ;; callee saved 19: movq %rdi, %rbx ;; rbx = skb ; asm volatile("r1 = %[ctx]\n\t" 1c: movabsq $-105487587488768, %r13 ;; r13 = jmp_table 26: movq %rbx, %rdi ;; 1st arg, skb 29: movq %r13, %rsi ;; 2nd arg, jmp_table 2c: xorl %edx, %edx ;; 3rd arg, index = 0 2e: movq -16(%rbp), %rax ;; rax = [rbp - 16] (tcc_ptr) 35: cmpq $33, (%rax) 39: jae 0x4e ;; if *tcc_ptr >= 33 goto 0x4e --------+ 3b: jmp 0x4e ;; jmp bypass, toggled by poking | 40: addq $1, (%rax) ;; (*tcc_ptr)++ | 44: popq %r13 ;; callee saved | 46: popq %rbx ;; callee saved | 47: popq %rax ;; undo rbp-16 push | 48: popq %rax ;; undo rbp-8 push | 49: nopl (%rax,%rax) ;; tail call target, toggled by poking | ; return 0; ;; | 4e: popq %r13 ;; restore callee saved <--------------+ 50: popq %rbx ;; restore callee saved 51: leave 52: retq Furthermore, when trampoline is the caller of bpf prog, which is tail_call_reachable, it is required to propagate rax through trampoline. Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") Fixes: e411901c0b77 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Leon Hwang <hffilwlqm@gmail.com> Link: https://lore.kernel.org/r/20240714123902.32305-2-hffilwlqm@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-07-29x86/aperfmperf: Fix deadlock on cpu_hotplug_lockJonathan Cameron
The broken patch results in a call to init_freq_invariance_cppc() in a CPU hotplug handler in both the path for initially present CPUs and those hotplugged later. That function includes a one time call to amd_set_max_freq_ratio() which in turn calls freq_invariance_enable() that has a static_branch_enable() which takes the cpu_hotlug_lock which is already held. Avoid the deadlock by using static_branch_enable_cpuslocked() as the lock will always be already held. The equivalent path on Intel does not already hold this lock, so take it around the call to freq_invariance_enable(), which results in it being held over the call to register_syscall_ops, which looks to be safe to do. Fixes: c1385c1f0ba3 ("ACPI: processor: Simplify initial onlining to use same path for cold and hotplug") Closes: https://lore.kernel.org/all/CABXGCsPvqBfL5hQDOARwfqasLRJ_eNPBbCngZ257HOe=xbWDkA@mail.gmail.com/ Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240729105504.2170-1-Jonathan.Cameron@huawei.com
2024-07-29perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra ForestZhenyu Wang
Package C2 residency counter is also available on Sierra Forest. So add it support in srf_cstates. Fixes: 3877d55a0db2 ("perf/x86/intel/cstate: Add Sierra Forest support") Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Wendy Wang <wendy.wang@intel.com> Link: https://lore.kernel.org/r/20240717031609.74513-1-zhenyuw@linux.intel.com
2024-07-28minmax: add a few more MIN_T/MAX_T usersLinus Torvalds
Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-27Merge tag 'for-linus-6.11-rc1a-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two fixes for issues introduced in this merge window: - fix enhanced debugging in the Xen multicall handling - two patches fixing a boot failure when running as dom0 in PVH mode" * tag 'for-linus-6.11-rc1a-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix memblock_reserve() usage on PVH x86/xen: move xen_reserve_extra_memory() xen: fix multicall debug data referencing
2024-07-26minmax: avoid overly complex min()/max() macro arguments in xenLinus Torvalds
We have some very fancy min/max macros that have tons of sanity checking to warn about mixed signedness etc. This is all things that a sane compiler should warn about, but there are no sane compiler interfaces for this, and '-Wsign-compare' is broken [1] and not useful. So then we compensate (some would say over-compensate) by doing the checks manually with some truly horrid macro games. And no, we can't just use __builtin_types_compatible_p(), because the whole question of "does it make sense to compare these two values" is a lot more complicated than that. For example, it makes a ton of sense to compare unsigned values with simple constants like "5", even if that is indeed a signed type. So we have these very strange macros to try to make sensible type checking decisions on the arguments to 'min()' and 'max()'. But that can cause enormous code expansion if the min()/max() macros are used with complicated expressions, and particularly if you nest these things so that you get the first big expansion then expanded again. The xen setup.c file ended up ballooning to over 50MB of preprocessed noise that takes 15s to compile (obviously depending on the build host), largely due to one single line. So let's split that one single line to just be simpler. I think it ends up being more legible to humans too at the same time. Now that single file compiles in under a second. Reported-and-reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/all/c83c17bb-be75-4c67-979d-54eee38774c6@lucifer.local/ Link: https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/ [1] Cc: David Laight <David.Laight@aculab.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-26KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfnsPaolo Bonzini
This check is currently performed by sev_gmem_post_populate(), but it applies to all callers of kvm_gmem_populate(): the point of the function is that the memory is being encrypted and some work has to be done on all the gfns in order to encrypt them. Therefore, check the KVM_MEMORY_ATTRIBUTE_PRIVATE attribute prior to invoking the callback, and stop the operation if a shared page is encountered. Because CONFIG_KVM_PRIVATE_MEM in principle does not require attributes, this makes kvm_gmem_populate() depend on CONFIG_KVM_GENERIC_PRIVATE_MEM (which does require them). Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26KVM: extend kvm_range_has_memory_attributes() to check subset of attributesPaolo Bonzini
While currently there is no other attribute than KVM_MEMORY_ATTRIBUTE_PRIVATE, KVM code such as kvm_mem_is_private() is written to expect their existence. Allow using kvm_range_has_memory_attributes() as a multi-page version of kvm_mem_is_private(), without it breaking later when more attributes are introduced. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26KVM: guest_memfd: move check for already-populated page to common codePaolo Bonzini
Do not allow populating the same page twice with startup data. In the case of SEV-SNP, for example, the firmware does not allow it anyway, since the launch-update operation is only possible on pages that are still shared in the RMP. Even if it worked, kvm_gmem_populate()'s callback is meant to have side effects such as updating launch measurements, and updating the same page twice is unlikely to have the desired results. Races between calls to the ioctl are not possible because kvm_gmem_populate() holds slots_lock and the VM should not be running. But again, even if this worked on other confidential computing technology, it doesn't matter to guest_memfd.c whether this is something fishy such as missing synchronization in userspace, or rather something intentional. One of the racers wins, and the page is initialized by either kvm_gmem_prepare_folio() or kvm_gmem_populate(). Anyway, out of paranoia, adjust sev_gmem_post_populate() anyway to use the same errno that kvm_gmem_populate() is using. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26KVM: remove kvm_arch_gmem_prepare_needed()Paolo Bonzini
It is enough to return 0 if a guest need not do any preparation. This is in fact how sev_gmem_prepare() works for non-SNP guests, and it extends naturally to Intel hosts: the x86 callback for gmem_prepare is optional and returns 0 if not defined. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*Paolo Bonzini
Add "ARCH" to the symbols; shortly, the "prepare" phase will include both the arch-independent step to clear out contents left in the page by the host, and the arch-dependent step enabled by CONFIG_HAVE_KVM_GMEM_PREPARE. For consistency do the same for CONFIG_HAVE_KVM_GMEM_INVALIDATE as well. Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26KVM: x86: disallow pre-fault for SNP VMs before initializationPaolo Bonzini
KVM_PRE_FAULT_MEMORY for an SNP guest can race with sev_gmem_post_populate() in bad ways. The following sequence for instance can potentially trigger an RMP fault: thread A, sev_gmem_post_populate: called thread B, sev_gmem_prepare: places below 'pfn' in a private state in RMP thread A, sev_gmem_post_populate: *vaddr = kmap_local_pfn(pfn + i); thread A, sev_gmem_post_populate: copy_from_user(vaddr, src + i * PAGE_SIZE, PAGE_SIZE); RMP #PF Fix this by only allowing KVM_PRE_FAULT_MEMORY to run after a guest's initial private memory contents have been finalized via KVM_SEV_SNP_LAUNCH_FINISH. Beyond fixing this issue, it just sort of makes sense to enforce this, since the KVM_PRE_FAULT_MEMORY documentation states: "KVM maps memory as if the vCPU generated a stage-2 read page fault" which sort of implies we should be acting on the same guest state that a vCPU would see post-launch after the initial guest memory is all set up. Co-developed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-26KVM: x86: Eliminate log spam from limited APIC timer periodsJim Mattson
SAP's vSMP MemoryONE continuously requests a local APIC timer period less than 500 us, resulting in the following kernel log spam: kvm: vcpu 15: requested 70240 ns lapic timer period limited to 500000 ns kvm: vcpu 19: requested 52848 ns lapic timer period limited to 500000 ns kvm: vcpu 15: requested 70256 ns lapic timer period limited to 500000 ns kvm: vcpu 9: requested 70256 ns lapic timer period limited to 500000 ns kvm: vcpu 9: requested 70208 ns lapic timer period limited to 500000 ns kvm: vcpu 9: requested 387520 ns lapic timer period limited to 500000 ns kvm: vcpu 9: requested 70160 ns lapic timer period limited to 500000 ns kvm: vcpu 66: requested 205744 ns lapic timer period limited to 500000 ns kvm: vcpu 9: requested 70224 ns lapic timer period limited to 500000 ns kvm: vcpu 9: requested 70256 ns lapic timer period limited to 500000 ns limit_periodic_timer_frequency: 7569 callbacks suppressed ... To eliminate this spam, change the pr_info_ratelimited() in limit_periodic_timer_frequency() to pr_info_once(). Reported-by: James Houghton <jthoughton@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Message-ID: <20240724190640.2449291-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-25Merge tag 'constfy-sysctl-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl constification from Joel Granados: "Treewide constification of the ctl_table argument of proc_handlers using a coccinelle script and some manual code formatting fixups. This is a prerequisite to moving the static ctl_table structs into read-only data section which will ensure that proc_handler function pointers cannot be modified" * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: treewide: constify the ctl_table argument of proc_handlers
2024-07-25Merge tag 'uml-for-linus-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...
2024-07-25x86/xen: fix memblock_reserve() usage on PVHRoger Pau Monne
The current usage of memblock_reserve() in init_pvh_bootparams() is done before the .bss is zeroed, and that used to be fine when memblock_reserved_init_regions implicitly ended up in the .meminit.data section. However after commit 73db3abdca58c memblock_reserved_init_regions ends up in the .bss section, thus breaking it's usage before the .bss is cleared. Move and rename the call to xen_reserve_extra_memory() so it's done in the x86_init.oem.arch_setup hook, which gets executed after the .bss has been zeroed, but before calling e820__memory_setup(). Fixes: 73db3abdca58c ("init/modpost: conditionally check section mismatch to __meminit*") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240725073116.14626-3-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-25x86/xen: move xen_reserve_extra_memory()Roger Pau Monne
In preparation for making the function static. No functional change. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Message-ID: <20240725073116.14626-2-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-24Merge tag 'random-6.11-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-23Merge tag 'kbuild-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_KALLSYMS and CONFIG_DEBUG_INFO_BTF - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig * tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits) kbuild: doc: gcc to CC change kallsyms: change sym_entry::percpu_absolute to bool type kallsyms: unify seq and start_pos fields of struct sym_entry kallsyms: add more original symbol type/name in comment lines kallsyms: use \t instead of a tab in printf() kallsyms: avoid repeated calculation of array size for markers kbuild: add script and target to generate pacman package modpost: use generic macros for hash table implementation kbuild: move some helper headers from scripts/kconfig/ to scripts/include/ Makefile: add comment to discourage tools/* addition for kernel builds kbuild: clean up scripts/remove-stale-files kconfig: recursive checks drop file/lineno kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec kallsyms: get rid of code for absolute kallsyms kbuild: Create INSTALL_PATH directory if it does not exist kbuild: Abort make on install failures kconfig: remove 'e1' and 'e2' macros from expression deduplication kconfig: remove SYMBOL_CHOICEVAL flag kconfig: add const qualifiers to several function arguments kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups() ...
2024-07-23xen: fix multicall debug data referencingJuergen Gross
The recent adding of multicall debug mixed up the referencing of the debug data. A __percpu tagged pointer can't be initialized with a plain pointer, so use another percpu variable for the pointer and set it on each new cpu via a function. Fixes: 942d917cb92a ("xen: make multicall debug boot time selectable") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202407151106.5s7Mnfpz-lkp@intel.com/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-20kbuild: Abort make on install failuresZhang Bingwu
Setting '-e' flag tells shells to exit with error exit code immediately after any of commands fails, and causes make(1) to regard recipes as failed. Before this, make will still continue to succeed even after the installation failed, for example, for insufficient permission or directory does not exist. Signed-off-by: Zhang Bingwu <xtexchooser@duck.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-07-19x86: vdso: Wire up getrandom() vDSO implementationJason A. Donenfeld
Hook up the generic vDSO implementation to the x86 vDSO data page. Since the existing vDSO infrastructure is heavily based on the timekeeping functionality, which works over arrays of bases, a new macro is introduced for vvars that are not arrays. The vDSO function requires a ChaCha20 implementation that does not write to the stack, yet can still do an entire ChaCha20 permutation, so provide this using SSE2, since this is userland code that must work on all x86-64 processors. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Samuel Neves <sneves@dei.uc.pt> # for vgetrandom-chacha.S Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19Merge tag 'v6.11-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Test setkey in no-SIMD context - Add skcipher speed test for user-specified algorithm Algorithms: - Add x25519 support on ppc64le - Add VAES and AVX512 / AVX10 optimized AES-GCM on x86 - Remove sm2 algorithm Drivers: - Add Allwinner H616 support to sun8i-ce - Use DMA in stm32 - Add Exynos850 hwrng support to exynos" * tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (81 commits) hwrng: core - remove (un)register_miscdev() crypto: lib/mpi - delete unnecessary condition crypto: testmgr - generate power-of-2 lengths more often crypto: mxs-dcp - Ensure payload is zero when using key slot hwrng: Kconfig - Do not enable by default CN10K driver crypto: starfive - Fix nent assignment in rsa dec crypto: starfive - Align rsa input data to 32-bit crypto: qat - fix unintentional re-enabling of error interrupts crypto: qat - extend scope of lock in adf_cfg_add_key_value_param() Documentation: qat: fix auto_reset attribute details crypto: sun8i-ce - add Allwinner H616 support crypto: sun8i-ce - wrap accesses to descriptor address fields dt-bindings: crypto: sun8i-ce: Add compatible for H616 hwrng: core - Fix wrong quality calculation at hw rng registration hwrng: exynos - Enable Exynos850 support hwrng: exynos - Add SMC based TRNG operation hwrng: exynos - Implement bus clock control hwrng: exynos - Use devm_clk_get_enabled() to get the clock hwrng: exynos - Improve coding style dt-bindings: rng: Add Exynos850 support to exynos-trng ...