summaryrefslogtreecommitdiff
path: root/arch/riscv
AgeCommit message (Collapse)Author
2024-09-03riscv: Fix RISCV_ALTERNATIVE_EARLYAlexandre Ghiti
RISCV_ALTERNATIVE_EARLY will issue sbi_ecall() very early in the boot process, before the first memory mapping is setup so we can't have any instrumentation happening here. In addition, when the kernel is relocatable, we must also not issue any relocation this early since they would have been patched virtually only. So, instead of disabling instrumentation for the whole kernel/sbi.c file and compiling it with -fno-pie, simply move __sbi_ecall() and __sbi_base_ecall() into their own file where this is fixed. Reported-by: Conor Dooley <conor.dooley@microchip.com> Closes: https://lore.kernel.org/linux-riscv/20240813-pony-truck-3e7a83e9759e@spud/ Reported-by: syzbot+cfbcb82adf6d7279fd35@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-riscv/00000000000065062c061fcec37b@google.com/ Fixes: 1745cfafebdf ("riscv: don't use global static vars to store alternative data") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240829165048.49756-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-03riscv: Do not restrict memory size because of linear mapping on nommuAlexandre Ghiti
It makes no sense to restrict physical memory size because of linear mapping size constraints when there is no linear mapping, so only do that when mmu is enabled. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/linux-riscv/CAMuHMdW0bnJt5GMRtOZGkTiM7GK4UaLJCDMF_Ouq++fnDKi3_A@mail.gmail.com/ Fixes: 3b6564427aea ("riscv: Fix linear mapping checks for non-contiguous memory regions") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240827065230.145021-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-03riscv: Fix toolchain vector detectionAnton Blanchard
A recent change to gcc flags rv64iv as no longer valid: cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension and as a result vector support is disabled. Fix this by adding m to our toolchain vector detection code. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Fixes: fa8e7cce55da ("riscv: Enable Vector code to be built") Link: https://lore.kernel.org/r/20240819001131.1738806-1-antonb@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-31riscv: misaligned: Restrict user access to kernel memorySamuel Holland
raw_copy_{to,from}_user() do not call access_ok(), so this code allowed userspace to access any virtual memory address. Cc: stable@vger.kernel.org Fixes: 7c83232161f6 ("riscv: add support for misaligned trap handling in S-mode") Fixes: 441381506ba7 ("riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240815005714.1163136-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29Merge patch series "riscv: mm: Do not restrict mmap address based on hint"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: There have been a couple of reports that using the hint address to restrict the address returned by mmap hint address has caused issues in applications. A different solution for restricting addresses returned by mmap is necessary to avoid breakages. [Palmer: This also just wasn't doing the right thing in the first place, as it didn't handle the sv39 cases we were trying to deal with.] * b4-shazam-merge: riscv: mm: Do not restrict mmap address based on hint riscv: selftests: Remove mmap hint address checks Revert "RISC-V: mm: Document mmap changes" Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-0-cd8962afe47f@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-29riscv: mm: Do not restrict mmap address based on hintCharlie Jenkins
The hint address should not forcefully restrict the addresses returned by mmap as this causes mmap to report ENOMEM when there is memory still available. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: b5b4287accd7 ("riscv: mm: Use hint address in mmap if available") Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Closes: https://lore.kernel.org/linux-kernel/ZbxTNjQPFKBatMq+@ghost/T/#mccb1890466bf5a488c9ce7441e57e42271895765 Link: https://lore.kernel.org/r/20240826-riscv_mmap-v1-3-cd8962afe47f@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-15Merge patch series "RISC-V: hwprobe: Misaligned scalar perf fix and rename"Palmer Dabbelt
Evan Green <evan@rivosinc.com> says: The CPUPERF0 hwprobe key was documented and identified in code as a bitmask value, but its contents were an enum. This produced incorrect behavior in conjunction with the WHICH_CPUS hwprobe flag. The first patch in this series fixes the bitmask/enum problem by creating a new hwprobe key that returns the same data, but is properly described as a value instead of a bitmask. The second patch renames the value definitions in preparation for adding vector misaligned access info. As of this version, the old defines are kept in place to maintain source compatibility with older userspace programs. * b4-shazam-merge: RISC-V: hwprobe: Add SCALAR to misaligned perf defines RISC-V: hwprobe: Add MISALIGNED_PERF key Link: https://lore.kernel.org/r/20240809214444.3257596-1-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-15riscv: Fix out-of-bounds when accessing Andes per hart vendor extension arrayAlexandre Ghiti
The out-of-bounds access is reported by UBSAN: [ 0.000000] UBSAN: array-index-out-of-bounds in ../arch/riscv/kernel/vendor_extensions.c:41:66 [ 0.000000] index -1 is out of range for type 'riscv_isavendorinfo [32]' [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.11.0-rc2ubuntu-defconfig #2 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] Call Trace: [ 0.000000] [<ffffffff94e078ba>] dump_backtrace+0x32/0x40 [ 0.000000] [<ffffffff95c83c1a>] show_stack+0x38/0x44 [ 0.000000] [<ffffffff95c94614>] dump_stack_lvl+0x70/0x9c [ 0.000000] [<ffffffff95c94658>] dump_stack+0x18/0x20 [ 0.000000] [<ffffffff95c8bbb2>] ubsan_epilogue+0x10/0x46 [ 0.000000] [<ffffffff95485a82>] __ubsan_handle_out_of_bounds+0x94/0x9c [ 0.000000] [<ffffffff94e09442>] __riscv_isa_vendor_extension_available+0x90/0x92 [ 0.000000] [<ffffffff94e043b6>] riscv_cpufeature_patch_func+0xc4/0x148 [ 0.000000] [<ffffffff94e035f8>] _apply_alternatives+0x42/0x50 [ 0.000000] [<ffffffff95e04196>] apply_boot_alternatives+0x3c/0x100 [ 0.000000] [<ffffffff95e05b52>] setup_arch+0x85a/0x8bc [ 0.000000] [<ffffffff95e00ca0>] start_kernel+0xa4/0xfb6 The dereferencing using cpu should actually not happen, so remove it. Fixes: 23c996fc2bc1 ("riscv: Extend cpufeature.c to detect vendor extensions") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240814192619.276794-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14RISC-V: hwprobe: Add SCALAR to misaligned perf definesEvan Green
In preparation for misaligned vector performance hwprobe keys, rename the hwprobe key values associated with misaligned scalar accesses to include the term SCALAR. Leave the old defines in place to maintain source compatibility. This change is intended to be a functional no-op. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240809214444.3257596-3-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14RISC-V: hwprobe: Add MISALIGNED_PERF keyEvan Green
RISCV_HWPROBE_KEY_CPUPERF_0 was mistakenly flagged as a bitmask in hwprobe_key_is_bitmask(), when in reality it was an enum value. This causes problems when used in conjunction with RISCV_HWPROBE_WHICH_CPUS, since SLOW, FAST, and EMULATED have values whose bits overlap with each other. If the caller asked for the set of CPUs that was SLOW or EMULATED, the returned set would also include CPUs that were FAST. Introduce a new hwprobe key, RISCV_HWPROBE_KEY_MISALIGNED_PERF, which returns the same values in response to a direct query (with no flags), but is properly handled as an enumerated value. As a result, SLOW, FAST, and EMULATED are all correctly treated as distinct values under the new key when queried with the WHICH_CPUS flag. Leave the old key in place to avoid disturbing applications which may have already come to rely on the key, with or without its broken behavior with respect to the WHICH_CPUS flag. Fixes: e178bf146e4b ("RISC-V: hwprobe: Introduce which-cpus flag") Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240809214444.3257596-2-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODEHaibo Xu
Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE. Fixes: eabd9db64ea8 ("ACPI: RISCV: Add NUMA support based on SRAT and SLIT") Reported-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/0d362a8ae50558b95685da4c821b2ae9e8cf78be.1722828421.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14riscv: change XIP's kernel_map.size to be size of the entire kernelNam Cao
With XIP kernel, kernel_map.size is set to be only the size of data part of the kernel. This is inconsistent with "normal" kernel, who sets it to be the size of the entire kernel. More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is enabled, because there are checks on virtual addresses with the assumption that kernel_map.size is the size of the entire kernel (these checks are in arch/riscv/mm/physaddr.c). Change XIP's kernel_map.size to be the size of the entire kernel. Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14riscv: entry: always initialize regs->a0 to -ENOSYSCeleste Liu
Otherwise when the tracer changes syscall number to -1, the kernel fails to initialize a0 with -ENOSYS and subsequently fails to return the error code of the failed syscall to userspace. For example, it will break strace syscall tampering. Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1") Reported-by: "Dmitry V. Levin" <ldv@strace.io> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Cc: stable@vger.kernel.org Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com> Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-06riscv: Re-introduce global icache flush in patch_text_XXX()Alexandre Ghiti
commit edf2d546bfd6 ("riscv: patch: Flush the icache right after patching to avoid illegal insns") mistakenly removed the global icache flush in patch_text_nosync() and patch_text_set_nosync() functions, so reintroduce them. Fixes: edf2d546bfd6 ("riscv: patch: Flush the icache right after patching to avoid illegal insns") Reported-by: Samuel Holland <samuel.holland@sifive.com> Closes: https://lore.kernel.org/linux-riscv/a28ddc26-d77a-470a-a33f-88144f717e86@sifive.com/ Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240801191404.55181-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-02Merge tag 'riscv-for-linus-6.11-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix to avoid dropping some of the internal pseudo-extensions, which breaks *envcfg dependency parsing - The kernel entry address is now aligned in purgatory, which avoids a misaligned load that can lead to crash on systems that don't support misaligned accesses early in boot - The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of perf JSON configurations, one of them been updated to FW_SFENCE_VMA_ASID_SENT - The starfive cache driver is now restricted to 64-bit systems, as it isn't 32-bit clean - A fix for to avoid aliasing legacy-mode perf counters with software perf counters - VM_FAULT_SIGSEGV is now handled in the page fault code - A fix for stalls during CPU hotplug due to IPIs being disabled - A fix for memblock bounds checking. This manifests as a crash on systems with discontinuous memory maps that have regions that don't fit in the linear map * tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix linear mapping checks for non-contiguous memory regions RISC-V: Enable the IPI before workqueue_online_cpu() riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error() perf: riscv: Fix selecting counters in legacy mode cache: StarFive: Require a 64-bit system perf arch events: Fix duplicate RISC-V SBI firmware event name riscv/purgatory: align riscv_kernel_entry riscv: cpufeature: Do not drop Linux-internal extensions
2024-08-02syscalls: fix syscall macros for newfstat/newfstatatArnd Bergmann
The __NR_newfstat and __NR_newfstatat macros accidentally got renamed in the conversion to the syscall.tbl format, dropping the 'new' portion of the name. In an unrelated change, the two syscalls are no longer architecture specific but are once more defined on all 64-bit architectures, so the 'newstat' ABI keyword can be dropped from the table as a simplification. Fixes: Fixes: 4fe53bf2ba0a ("syscalls: add generic scripts/syscall.tbl") Closes: https://lore.kernel.org/lkml/838053e0-b186-4e9f-9668-9a3384a71f23@app.fastmail.com/T/#t Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-01riscv: Fix linear mapping checks for non-contiguous memory regionsStuart Menefy
The RISC-V kernel already has checks to ensure that memory which would lie outside of the linear mapping is not used. However those checks use memory_limit, which is used to implement the mem= kernel command line option (to limit the total amount of memory, not its address range). When memory is made up of two or more non-contiguous memory banks this check is incorrect. Two changes are made here: - add a call in setup_bootmem() to memblock_cap_memory_range() which will cause any memory which falls outside the linear mapping to be removed from the memory regions. - remove the check in create_linear_mapping_page_table() which was intended to remove memory which is outside the liner mapping based on memory_limit, as it is no longer needed. Note a check for mapping more memory than memory_limit (to implement mem=) is unnecessary because of the existing call to memblock_enforce_memory_limit(). This issue was seen when booting on a SV39 platform with two memory banks: 0x00,80000000 1GiB 0x20,00000000 32GiB This memory range is 158GiB from top to bottom, but the linear mapping is limited to 128GiB, so the lower block of RAM will be mapped at PAGE_OFFSET, and the upper block straddles the top of the linear mapping. This causes the following Oops: [ 0.000000] Linux version 6.10.0-rc2-gd3b8dd5b51dd-dirty (stuart.menefy@codasip.com) (riscv64-codasip-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41.0.20231213) #20 SMP Sat Jun 22 11:34:22 BST 2024 [ 0.000000] memblock_add: [0x0000000080000000-0x00000000bfffffff] early_init_dt_add_memory_arch+0x4a/0x52 [ 0.000000] memblock_add: [0x0000002000000000-0x00000027ffffffff] early_init_dt_add_memory_arch+0x4a/0x52 ... [ 0.000000] memblock_alloc_try_nid: 23724 bytes align=0x8 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_init_dt_alloc_memory_arch+0x1e/0x48 [ 0.000000] memblock_reserve: [0x00000027ffff5350-0x00000027ffffaffb] memblock_alloc_range_nid+0xb8/0x132 [ 0.000000] Unable to handle kernel paging request at virtual address fffffffe7fff5350 [ 0.000000] Oops [#1] [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc2-gd3b8dd5b51dd-dirty #20 [ 0.000000] Hardware name: codasip,a70x (DT) [ 0.000000] epc : __memset+0x8c/0x104 [ 0.000000] ra : memblock_alloc_try_nid+0x74/0x84 [ 0.000000] epc : ffffffff805e88c8 ra : ffffffff806148f6 sp : ffffffff80e03d50 [ 0.000000] gp : ffffffff80ec4158 tp : ffffffff80e0bec0 t0 : fffffffe7fff52f8 [ 0.000000] t1 : 00000027ffffb000 t2 : 5f6b636f6c626d65 s0 : ffffffff80e03d90 [ 0.000000] s1 : 0000000000005cac a0 : fffffffe7fff5350 a1 : 0000000000000000 [ 0.000000] a2 : 0000000000005cac a3 : fffffffe7fffaff8 a4 : 000000000000002c [ 0.000000] a5 : ffffffff805e88c8 a6 : 0000000000005cac a7 : 0000000000000030 [ 0.000000] s2 : fffffffe7fff5350 s3 : ffffffffffffffff s4 : 0000000000000000 [ 0.000000] s5 : ffffffff8062347e s6 : 0000000000000000 s7 : 0000000000000001 [ 0.000000] s8 : 0000000000002000 s9 : 00000000800226d0 s10: 0000000000000000 [ 0.000000] s11: 0000000000000000 t3 : ffffffff8080a928 t4 : ffffffff8080a928 [ 0.000000] t5 : ffffffff8080a928 t6 : ffffffff8080a940 [ 0.000000] status: 0000000200000100 badaddr: fffffffe7fff5350 cause: 000000000000000f [ 0.000000] [<ffffffff805e88c8>] __memset+0x8c/0x104 [ 0.000000] [<ffffffff8062349c>] early_init_dt_alloc_memory_arch+0x1e/0x48 [ 0.000000] [<ffffffff8043e892>] __unflatten_device_tree+0x52/0x114 [ 0.000000] [<ffffffff8062441e>] unflatten_device_tree+0x9e/0xb8 [ 0.000000] [<ffffffff806046fe>] setup_arch+0xd4/0x5bc [ 0.000000] [<ffffffff806007aa>] start_kernel+0x76/0x81a [ 0.000000] Code: b823 02b2 bc23 02b2 b023 04b2 b423 04b2 b823 04b2 (bc23) 04b2 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- The problem is that memblock (unaware that some physical memory cannot be used) has allocated memory from the top of memory but which is outside the linear mapping region. Signed-off-by: Stuart Menefy <stuart.menefy@codasip.com> Fixes: c99127c45248 ("riscv: Make sure the linear mapping does not use the kernel mapping") Reviewed-by: David McKay <david.mckay@codasip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240622114217.2158495-1-stuart.menefy@codasip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01RISC-V: Enable the IPI before workqueue_online_cpu()Nick Hu
Sometimes the hotplug cpu stalls at the arch_cpu_idle() for a while after workqueue_online_cpu(). When cpu stalls at the idle loop, the reschedule IPI is pending. However the enable bit is not enabled yet so the cpu stalls at WFI until watchdog timeout. Therefore enable the IPI before the workqueue_online_cpu() to fix the issue. Fixes: 63c5484e7495 ("workqueue: Add multiple affinity scopes and interface to select them") Signed-off-by: Nick Hu <nick.hu@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240717031714.1946036-1-nick.hu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()Zhe Qiao
Handle VM_FAULT_SIGSEGV in the page fault path so that we correctly kill the process and we don't BUG() the kernel. Fixes: 07037db5d479 ("RISC-V: Paging and MMU") Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240731084547.85380-1-qiaozhe@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01riscv/purgatory: align riscv_kernel_entryDaniel Maslowski
When alignment handling is delegated to the kernel, everything must be word-aligned in purgatory, since the trap handler is then set to the kexec one. Without the alignment, hitting the exception would ultimately crash. On other occasions, the kernel's handler would take care of exceptions. This has been tested on a JH7110 SoC with oreboot and its SBI delegating unaligned access exceptions and the kernel configured to handle them. Fixes: 736e30af583fb ("RISC-V: Add purgatory") Signed-off-by: Daniel Maslowski <cyrevolt@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-31riscv: cpufeature: Do not drop Linux-internal extensionsSamuel Holland
The Linux-internal Xlinuxenvcfg ISA extension is omitted from the riscv_isa_ext array because it has no DT binding and should not appear in /proc/cpuinfo. The logic added in commit 625034abd52a ("riscv: add ISA extensions validation callback") assumes all extensions are included in riscv_isa_ext, and so riscv_resolve_isa() wrongly drops Xlinuxenvcfg from the final ISA string. Instead, accept such Linux-internal ISA extensions as if they have no validation callback. Fixes: 625034abd52a ("riscv: add ISA extensions validation callback") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240718213011.2600150-1-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-27Merge tag 'riscv-for-linus-6.11-mw2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and cache info (via PPTT) on ACPI-based systems. - The trap entry/exit code no longer breaks the return address stack predictor on many systems, which results in an improvement to trap latency. - Support for HAVE_ARCH_STACKLEAK. - The sv39 linear map has been extended to support 128GiB mappings. - The frequency of the mtime CSR is now visible via hwprobe. * tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits) RISC-V: Provide the frequency of time CSR via hwprobe riscv: Extend sv39 linear mapping max size to 128G riscv: enable HAVE_ARCH_STACKLEAK riscv: signal: Remove unlikely() from WARN_ON() condition riscv: Improve exception and system call latency RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() RISC-V: ACPI: Enable SPCR table for console output on RISC-V riscv: boot: remove duplicated targets line trace: riscv: Remove deprecated kprobe on ftrace support riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions RISC-V: run savedefconfig for defconfig RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ...
2024-07-26Merge tag 'bitmap-6.11-rc1' of https://github.com:/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: "Random fixes" * tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux: riscv: Remove unnecessary int cast in variable_fls() radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c bitops: Add a comment explaining the double underscore macros lib: bitmap: add missing MODULE_DESCRIPTION() macros cpumask: introduce assign_cpu() macro
2024-07-26RISC-V: Provide the frequency of time CSR via hwprobePalmer Dabbelt
The RISC-V architecture makes a real time counter CSR (via RDTIME instruction) available for applications in U-mode but there is no architected mechanism for an application to discover the frequency the counter is running at. Some applications (e.g., DPDK) use the time counter for basic performance analysis as well as fine grained time-keeping. Add support to the hwprobe system call to export the time CSR frequency to code running in U-mode. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Evan Green <evan@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Punit Agrawal <punit.agrawal@bytedance.com> Link: https://lore.kernel.org/r/20240702033731.71955-2-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26riscv: Extend sv39 linear mapping max size to 128GStuart Menefy
This harmonizes all virtual addressing modes which can now all map (PGDIR_SIZE * PTRS_PER_PGD) / 4 of physical memory. The RISCV implementation of KASAN requires that the boundary between shallow mappings are aligned on an 8G boundary. In this case we need VMALLOC_START to be 8G aligned. So although we only need to move the start of the linear mapping down by 4GiB to allow 128GiB to be mapped, we actually move it down by 8GiB (creating a 4GiB hole between the linear mapping and KASAN shadow space) to maintain the alignment requirement. Signed-off-by: Stuart Menefy <stuart.menefy@codasip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240630110550.1731929-1-stuart.menefy@codasip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26Merge patch series "RISC-V: Select ACPI PPTT drivers"Palmer Dabbelt
This series adds support for ACPI PPTT via cacheinfo. * b4-shazam-merge: RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() Link: https://lore.kernel.org/r/20240617131425.7526-1-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26Merge patch "Enable SPCR table for console output on RISC-V"Palmer Dabbelt
Sia Jee Heng <jeeheng.sia@starfivetech.com> says: The ACPI SPCR code has been used to enable console output for ARM64 and X86. The same code can be reused for RISC-V. Furthermore, SPCR table is mandated for headless system as outlined in the RISC-V BRS Specification, chapter 6. * b4-shazam-merge: RISC-V: ACPI: Enable SPCR table for console output on RISC-V Link: https://lore.kernel.org/r/20240502073751.102093-1-jeeheng.sia@starfivetech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26riscv: enable HAVE_ARCH_STACKLEAKJisheng Zhang
Add support for the stackleak feature. Whenever the kernel returns to user space the kernel stack is filled with a poison value. At the same time, disables the plugin in EFI stub code because EFI stub is out of scope for the protection. Tested on qemu and milkv duo: / # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT [ 38.675575] lkdtm: Performing direct entry STACKLEAK_ERASING [ 38.678448] lkdtm: stackleak stack usage: [ 38.678448] high offset: 288 bytes [ 38.678448] current: 496 bytes [ 38.678448] lowest: 1328 bytes [ 38.678448] tracked: 1328 bytes [ 38.678448] untracked: 448 bytes [ 38.678448] poisoned: 14312 bytes [ 38.678448] low offset: 8 bytes [ 38.689887] lkdtm: OK: the rest of the thread stack is properly erased Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240623235316.2010-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26riscv: signal: Remove unlikely() from WARN_ON() conditionZhongqiu Han
"WARN_ON(unlikely(x))" is excessive. WARN_ON() already uses unlikely() internally. Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240620033434.3778156-1-quic_zhonhan@quicinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-26riscv: Improve exception and system call latencyAnton Blanchard
Many CPUs implement return address branch prediction as a stack. The RISCV architecture refers to this as a return address stack (RAS). If this gets corrupted then the CPU will mispredict at least one but potentally many function returns. There are two issues with the current RISCV exception code: - We are using the alternate link stack (x5/t0) for the indirect branch which makes the hardware think this is a function return. This will corrupt the RAS. - We modify the return address of handle_exception to point to ret_from_exception. This will also corrupt the RAS. Testing the null system call latency before and after the patch: Visionfive2 (StarFive JH7110 / U74) baseline: 189.87 ns patched: 176.76 ns Lichee pi 4a (T-Head TH1520 / C910) baseline: 666.58 ns patched: 636.90 ns Just over 7% on the U74 and just over 4% on the C910. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com> Tested-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Jisheng Zhang <jszhang@kernel.org> Link: https://lore.kernel.org/r/20240607061335.2197383-1-cyrilbur@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24RISC-V: Select ACPI PPTT driversYunhui Cui
After adding ACPI support to populate_cache_leaves(), RISC-V can build cacheinfo through the ACPI PPTT table, thus enabling the ACPI_PPTT configuration. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20240617131425.7526-3-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTTYunhui Cui
Before cacheinfo can be built correctly, we need to initialize level and type. Since RISC-V currently does not have a register group that describes cache-related attributes like ARM64, we cannot obtain them directly, so now we obtain cache leaves from the ACPI PPTT table (acpi_get_cache_info()) and set the cache type through split_levels. Suggested-by: Jeremy Linton <jeremy.linton@arm.com> Suggested-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Link: https://lore.kernel.org/r/20240617131425.7526-2-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init()Yunhui Cui
ci_leaf_init() is a declared static function. The implementation of the function body and the caller do not use the parameter (struct device_node *node) input parameter, so remove it. Fixes: 6a24915145c9 ("Revert "riscv: Set more data to cacheinfo"") Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20240617131425.7526-1-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24RISC-V: ACPI: Enable SPCR table for console output on RISC-VSia Jee Heng
The ACPI SPCR code has been used to enable console output for ARM64 and X86. The same code can be reused for RISC-V. Furthermore, SPCR table is mandated for headless system as outlined in the RISC-V BRS Specification, chapter 6. Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20240502073751.102093-2-jeeheng.sia@starfivetech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24riscv: boot: remove duplicated targets lineJisheng Zhang
The "targets:" is duplicated in another line, remove the one with less targets. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Link: https://lore.kernel.org/r/20240613153053.3835-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-24trace: riscv: Remove deprecated kprobe on ftrace supportJinjie Ruan
Since commit 7caa9765465f60 ("ftrace: riscv: move from REGS to ARGS"), kprobe on ftrace is not supported by riscv, because riscv's support for FTRACE_WITH_REGS has been replaced with support for FTRACE_WITH_ARGS, and KPROBES_ON_FTRACE will be supplanted by FPROBES. So remove the deprecated kprobe on ftrace support, which is misunderstood. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20240613111347.1745379-1-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-23Merge tag 'kbuild-v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_KALLSYMS and CONFIG_DEBUG_INFO_BTF - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig * tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits) kbuild: doc: gcc to CC change kallsyms: change sym_entry::percpu_absolute to bool type kallsyms: unify seq and start_pos fields of struct sym_entry kallsyms: add more original symbol type/name in comment lines kallsyms: use \t instead of a tab in printf() kallsyms: avoid repeated calculation of array size for markers kbuild: add script and target to generate pacman package modpost: use generic macros for hash table implementation kbuild: move some helper headers from scripts/kconfig/ to scripts/include/ Makefile: add comment to discourage tools/* addition for kernel builds kbuild: clean up scripts/remove-stale-files kconfig: recursive checks drop file/lineno kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec kallsyms: get rid of code for absolute kallsyms kbuild: Create INSTALL_PATH directory if it does not exist kbuild: Abort make on install failures kconfig: remove 'e1' and 'e2' macros from expression deduplication kconfig: remove SYMBOL_CHOICEVAL flag kconfig: add const qualifiers to several function arguments kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups() ...
2024-07-22Merge patch series "riscv: Separate vendor extensions from standard extensions"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: All extensions, both standard and vendor, live in one struct "riscv_isa_ext". There is currently one vendor extension, xandespmu, but it is likely that more vendor extensions will be added to the kernel in the future. As more vendor extensions (and standard extensions) are added, riscv_isa_ext will become more bloated with a mix of vendor and standard extensions. This also allows each vendor to be conditionally enabled through Kconfig. * b4-shazam-merge: riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-0-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22riscv: cpufeature: Extract common elements from extension checkingCharlie Jenkins
The __riscv_has_extension_likely() and __riscv_has_extension_unlikely() functions from the vendor_extensions.h can be used to simplify the standard extension checking code as well. Migrate those functions to cpufeature.h and reorganize the code in the file to use the functions. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-4-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22riscv: Introduce vendor variants of extension helpersCharlie Jenkins
Vendor extensions are maintained in per-vendor structs (separate from standard extensions which live in riscv_isa). Create vendor variants for the existing extension helpers to interface with the riscv_isa_vendor bitmaps. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-3-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22riscv: Add vendor extensions to /proc/cpuinfoCharlie Jenkins
All of the supported vendor extensions that have been listed in riscv_isa_vendor_ext_list can be exported through /proc/cpuinfo. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-2-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22riscv: Extend cpufeature.c to detect vendor extensionsCharlie Jenkins
Instead of grouping all vendor extensions into the same riscv_isa_ext that standard instructions use, create a struct "riscv_isa_vendor_ext_data_list" that allows each vendor to maintain their vendor extensions independently of the standard extensions. xandespmu is currently the only vendor extension so that is the only extension that is affected by this change. An additional benefit of this is that the extensions of each vendor can be conditionally enabled. A config RISCV_ISA_VENDOR_EXT_ANDES has been added to allow for that. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-1-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22RISC-V: run savedefconfig for defconfigConor Dooley
It's been a while since this was run, and there's a few things that have changed. Firstly, almost all of the Renesas stuff vanishes because the config for the RZ/Five is gated behind NONPORTABLE. Several options (like CONFIG_PM) are removed as they are the default values. To retain DEFVFREQ_THERMAL and BLK_DEV_THROTTLING, add PM_DEVFREQ and BLK_CGROUP respectively. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240717-shrubs-concise-51600886babf@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabeticallyConor Dooley
Currently the entries appear to be in a random order (although according to Palmer he has tried to sort them by key value) which makes it harder to find entries in a growing list, and more likely to have conflicts as all patches are adding to the end of the list. Sort them alphabetically instead. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20240717-dedicate-squeamish-7e4ab54df58f@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22Merge patch series "Add ACPI NUMA support for RISC-V"Palmer Dabbelt
Haibo Xu <haibo1.xu@intel.com> says: This patch series enable RISC-V ACPI NUMA support which was based on the recently approved ACPI ECR[1]. Patch 1/4 add RISC-V specific acpi_numa.c file to parse NUMA information from SRAT and SLIT ACPI tables. Patch 2/4 add the common SRAT RINTC affinity structure handler. Patch 3/4 change the ACPI_NUMA to a hidden option since it would be selected by default on all supported platform. Patch 4/4 replace pr_info with pr_debug in arch_acpi_numa_init() to avoid potential boot noise on ACPI platforms that are not NUMA. Based-on: https://github.com/linux-riscv/linux-riscv/tree/for-next [1] https://drive.google.com/file/d/1YTdDx2IPm5IeZjAW932EYU-tUtgS08tX/view?usp=sharing Testing: Since the ACPI AIA/PLIC support patch set is still under upstream review, hence it is tested using the poll based HVC SBI console and RAM disk. 1) Build latest Qemu with the following patch backported https://github.com/vlsunil/qemu/commit/42bd4eeefd5d4410a68f02d54fee406d8a1269b0 2) Build latest EDK-II https://github.com/tianocore/edk2/blob/master/OvmfPkg/RiscVVirt/README.md 3) Build Linux with the following configs enabled CONFIG_RISCV_SBI_V01=y CONFIG_SERIAL_EARLYCON_RISCV_SBI=y CONFIG_NONPORTABLE=y CONFIG_HVC_RISCV_SBI=y CONFIG_NUMA=y CONFIG_ACPI_NUMA=y 4) Build buildroot rootfs.cpio 5) Launch the Qemu machine qemu-system-riscv64 -nographic \ -machine virt,pflash0=pflash0,pflash1=pflash1 -smp 4 -m 8G \ -blockdev node-name=pflash0,driver=file,read-only=on,filename=RISCV_VIRT_CODE.fd \ -blockdev node-name=pflash1,driver=file,filename=RISCV_VIRT_VARS.fd \ -object memory-backend-ram,size=4G,id=m0 \ -object memory-backend-ram,size=4G,id=m1 \ -numa node,memdev=m0,cpus=0-1,nodeid=0 \ -numa node,memdev=m1,cpus=2-3,nodeid=1 \ -numa dist,src=0,dst=1,val=30 \ -kernel linux/arch/riscv/boot/Image \ -initrd buildroot/output/images/rootfs.cpio \ -append "root=/dev/ram ro console=hvc0 earlycon=sbi" [ 0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x80000000-0x17fffffff] [ 0.000000] ACPI: SRAT: Node 1 PXM 1 [mem 0x180000000-0x27fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x17fe3bc40-0x17fe3cfff] [ 0.000000] NUMA: NODE_DATA [mem 0x27fff4c40-0x27fff5fff] ... [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> HARTID 0x0 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 0 -> HARTID 0x1 -> Node 0 [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> HARTID 0x2 -> Node 1 [ 0.000000] ACPI: NUMA: SRAT: PXM 1 -> HARTID 0x3 -> Node 1 * b4-shazam-merge: ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ACPI: RISCV: Add NUMA support based on SRAT and SLIT Link: https://lore.kernel.org/r/cover.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22ACPI: RISCV: Add NUMA support based on SRAT and SLITHaibo Xu
Add acpi_numa.c file to enable parse NUMA information from ACPI SRAT and SLIT tables. SRAT table provide CPUs(Hart) and memory nodes to proximity domain mapping, while SLIT table provide the distance metrics between proximity domains. Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/65dbad1fda08a32922c44886e4581e49b4a2fecc.1718268003.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-20Merge tag 'riscv-for-linus-6.11-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various new ISA extensions: * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector extension * Zimop and Zcmop for may-be-operations * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension * Zawrs - riscv,cpu-intc is now dtschema - A handful of performance improvements and cleanups to text patching - Support for memory hot{,un}plug - The highest user-allocatable virtual address is now visible in hwprobe * tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits) riscv: lib: relax assembly constraints in hweight riscv: set trap vector earlier KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' riscv: hwprobe: export highest virtual userspace address riscv: Improve sbi_ecall() code generation by reordering arguments riscv: Add tracepoints for SBI calls and returns riscv: Optimize crc32 with Zbc extension riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add ...
2024-07-20kbuild: Abort make on install failuresZhang Bingwu
Setting '-e' flag tells shells to exit with error exit code immediately after any of commands fails, and causes make(1) to regard recipes as failed. Before this, make will still continue to succeed even after the installation failed, for example, for insufficient permission or directory does not exist. Signed-off-by: Zhang Bingwu <xtexchooser@duck.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>