Age | Commit message (Collapse) | Author |
|
Make has_vector() to check for ZVE32X. Every in-kernel usage of V that
requires a more complicate version of V must then call out explicitly.
Also, change riscv_v_first_use_handler(), and boot code that calls
riscv_v_setup_vsize() to accept ZVE32X.
Most kernel/user interfaces requires minimum of ZVE32X. Thus, programs
compiled and run with ZVE32X should be supported by the kernel on most
aspects. This includes context-switch, signal, ptrace, prctl, and
hwprobe.
One exception is that ELF_HWCAP returns 'V' only if full V is supported
on the platform. This means that the system without a full V must not
rely on ELF_HWCAP to tell whether it is allowable to execute Vector
without first invoking a prctl() check.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Joel Granados <j.granados@samsung.com>
Link: https://lore.kernel.org/r/20240510-zve-detection-v5-7-0711bdd26c12@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The bytes copy for unaligned head would cover at most SZREG-1 bytes, so
it's better to set the threshold as >= (SZREG-1 + word_copy stride size)
which equals to 9*SZREG-1.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240313091929.4029960-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
When the dst buffer pointer points to the last accessible aligned addr, we
could still run another iteration of unrolled copy.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240313103334.4036554-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Andy Chiu <andy.chiu@sifive.com> says:
This series provides support running Vector in kernel mode.
Additionally, kernel-mode Vector can be configured to run without
turnning off preemption on a CONFIG_PREEMPT kernel. Along with the
suport, we add Vector optimized copy_{to,from}_user. And provide a
simple threshold to decide when to run the vectorized functions.
We decided to drop vectorized memcpy/memset/memmove for the moment due
to the concern of memory side-effect in kernel_vector_begin(). The
detailed description can be found at v9[0]
This series is composed by 4 parts:
patch 1-4: adds basic support for kernel-mode Vector
patch 5: includes vectorized copy_{to,from}_user into the kernel
patch 6: refactor context switch code in fpu [1]
patch 7-10: provides some code refactors and support for preemptible
kernel-mode Vector.
This series can be merged if we feel any part of {1~4, 5, 6, 7~10} is
mature enough.
This patch is tested on a QEMU with V and verified that booting, normal
userspace operations all work as usual with thresholds set to 0. Also,
we test by launching multiple kernel threads which continuously executes
and verifies Vector operations in the background. The module that tests
these operation is expected to be upstream later.
* b4-shazam-merge:
riscv: vector: allow kernel-mode Vector with preemption
riscv: vector: use kmem_cache to manage vector context
riscv: vector: use a mask to write vstate_ctrl
riscv: vector: do not pass task_struct into riscv_v_vstate_{save,restore}()
riscv: fpu: drop SR_SD bit checking
riscv: lib: vectorize copy_to_user/copy_from_user
riscv: sched: defer restoring Vector context for user
riscv: Add vector extension XOR implementation
riscv: vector: make Vector always available for softirq context
riscv: Add support for kernel mode vector
Link: https://lore.kernel.org/r/20240115055929.4736-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch utilizes Vector to perform copy_to_user/copy_from_user. If
Vector is available and the size of copy is large enough for Vector to
perform better than scalar, then direct the kernel to do Vector copies
for userspace. Though the best programming practice for users is to
reduce the copy, this provides a faster variant when copies are
inevitable.
The optimal size for using Vector, copy_to_user_thres, is only a
heuristic for now. We can add DT parsing if people feel the need of
customizing it.
The exception fixup code of the __asm_vector_usercopy must fallback to
the scalar one because accessing user pages might fault, and must be
sleepable. Current kernel-mode Vector does not allow tasks to be
preemptible, so we must disactivate Vector and perform a scalar fallback
in such case.
The original implementation of Vector operations comes from
https://github.com/sifive/sifive-libc, which we agree to contribute to
Linux kernel.
Co-developed-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
Co-developed-by: Nick Knight <nick.knight@sifive.com>
Signed-off-by: Nick Knight <nick.knight@sifive.com>
Suggested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20240115055929.4736-6-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
asm-generic/export.h is a wrapper for linux/export.h, with explicit request
to use linux/export.h directly.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20231214191922.GQ1674809@ZenIV
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
ENTRY()/END()/WEAK() macros are deprecated and we should make use of the
new SYM_*() macros [1] for better annotation of symbols. Replace the
deprecated ones with the new ones and fix wrong usage of END()/ENDPROC()
to correctly describe the symbols.
[1] https://docs.kernel.org/core-api/asm-annotations.html
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231024132655.730417-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
It was reported that the riscv kernel hangs while executing the test
in [1].
Indeed, the test hangs when trying to write a buffer to a file. The
problem is that the riscv implementation of raw_copy_from_user() does not
return the correct number of bytes not written when an exception happens
and is fixed up, instead it always returns the initial size to copy,
even if some bytes were actually copied.
generic_perform_write() pre-faults the user pages and bails out if nothing
can be written, otherwise it will access the userspace buffer: here the
riscv implementation keeps returning it was not able to copy any byte
though the pre-faulting indicates otherwise. So generic_perform_write()
keeps retrying to access the user memory and ends up in an infinite
loop.
Note that before the commit mentioned in [1] that introduced this
regression, it worked because generic_perform_write() would bail out if
only one byte could not be written.
So fix this by returning the number of bytes effectively not written in
__asm_copy_[to|from]_user() and __clear_user(), as it is expected.
Link: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/ [1]
Fixes: ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code")
Reported-by: Bo YU <tsu.yubo@gmail.com>
Closes: https://lore.kernel.org/linux-riscv/20230309151841.bomov6hq3ybyp42a@debian/#t
Reported-by: Aurelien Jarno <aurelien@aurel32.net>
Closes: https://lore.kernel.org/linux-riscv/ZNOnCakhwIeue3yr@aurel32.net/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Link: https://lore.kernel.org/r/20230811150604.1621784-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Since commit 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
and commit ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code"),
if __clear_user and __copy_user return from an fixup branch,
CSR_STATUS SR_SUM bit will be set, it is a vulnerability, so that
S-mode memory accesses to pages that are accessible by U-mode will success.
Disable S-mode access to U-mode memory should clear SR_SUM bit.
Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
Fixes: ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code")
Signed-off-by: Chen Lifu <chenlifu@huawei.com>
Reviewed-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20220615014714.1650349-1-chenlifu@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This is a riscv port of commit 819771cc2892 ("arm64: extable:
consolidate definitions").
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
uaccess functions such __asm_copy_to_user(), __arch_copy_from_user()
and __clear_user() place their exception fixups in the `.fixup` section
without any clear association with themselves. If we backtrace the
fixup code, it will be symbolized as an offset from the nearest prior
symbol.
Similar as arm64 does, we must move fixups into the body of the
functions themselves, after the usual fast-path returns.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Similar as other architectures such as arm64, x86 and so on, use
offsets relative to the exception table entry values rather than
absolute addresses for both the exception locationand the fixup.
However, RISCV label difference will actually produce two relocations,
a pair of R_RISCV_ADD32 and R_RISCV_SUB32. Take below simple code for
example:
$ cat test.S
.section .text
1:
nop
.section __ex_table,"a"
.balign 4
.long (1b - .)
.previous
$ riscv64-linux-gnu-gcc -c test.S
$ riscv64-linux-gnu-readelf -r test.o
Relocation section '.rela__ex_table' at offset 0x100 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000000 000600000023 R_RISCV_ADD32 0000000000000000 .L1^B1 + 0
000000000000 000500000027 R_RISCV_SUB32 0000000000000000 .L0 + 0
The modpost will complain the R_RISCV_SUB32 relocation, so we need to
patch modpost.c to skip this relocation for .rela__ex_table section.
After this patch, the __ex_table section size of defconfig vmlinux is
reduced from 7072 Bytes to 3536 Bytes.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Fixing typos and grammar mistakes and using more intuitive label
name.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Clean up:
The size of 0 will be evaluated in the next step. Not
required here.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Had a bug when converting bytes to bits when the cpu was rv32.
The a3 contains the number of bytes and multiple of 8
would be the bits. The LGREG is holding 2 for RV32 and 3 for
RV32, so to achieve multiple of 8 it must always be constant 3.
The 2 was mistakenly used for rv32.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
There were two causes for the overrun memory access.
The threshold size was too small.
The aligning dst require one SZREG and unrolling word copy requires
8*SZREG, total have to be at least 9*SZREG.
Inside the unrolling copy, the subtracting -(8*SZREG-1) would make
iteration happening one extra loop. Proper value is -(8*SZREG).
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
pipeline stall
This patch will reduce cpu usage dramatically in kernel space especially
for application which use sys-call with large buffer size, such as
network applications. The main reason behind this is that every
unaligned memory access will raise exceptions and switch between s-mode
and m-mode causing large overhead.
First copy in bytes until reaches the first word aligned boundary in
destination memory address. This is the preparation before the bulk
aligned word copy.
The destination address is aligned now, but oftentimes the source
address is not in an aligned boundary. To reduce the unaligned memory
access, it reads the data from source in aligned boundaries, which will
cause the data to have an offset, and then combines the data in the next
iteration by fixing offset with shifting before writing to destination.
The majority of the improving copy speed comes from this shift copy.
In the lucky situation that the both source and destination address are
on the aligned boundary, perform load and store with register size to
copy the data. Without the unrolling, it will reduce the speed since the
next store instruction for the same register using from the load will
stall the pipeline.
At last, copying the remainder in one byte at a time.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
LLVM's integrated assembler doesn't support the LOCAL directive, which we're
using when generating our uaccess fixup tables. Luckily the table fragment is
small enough that there's only one internal symbol, so using a relative symbol
reference doesn't really complicate anything.
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
When support for !MMU was added, the declaration of
__asm_copy_to_user() & __asm_copy_from_user() were #ifdefed
out hence their EXPORT_SYMBOL() give an error message like:
.../riscv_ksyms.c:13:15: error: '__asm_copy_to_user' undeclared here
.../riscv_ksyms.c:14:15: error: '__asm_copy_from_user' undeclared here
Since these symbols are not defined with !MMU it's wrong to export them.
Same for __clear_user() (even though this one is also declared in
include/asm-generic/uaccess.h and thus doesn't give an error message).
Fix this by doing the EXPORT_SYMBOL() directly where these symbols
are defined: inside lib/uaccess.S itself.
Fixes: 6bd33e1ece52 ("riscv: fix compile failure with EXPORT_SYMBOL() & !MMU")
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
Many of the privileged CSRs exist in a supervisor and machine version
that are used very similarly. Provide versions of the CSR names and
fields that map to either the S-mode or M-mode variant depending on
a new CONFIG_RISCV_M_MODE kconfig symbol.
Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>
and Paul Walmsley <paul.walmsley@sifive.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de> # for drivers/clocksource, drivers/irqchip
[paul.walmsley@sifive.com: updated to apply]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
Since commit a3182c91ef4e ("RISC-V: Access CSRs using CSR numbers"),
we should prefer accessing CSRs using their CSR numbers, but there
are several leftovers like sstatus / sptbr we missed.
Signed-off-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
This patch set contains a handful of fixes that clean up the sparse
results for the RISC-V port. These patches shouldn't have any
functional difference. The patches:
* Use NULL instead of 0.
* Clean up __user annotations.
* Split __copy_user into two functions, to make the __user annotations
valid.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
We use a single __copy_user assembly function to copy memory both from
and to userspace. While this works, it triggers sparse errors because
we're implicitly casting between the kernel and user address spaces by
calling __copy_user.
This patch splits the C declaration into a pair of functions,
__asm_copy_{to,from}_user, that have sane semantics WRT __user. This
split make things fine from sparse's point of view. The assembly
implementation keeps a single definition but add a double ENTRY() for it,
one for __asm_copy_to_user and another one for __asm_copy_from_user.
The result is a spare-safe implementation that pays no performance
or code size penalty.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
A piece of fixup code is currently shared by __copy_user and
__clear_user. It first disables the access to user-space memory
and then returns the "n" argument, which represents #(bytes not processed).
However,__copy_user's "n" is in register a2, while __clear_user's in a1,
and thus it causes errors for programs like setdomainname02 testcase in LTP.
This patch fixes this issue by separating their fixup code and returning
the right value for the kernel to handle a relative fault properly.
Signed-off-by: Alan Kao <alankao@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Zong Li <zong@andestech.com>
Cc: Vincent Chen <vincentc@andestech.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
This patch contains code that is more specific to the RISC-V ISA than it
is to Linux. It contains string and math operations, C wrappers for
various assembly instructions, stack walking code, and uaccess.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|