Age | Commit message (Collapse) | Author |
|
Charlie Jenkins <charlie@rivosinc.com> says:
Each architecture generally implements fine-tuned checksum functions to
leverage the instruction set. This patch adds the main checksum
functions that are used in networking. Tested on QEMU, this series
allows the CHECKSUM_KUNIT tests to complete an average of 50.9% faster.
This patch takes heavy use of the Zbb extension using alternatives
patching.
To test this patch, enable the configs for KUNIT, then CHECKSUM_KUNIT.
I have attempted to make these functions as optimal as possible, but I
have not ran anything on actual riscv hardware. My performance testing
has been limited to inspecting the assembly, running the algorithms on
x86 hardware, and running in QEMU.
ip_fast_csum is a relatively small function so even though it is
possible to read 64 bits at a time on compatible hardware, the
bottleneck becomes the clean up and setup code so loading 32 bits at a
time is actually faster.
* b4-shazam-merge:
kunit: Add tests for csum_ipv6_magic and ip_fast_csum
riscv: Add checksum library
riscv: Add checksum header
riscv: Add static key for misaligned accesses
asm-generic: Improve csum_fold
Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-0-1c50de5f2167@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Provide a 32 and 64 bit version of do_csum. When compiled for 32-bit
will load from the buffer in groups of 32 bits, and when compiled for
64-bit will load in groups of 64 bits.
Additionally provide riscv optimized implementation of csum_ipv6_magic.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Xiao Wang <xiao.w.wang@intel.com>
Link: https://lore.kernel.org/r/20240108-optimize_checksum-v15-4-1c50de5f2167@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch utilizes Vector to perform copy_to_user/copy_from_user. If
Vector is available and the size of copy is large enough for Vector to
perform better than scalar, then direct the kernel to do Vector copies
for userspace. Though the best programming practice for users is to
reduce the copy, this provides a faster variant when copies are
inevitable.
The optimal size for using Vector, copy_to_user_thres, is only a
heuristic for now. We can add DT parsing if people feel the need of
customizing it.
The exception fixup code of the __asm_vector_usercopy must fallback to
the scalar one because accessing user pages might fault, and must be
sleepable. Current kernel-mode Vector does not allow tasks to be
preemptible, so we must disactivate Vector and perform a scalar fallback
in such case.
The original implementation of Vector operations comes from
https://github.com/sifive/sifive-libc, which we agree to contribute to
Linux kernel.
Co-developed-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
Co-developed-by: Nick Knight <nick.knight@sifive.com>
Signed-off-by: Nick Knight <nick.knight@sifive.com>
Suggested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20240115055929.4736-6-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
This patch adds support for vector optimized XOR and it is tested in
qemu.
Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://lore.kernel.org/r/20240115055929.4736-4-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by using the alternatives' cpufeature
value (the upper 16 bits of patch_id). The value used is the maximum
zicboz block size order accepted at the patch site. This enables us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size would be larger)
For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.
This adds basic infrastructure to allow patching the function calls
via alternatives later on.
The Linux kernel provides standard implementations for string functions
but when architectures want to extend them, they need to provide their
own.
The added generic string functions are done in assembler (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers and extend them with optimized variants.
This doesn't override the compiler's use of builtin replacements. So still
first of all the compiler will select if a builtin will be better suitable
i.e. for known strings. For all regular cases we will want to later
select possible optimized variants and in the worst case fall back to the
generic implemention added with this change.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-2-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Inspired by the commit 42d038c4fb00 ("arm64: Add support for function
error injection"), this patch supports function error injection for
riscv.
This patch mainly support two functions: one is regs_set_return_value()
which is used to overwrite the return value; the another function is
override_function_with_return() which is to override the probed
function returning and jump to its caller.
Test log:
cd /sys/kernel/debug/fail_function
echo sys_clone > inject
echo 100 > probability
echo 1 > interval
ls /
[ 313.176875] FAULT_INJECTION: forcing a failure.
[ 313.176875] name fail_function, interval 1, probability 100, space 0, times 1
[ 313.184357] CPU: 0 PID: 87 Comm: sh Not tainted 5.8.0-rc5-00007-g6a758cc #117
[ 313.187616] Call Trace:
[ 313.189100] [<ffffffe0002036b6>] walk_stackframe+0x0/0xc2
[ 313.191626] [<ffffffe00020395c>] show_stack+0x40/0x4c
[ 313.193927] [<ffffffe000556c60>] dump_stack+0x7c/0x96
[ 313.194795] [<ffffffe0005522e8>] should_fail+0x140/0x142
[ 313.195923] [<ffffffe000299ffc>] fei_kprobe_handler+0x2c/0x5a
[ 313.197687] [<ffffffe0009e2ec4>] kprobe_breakpoint_handler+0xb4/0x18a
[ 313.200054] [<ffffffe00020357e>] do_trap_break+0x36/0xca
[ 313.202147] [<ffffffe000201bca>] ret_from_exception+0x0/0xc
[ 313.204556] [<ffffffe000201bbc>] ret_from_syscall+0x0/0x2
-sh: can't fork: Invalid argument
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The memmove used by the kernel feature like KASAN.
Signed-off-by: Nick Hu <nickhu@andestech.com>
Signed-off-by: Nick Hu <nick650823@gmail.com>
Signed-off-by: Nylon Chen <nylon7@andestech.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This reverts commit adccfb1a805ea84d2db38eb53032533279bdaa97.
Now that the generic uaccess by mempcy code handles unaligned addresses
the generic code can be used for all RISC-V CPUs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
It might have the unaligned access exception when trying to exchange data
with user space program. In this case, it failed in tty_ioctl(). Therefore
we should enable uaccess.S for NOMMU mode since the generic code doesn't
handle the unaligned access cases.
0x8013a212 <tty_ioctl+462>: ld a5,460(s1)
[ 0.115279] Oops - load address misaligned [#1]
[ 0.115284] CPU: 0 PID: 29 Comm: sh Not tainted 5.4.0-rc5-00020-gb4c27160d562-dirty #36
[ 0.115294] epc: 000000008013a212 ra : 000000008013a212 sp : 000000008f48dd50
[ 0.115303] gp : 00000000801cac28 tp : 000000008fb80000 t0 : 00000000000000e8
[ 0.115312] t1 : 000000008f58f108 t2 : 0000000000000009 s0 : 000000008f48ddf0
[ 0.115321] s1 : 000000008f8c6220 a0 : 0000000000000001 a1 : 000000008f48dd28
[ 0.115330] a2 : 000000008fb80000 a3 : 00000000801a7398 a4 : 0000000000000000
[ 0.115339] a5 : 0000000000000000 a6 : 000000008f58f0c6 a7 : 000000000000001d
[ 0.115348] s2 : 000000008f8c6308 s3 : 000000008f78b7c8 s4 : 000000008fb834c0
[ 0.115357] s5 : 0000000000005413 s6 : 0000000000000000 s7 : 000000008f58f2b0
[ 0.115366] s8 : 000000008f858008 s9 : 000000008f776818 s10: 000000008f776830
[ 0.115375] s11: 000000008fb840a8 t3 : 1999999999999999 t4 : 000000008f78704c
[ 0.115384] t5 : 0000000000000005 t6 : 0000000000000002
[ 0.115391] status: 0000000200001880 badaddr: 000000008f8c63ec cause: 0000000000000004
[ 0.115401] ---[ end trace 00d490c6a8b6c9ac ]---
This failure could be fixed after this patch applied.
[ 0.002282] Run /init as init process
Initializing random number generator... [ 0.005573] random: dd: uninitialized urandom read (512 bytes read)
done.
Welcome to Buildroot
buildroot login: root
Password:
Jan 1 00:00:00 login[62]: root login on 'ttySIF0'
~ #
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The kernel runs in M-mode without using page tables, and thus can't run
bare metal without help from additional firmware.
Most of the patch is just stubbing out code not needed without page
tables, but there is an interesting detail in the signals implementation:
- The normal RISC-V syscall ABI only implements rt_sigreturn as VDSO
entry point, but the ELF VDSO is not supported for nommu Linux.
We instead copy the code to call the syscall onto the stack.
In addition to enabling the nommu code a new defconfig for a small
kernel image that can run in nommu mode on qemu is also provided, to run
a kernel in qemu you can use the following command line:
qemu-system-riscv64 -smp 2 -m 64 -machine virt -nographic \
-kernel arch/riscv/boot/loader \
-drive file=rootfs.ext2,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0
Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
[paul.walmsley@sifive.com: updated to apply; add CONFIG_MMU guards
around PCI_IOBASE definition to fix build issues; fixed checkpatch
issues; move the PCI_IO_* and VMEMMAP address space macros along
with the others; resolve sparse warning]
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
This should never have landed in the first place: it was added as part
of 64-bit divide support for 32-bit systems, but the kernel doesn't
allow this sort of division. I must have forgotten to remove it.
This patch removes the support. Since this routine only worked on
64-bit platforms but was only built on 32-bit platforms, it's
essentially just nonsense anyway.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Link: https://lore.kernel.org/linux-riscv/nycvar.YSQ.7.76.1908061413360.19480@knanqh.ubzr/T/#t
Reported-by: Eric Lin <tesheng@andestech.com>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
|
|
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fixes the following build error from tinyconfig:
riscv64-unknown-linux-gnu-ld: kernel/sched/fair.o: in function `.L8':
fair.c:(.text+0x70): undefined reference to `__lshrti3'
riscv64-unknown-linux-gnu-ld: kernel/time/clocksource.o: in function `.L0 ':
clocksource.c:(.text+0x334): undefined reference to `__lshrti3'
Fixes: 7f47c73b355f ("RISC-V: Build tishift only on 64-bit")
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
Only RV64 supports 128 integer size.
Signed-off-by: Zong Li <zong@andestech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
Signed-off-by: Alex Guo <xfguo@jlsemi.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
|
|
This patch contains all the build infrastructure that actually enables
the RISC-V port. This includes Makefiles, linker scripts, and Kconfig
files. It also contains the only top-level change, which adds RISC-V to
the list of architectures that need a sed run to produce the ARCH
variable when building locally.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|