Age | Commit message (Collapse) | Author |
|
These don't do anything anymore, the only user of the symbol was
VFIO_CCW/AP which already "depends on VFIO" and VFIO itself selects
IOMMU_API.
When this was added VFIO was wrongly doing "depends on IOMMU_API" which
required some contortions like this to ensure IOMMU_API was turned on.
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v2-eb322ce2e547+188f-rm_iommu_ccw_jgg@nvidia.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Commit f1045056c726 ("topology/sysfs: rework book and drawer topology
ifdefery") activates the book and drawer topology, previously activated by
CONFIG_SCHED_{BOOK,DRAWER}, dependent on the existence of certain macro
definitions. Hence, since then, CONFIG_SCHED_{BOOK,DRAWER} have no effect
and any further purpose.
Remove the obsolete configs SCHED_{BOOK,DRAWER}.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20230508040916.16733-1-lukas.bulwahn@gmail.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Commit 0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") makes
config GCC11_NO_ARRAY_BOUNDS to be for disabling -Warray-bounds in any gcc
version 11 and upwards, and with that, removes the GCC12_NO_ARRAY_BOUNDS
config as it is now covered by the semantics of GCC11_NO_ARRAY_BOUNDS.
As GCC11_NO_ARRAY_BOUNDS is yes by default, there is no need for the s390
architecture to explicitly select GCC11_NO_ARRAY_BOUNDS. Hence, the select
GCC12_NO_ARRAY_BOUNDS in arch/s390/Kconfig can simply be dropped.
Remove the unneeded "select GCC12_NO_ARRAY_BOUNDS".
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for stackleak feature. Also allow specifying
architecture-specific stackleak poison function to enable faster
implementation. On s390, the mvc-based implementation helps decrease
typical overhead from a factor of 3 to just 25%
- Convert all assembler files to use SYM* style macros, deprecating the
ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS
- Improve KASLR to also randomize module and special amode31 code base
load addresses
- Rework decompressor memory tracking to support memory holes and
improve error handling
- Add support for protected virtualization AP binding
- Add support for set_direct_map() calls
- Implement set_memory_rox() and noexec module_alloc()
- Remove obsolete overriding of mem*() functions for KASAN
- Rework kexec/kdump to avoid using nodat_stack to call purgatory
- Convert the rest of the s390 code to use flexible-array member
instead of a zero-length array
- Clean up uaccess inline asm
- Enable ARCH_HAS_MEMBARRIER_SYNC_CORE
- Convert to using CONFIG_FUNCTION_ALIGNMENT and enable
DEBUG_FORCE_FUNCTION_ALIGN_64B
- Resolve last_break in userspace fault reports
- Simplify one-level sysctl registration
- Clean up branch prediction handling
- Rework CPU counter facility to retrieve available counter sets just
once
- Other various small fixes and improvements all over the code
* tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits)
s390/stackleak: provide fast __stackleak_poison() implementation
stackleak: allow to specify arch specific stackleak poison function
s390: select ARCH_USE_SYM_ANNOTATIONS
s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc()
s390: wire up memfd_secret system call
s390/mm: enable ARCH_HAS_SET_DIRECT_MAP
s390/mm: use BIT macro to generate SET_MEMORY bit masks
s390/relocate_kernel: adjust indentation
s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc.
s390/entry: use SYM* macros instead of ENTRY(), etc.
s390/purgatory: use SYM* macros instead of ENTRY(), etc.
s390/kprobes: use SYM* macros instead of ENTRY(), etc.
s390/reipl: use SYM* macros instead of ENTRY(), etc.
s390/head64: use SYM* macros instead of ENTRY(), etc.
s390/earlypgm: use SYM* macros instead of ENTRY(), etc.
s390/mcount: use SYM* macros instead of ENTRY(), etc.
s390/crc32le: use SYM* macros instead of ENTRY(), etc.
s390/crc32be: use SYM* macros instead of ENTRY(), etc.
s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc.
s390/amode31: use SYM* macros instead of ENTRY(), etc.
...
|
|
All old style assembly annotations have been converted for s390. Select
ARCH_USE_SYM_ANNOTATIONS to make sure the old macros like ENTRY() aren't
available anymore. This prevents that new code which uses the old macros
will be added again.
This follows what has been done for x86 with commit 2ce0d7f9766f ("x86/asm:
Provide a Kconfig symbol for disabling old assembly annotations") and for
arm64 with commit 50479d58eaa3 ("arm64: Disable old style assembly
annotations").
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Implement the set_direct_map_*() API, which allows to invalidate and set
default permissions to pages within the direct mapping.
Note that kernel_page_present(), which is also supposed to be part of this
API, is intentionally not implemented. The reason for this is that
kernel_page_present() is only used (and currently only makes sense) for
suspend/resume, which isn't supported on s390.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to
indicate devdax and hugetlb vmemmap optimization support. Hence rename
that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP
Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Tarun Sahu <tsahu@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for
KASAN") switched s390 to use the generic checksum functions, so that KASAN
instrumentation also works checksum functions by avoiding architecture
specific inline assemblies.
There is however the problem that the generic csum_partial() function
returns a 32 bit value with a 16 bit folded checksum, while the original
s390 variant does not fold to 16 bit. This in turn causes that the
ipib_checksum in lowcore contains different values depending on kernel
config options.
The ipib_checksum is used by system dumpers to verify if pointers in
lowcore point to valid data. Verification is done by comparing checksum
values. The system dumpers still use 32 bit checksum values which are not
folded, and therefore the checksum verification fails (incorrectly).
Symptom is that reboot after dump does not work anymore when a KASAN
instrumented kernel is dumped.
Fix this by not using the generic checksum implementation. Instead add an
explicit kasan_check_read() so that KASAN knows about the read access from
within the inline assembly.
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Fixes: dfe843dce775 ("s390/checksum: support GENERIC_CSUM, enable it for KASAN")
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.
This is the s390 variant of "x86/mm: try VMA lock-based page fault handling
first".
Link: https://lkml.kernel.org/r/20230314132808.1266335-1-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add support for the stackleak feature. Whenever the kernel returns to user
space the kernel stack is filled with a poison value.
Enabling this feature is quite expensive: e.g. after instrumenting the
getpid() system call function to have a 4kb stack the result is an
increased runtime of the system call by a factor of 3.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
s390 trivially supports the ARCH_HAS_MEMBARRIER_SYNC_CORE requirements
since the used lpswe(y) instruction to return from any kernel context to
user space performs CPU serialization. This is very similar to arm, arm64
and powerpc.
See commit 70216e18e519 ("membarrier: Provide core serializing command,
*_SYNC_CORE") for further details.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make use of CONFIG_FUNCTION_ALIGNMENT which was introduced with commit
d49a0626216b ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT").
Select FUNCTION_ALIGNMENT_8B for gcc in order to reflect gcc's default
function alignment. For all other compilers, which is only clang, select
a function alignment of 16 bytes which reflects the default function
alignment for clang.
Also change the __ALIGN define to follow whatever the value of
CONFIG_FUNCTION_ALIGNMENT is. This makes sure that the alignment of C and
assembler functions is the same.
In result everything still uses the default function alignment for both
compilers. However in addition this is now also true for all assembly
functions, so that all functions have a consistent alignment.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Add empty command line parameter handling stubs to kernel for all
command line parameters which are handled in the decompressor. This
avoids invalid "Unknown kernel command line parameters" messages from
the kernel, and also avoids that these will be incorrectly passed to
user space. This caused already confusion, therefore add the empty
stubs
- Add missing phys_to_virt() handling to machine check handler
- Introduce and use a union to be used for zcrypt inline assemblies.
This makes sure that only a register wide member of the union is
passed as input and output parameter to inline assemblies, while
usual C code uses other members of the union to access bit fields of
it
- Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
atomically read a 128-bit value from memory. This replaces the
(mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf
code. Currently gcc does not generate the used lpq instruction if
__READ_ONCE() is used for aligned 128-bit accesses, therefore use
this s390 specific helper
- Simplify machine check handler code if a task needs to be killed
because of e.g. register corruption due to a machine malfunction
- Perform CPU reset to clear pending interrupts and TLB entries on an
already stopped target CPU before delegating work to it
- Generate arch/s390/boot/vmlinux.map link map for the decompressor,
when CONFIG_VMLINUX_MAP is enabled for debugging purposes
- Fix segment type handling for dcssblk devices. It incorrectly always
returned type "READ/WRITE" even for read-only segements, which can
result in a kernel panic if somebody tries to write to a read-only
device
- Sort config S390 select list again
- Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
test
* tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kprobes: fix current_kprobe never cleared after kprobes reenter
s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
s390/Kconfig: sort config S390 select list again
s390/extmem: return correct segment type in __segment_load()
s390/decompressor: add link map saving
s390/smp: perform cpu reset before delegating work to target cpu
s390/mcck: cleanup user process termination path
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
s390/nmi: fix virtual-physical address confusion
s390/setup: do not complain about parameters handled in decompressor
|
|
Keep the config S390 select list sorted.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Pull VFIO updates from Alex Williamson:
- Remove redundant resource check in vfio-platform (Angus Chen)
- Use GFP_KERNEL_ACCOUNT for persistent userspace allocations, allowing
removal of arbitrary kernel limits in favor of cgroup control (Yishai
Hadas)
- mdev tidy-ups, including removing the module-only build restriction
for sample drivers, Kconfig changes to select mdev support,
documentation movement to keep sample driver usage instructions with
sample drivers rather than with API docs, remove references to
out-of-tree drivers in docs (Christoph Hellwig)
- Fix collateral breakages from mdev Kconfig changes (Arnd Bergmann)
- Make mlx5 migration support match device support, improve source and
target flows to improve pre-copy support and reduce downtime (Yishai
Hadas)
- Convert additional mdev sysfs case to use sysfs_emit() (Bo Liu)
- Resolve copy-paste error in mdev mbochs sample driver Kconfig (Ye
Xingchen)
- Avoid propagating missing reset error in vfio-platform if reset
requirement is relaxed by module option (Tomasz Duszynski)
- Range size fixes in mlx5 variant driver for missed last byte and
stricter range calculation (Yishai Hadas)
- Fixes to suspended vaddr support and locked_vm accounting, excluding
mdev configurations from the former due to potential to indefinitely
block kernel threads, fix underflow and restore locked_vm on new mm
(Steve Sistare)
- Update outdated vfio documentation due to new IOMMUFD interfaces in
recent kernels (Yi Liu)
- Resolve deadlock between group_lock and kvm_lock, finally (Matthew
Rosato)
- Fix NULL pointer in group initialization error path with IOMMUFD (Yan
Zhao)
* tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio: (32 commits)
vfio: Fix NULL pointer dereference caused by uninitialized group->iommufd
docs: vfio: Update vfio.rst per latest interfaces
vfio: Update the kdoc for vfio_device_ops
vfio/mlx5: Fix range size calculation upon tracker creation
vfio: no need to pass kvm pointer during device open
vfio: fix deadlock between group lock and kvm lock
vfio: revert "iommu driver notify callback"
vfio/type1: revert "implement notify callback"
vfio/type1: revert "block on invalid vaddr"
vfio/type1: restore locked_vm
vfio/type1: track locked_vm per dma
vfio/type1: prevent underflow of locked_vm via exec()
vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
vfio: platform: ignore missing reset if disabled at module init
vfio/mlx5: Improve the target side flow to reduce downtime
vfio/mlx5: Improve the source side flow upon pre_copy
vfio/mlx5: Check whether VF is migratable
samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHS
vfio/mdev: Use sysfs_emit() to instead of sprintf()
vfio-mdev: add back CONFIG_VFIO dependency
...
|
|
CONFIG_VFIO_MDEV cannot be selected when VFIO itself is
disabled, otherwise we get a link failure:
WARNING: unmet direct dependencies detected for VFIO_MDEV
Depends on [n]: VFIO [=n]
Selected by [y]:
- SAMPLE_VFIO_MDEV_MTTY [=y] && SAMPLES [=y]
- SAMPLE_VFIO_MDEV_MDPY [=y] && SAMPLES [=y]
- SAMPLE_VFIO_MDEV_MBOCHS [=y] && SAMPLES [=y]
/home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_remove':
mdpy.c:(.text+0x1e1): undefined reference to `vfio_unregister_group_dev'
/home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_probe':
mdpy.c:(.text+0x149e): undefined reference to `_vfio_alloc_device'
Fixes: 8bf8c5ee1f38 ("vfio-mdev: turn VFIO_MDEV into a selectable symbol")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230126211211.1762319-1-arnd@kernel.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
VFIO_MDEV is just a library with helpers for the drivers. Stop making
it a user choice and just select it by the drivers that use the helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20230110091009.474427-3-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
That's an adaptation of commit f3a112c0c40d ("x86,rethook,kprobes:
Replace kretprobe with rethook on x86") to s390.
Replaces the kretprobe code with rethook on s390. With this patch,
kretprobe on s390 uses the rethook instead of kretprobe specific
trampoline code.
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Factor out handle_write() function and simplify 3215 console write
operation
- When 3170 terminal emulator is connected to the 3215 console driver
the boot time could be very long due to limited buffer space or
missing operator input. Add con3215_drop command line parameter and
con3215_drop sysfs attribute file to instruct the kernel drop console
data when such conditions are met
- Fix white space errors in 3215 console driver
- Move enum paiext_mode definition to a header file and rename it to
paievt_mode to indicate this is now used for several events. Rename
PAI_MODE_COUNTER to PAI_MODE_COUNTING to make consistent with
PAI_MODE_SAMPLING
- Simplify the logic of PMU pai_crypto mapped buffer reference counter
and make it consistent with PMU pai_ext
- Rename PMU pai_crypto mapped buffer structure member users to
active_events to make it consistent with PMU pai_ext
- Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP configuration option. This
results in saving of 12K per 1M hugetlb page (~1.2%) and 32764K per
2G hugetlb page (~1.6%)
- Use generic serial.h, bugs.h, shmparam.h and vga.h header files and
scrap s390-specific versions
- The generic percpu setup code does not expect the s390-like
implementation and emits a warning. To get rid of that warning and
provide sane CPU-to-node and CPU-to-CPU distance mappings implementat
a minimal version of setup_per_cpu_areas()
- Use kstrtobool() instead of strtobool() for re-IPL sysfs device
attributes
- Avoid unnecessary lookup of a pointer to MSI descriptor when setting
IRQ affinity for a PCI device
- Get rid of "an incompatible function type cast" warning by changing
debug_sprintf_format_fn() function prototype so it matches the
debug_format_proc_t function type
- Remove unused info_blk_hdr__pcpus() and get_page_state() functions
- Get rid of clang "unused unused insn cache ops function" warning by
moving s390_insn definition to a private header
- Get rid of clang "unused function" warning by making function
raw3270_state_final() only available if CONFIG_TN3270_CONSOLE is
enabled
- Use kstrobool() to parse sclp_con_drop parameter to make it identical
to the con3215_drop parameter and allow passing values like "yes" and
"true"
- Use sysfs_emit() for all SCLP sysfs show functions, which is the
current standard way to generate output strings
- Make SCLP con_drop sysfs attribute also writable and allow to change
its value during runtime. This makes SCLP console drop handling
consistent with the 3215 device driver
- Virtual and physical addresses are indentical on s390. However, there
is still a confusion when pointers are directly casted to physical
addresses or vice versa. Use correct address converters
virt_to_phys() and phys_to_virt() for s390 channel IO drivers
- Support for power managemant has been removed from s390 since quite
some time. Remove unused power managemant code from the appldata
device driver
- Allow memory tools like KASAN see memory accesses from the checksum
code. Switch to GENERIC_CSUM if KASAN is enabled, just like x86 does
- Add support of ECKD DASDs disks so it could be used as boot and dump
devices
- Follow checkpatch recommendations and use octal values instead of
S_IRUGO and S_IWUSR for dump device attributes in sysfs
- Changes to vx-insn.h do not cause a recompile of C files that use
asm(".include \"asm/vx-insn.h\"\n") magic to access vector
instruction macros from inline assemblies. Add wrapper include header
file to avoid this problem
- Use vector instruction macros instead of byte patterns to increase
register validation routine readability
- The current machine check register validation handling does not take
into account various scenarios and might lead to killing a wrong user
process or potentially ignore corrupted FPU registers. Simplify logic
of the machine check handler and stop the whole machine if the
previous context was kerenel mode. If the previous context was user
mode, kill the current task
- Introduce sclp_emergency_printk() function which can be used to emit
a message in emergency cases. It is supposed to be used in cases
where regular console device drivers may not work anymore, e.g.
unrecoverable machine checks
Keep the early Service-Call Control Block so it can also be used
after initdata has been freed to allow sclp_emergency_printk()
implementation
- In case a system will be stopped because of an unrecoverable machine
check error print the machine check interruption code to give a hint
of what went wrong
- Move storage error checking from the assembly entry code to C in
order to simplify machine check handling. Enter the handler with DAT
turned on, which simplifies the entry code even more
- The machine check extended save areas are allocated using a private
"nmi_save_areas" slab cache which guarantees a required power-of-two
alignment. Get rid of that cache in favour of kmalloc()
* tag 's390-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (38 commits)
s390/nmi: get rid of private slab cache
s390/nmi: move storage error checking back to C, enter with DAT on
s390/nmi: print machine check interruption code before stopping system
s390/sclp: introduce sclp_emergency_printk()
s390/sclp: keep sclp_early_sccb
s390/nmi: rework register validation handling
s390/nmi: use vector instruction macros instead of byte patterns
s390/vx: add vx-insn.h wrapper include file
s390/ipl: use octal values instead of S_* macros
s390/ipl: add eckd dump support
s390/ipl: add eckd support
vfio/ccw: identify CCW data addresses as physical
vfio/ccw: sort out physical vs virtual pointers usage
s390/checksum: support GENERIC_CSUM, enable it for KASAN
s390/appldata: remove power management callbacks
s390/cio: sort out physical vs virtual pointers usage
s390/sclp: allow to change sclp_console_drop during runtime
s390/sclp: convert to use sysfs_emit()
s390/sclp: use kstrobool() to parse sclp_con_drop parameter
s390/3270: make raw3270_state_final() depend on CONFIG_TN3270_CONSOLE
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates. This is the second in a series from an ongoing
review of the RCU documentation.
- Miscellaneous fixes.
- Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
CPU lists, causes call_rcu() to introduce delays.
These delays result in significant power savings on nearly idle
Android and ChromeOS systems. These savings range from a few percent
to more than ten percent.
This series also includes several commits that change call_rcu() to a
new call_rcu_hurry() function that avoids these delays in a few
cases, for example, where timely wakeups are required. Several of
these are outside of RCU and thus have acks and reviews from the
relevant maintainers.
- Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
for architectures that support NMIs, but which do not provide
NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
by the upcoming lockless printk() work by John Ogness et al.
- Changes providing minor but important increases in torture test
coverage for the new RCU polled-grace-period APIs.
- Changes to torturescript that avoid redundant kernel builds, thus
providing about a 30% speedup for the torture.sh acceptance test.
* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
net: devinet: Reduce refcount before grace period
net: Use call_rcu_hurry() for dst_release()
workqueue: Make queue_rcu_work() use call_rcu_hurry()
percpu-refcount: Use call_rcu_hurry() for atomic switch
scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
rcu/rcutorture: Use call_rcu_hurry() where needed
rcu/rcuscale: Use call_rcu_hurry() for async reader test
rcu/sync: Use call_rcu_hurry() instead of call_rcu
rcuscale: Add laziness and kfree tests
rcu: Shrinker for lazy rcu
rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
rcu: Make call_rcu() lazy to save power
rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
srcu: Debug NMI safety even on archs that don't require it
srcu: Explain the reason behind the read side critical section on GP start
srcu: Warn when NMI-unsafe API is used in NMI
arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
rcu-tasks: Make grace-period-age message human-readable
...
|
|
This is the s390 variant of commit d911c67e10b4 ("x86: kasan: kmsan:
support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN"). Even
though most of the s390 specific checksum code is written in C there is
still the csum_partial() inline assembly which could prevent KASAN and
KMSAN from seeing all memory accesses.
Therefore switch to GENERIC_CSUM if KASAN is enabled just like x86.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
s390 allows to enable CONFIG_NUMA, mainly to enable a couple of system
calls which are only present if NUMA is enabled. The NUMA specific system
calls are required by a couple of applications, which wouldn't work if the
system calls wouldn't be present.
The NUMA implementation itself maps all CPUs and memory to node 0. A
special case is the generic percpu setup code, which doesn't expect an s390
like implementation and therefore emits a message/warning:
"percpu: cpu 0 has no node -1 or node-local memory".
In order to get rid of this message, and also to provide sane CPU to node
and CPU distance mappings implement a minimal setup_per_cpu_areas()
function, which is very close to the generic variant.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Enable HUGETLB_PAGE_OPTIMIZE_VMEMMAP for s390.
With this, vmemmap pages used to back struct pages for compound tail
pages of hugetlb pages are freed and remapped to compound head page
frame as RO, see also Documentation/vm/vmemmap_dedup.rst.
For 1M hugetlb pages, this results in freeing 3 of 4 vmemmap pages,
saving 12K of memory for each 1M hugetlb page (~1.2%).
/sys/kernel/debug/kernel_page_tables will show the impact:
---[ vmemmap Area Start ]---
[...]
0x0000037202d84000-0x0000037202d85000 4K PTE RW NX
0x0000037202d85000-0x0000037202d88000 12K PTE RO NX
For 2G hugetlb pages, this results in freeing 8191 of 8192 vmemmap
pages, saving 32764K of memory for each 2G hugetlb page (~1.6%)
/sys/kernel/debug/kernel_page_tables will show the impact:
---[ vmemmap Area Start ]---
[...]
0x000003720a000000-0x000003720a001000 4K PTE RW NX
0x000003720a001000-0x000003720c000000 32764K PTE RO NX
The memory savings come with some costs:
- vmemmap mapping for compound hugetlb pages is not a PMD mapping any
more, but split to 4K PTE mappings, and it will not be coalesced back
to PMD mapping after freeing hugetlb pages from the pool.
Apart from theoretical performance impact, this will also (slightly)
relativize the memory savings because of additional 2K PTE pagetable
allocations.
- Workload using "on the fly" hugetlb allocations via
"nr_overcommit_hugepages" instead of using the hugetlb pool via
"nr_hugepages" will suffer from considerably increased fault handling
time, see also description from commit 78f39084b41d
("mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl").
- Freeing hugetlb pages from the pool will require re-allocation of the
freed struct pages, and therefore needs some memory available to the
kernel. This might fail in memory constrained scenarios.
- For the same reason, memory offline might fail even for ZONE_MOVABLE
when hugetlb pages are present (but not for s390, since we do not
support ARCH_ENABLE_HUGEPAGE_MIGRATION, and therefore cannot have
hugetlb pages in ZONE_MOVABLE).
- General increased complexity and overhead in kernel handling of
compound (head) pages.
Therefore, this feature is disabled by default, and has to be enabled
explicitly either by adding "hugetlb_free_vmemmap=on" kernel parameter,
or during run-time via "/proc/sys/vm/hugetlb_optimize_vmemmap" sysctl.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Nathan Chancellor reported several link errors on s390 with
CONFIG_RELOCATABLE disabled, after binutils commit 906f69cf65da ("IBM
zSystems: Issue error for *DBL relocs on misaligned symbols"). The binutils
commit reveals potential miscompiles that might have happened already
before with linker script defined symbols at odd addresses.
A similar bug was recently fixed in the kernel with commit c9305b6c1f52
("s390: fix nospec table alignments").
See https://github.com/ClangBuiltLinux/linux/issues/1747 for an analysis
from Ulich Weigand.
Therefore always build a relocatable kernel to avoid this problem. There is
hardly any use-case for non-relocatable kernels, so this shouldn't be
controversial.
Link: https://github.com/ClangBuiltLinux/linux/issues/1747
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20221030182202.2062705-1-hca@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The s390 architecture uses either a cmpxchg loop (old systems)
or the laa add-to-memory instruction (new systems) to implement
this_cpu_add(), both of which are NMI safe. This means that the old
and more-efficient srcu_read_lock() may be used in NMI context, without
the need for srcu_read_lock_nmisafe(). Therefore, add the new Kconfig
option ARCH_HAS_NMI_SAFE_THIS_CPU_OPS to arch/s390/Kconfig, which will
cause NEED_SRCU_NMI_SAFE to be deselected, thus preserving the current
srcu_read_lock() behavior.
[ paulmck: Apply Christian Borntraeger feedback. ]
Link: https://lore.kernel.org/all/20220910221947.171557773@linutronix.de/
Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: <linux-s390@vger.kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Though there's been a decent amount of RNG-related development during
this last cycle, not all of it is coming through this tree, as this
cycle saw a shift toward tackling early boot time seeding issues,
which took place in other trees as well.
Here's a summary of the various patches:
- The CONFIG_ARCH_RANDOM .config option and the "nordrand" boot
option have been removed, as they overlapped with the more widely
supported and more sensible options, CONFIG_RANDOM_TRUST_CPU and
"random.trust_cpu". This change allowed simplifying a bit of arch
code.
- x86's RDRAND boot time test has been made a bit more robust, with
RDRAND disabled if it's clearly producing bogus results. This would
be a tip.git commit, technically, but I took it through random.git
to avoid a large merge conflict.
- The RNG has long since mixed in a timestamp very early in boot, on
the premise that a computer that does the same things, but does so
starting at different points in wall time, could be made to still
produce a different RNG state. Unfortunately, the clock isn't set
early in boot on all systems, so now we mix in that timestamp when
the time is actually set.
- User Mode Linux now uses the host OS's getrandom() syscall to
generate a bootloader RNG seed and later on treats getrandom() as
the platform's RDRAND-like faculty.
- The arch_get_random_{seed_,}_long() family of functions is now
arch_get_random_{seed_,}_longs(), which enables certain platforms,
such as s390, to exploit considerable performance advantages from
requesting multiple CPU random numbers at once, while at the same
time compiling down to the same code as before on platforms like
x86.
- A small cleanup changing a cmpxchg() into a try_cmpxchg(), from
Uros.
- A comment spelling fix"
More info about other random number changes that come in through various
architecture trees in the full commentary in the pull request:
https://lore.kernel.org/all/20220731232428.2219258-1-Jason@zx2c4.com/
* tag 'random-6.0-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: correct spelling of "overwrites"
random: handle archrandom with multiple longs
um: seed rng using host OS rng
random: use try_cmpxchg in _credit_init_bits
timekeeping: contribute wall clock to rng on time change
x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"
random: remove CONFIG_ARCH_RANDOM
|
|
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma().
Provide two new MMU_GATHER_knobs to enumerate them and remove the per
arch tlb_{start,end}_vma() implementations.
- MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range()
but does *NOT* want to call it for each VMA.
- MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the
invalidate across multiple VMAs if possible.
With these it is possible to capture the three forms:
1) empty stubs;
select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS
2) start: flush_cache_range(), end: empty;
select MMU_GATHER_MERGE_VMAS
3) start: flush_cache_range(), end: flush_tlb_range();
default
Obviously, if the architecture does not have flush_cache_range() then
it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When RDRAND was introduced, there was much discussion on whether it
should be trusted and how the kernel should handle that. Initially, two
mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and
"nordrand", a boot-time switch.
Later the thinking evolved. With a properly designed RNG, using RDRAND
values alone won't harm anything, even if the outputs are malicious.
Rather, the issue is whether those values are being *trusted* to be good
or not. And so a new set of options were introduced as the real
ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu".
With these options, RDRAND is used, but it's not always credited. So in
the worst case, it does nothing, and in the best case, maybe it helps.
Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the
center and became something certain platforms force-select.
The old options don't really help with much, and it's a bit odd to have
special handling for these instructions when the kernel can deal fine
with the existence or untrusted existence or broken existence or
non-existence of that CPU capability.
Simplify the situation by removing CONFIG_ARCH_RANDOM and using the
ordinary asm-generic fallback pattern instead, keeping the two options
that are actually used. For now it leaves "nordrand" for now, as the
removal of that will take a different route.
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Since commit 4c0f032d4963 ("s390/purgatory: Omit use of bin2c"),
s390 builds the purgatory without using bin2c.
Remove 'select BUILD_BIN2C' to avoid the unneeded build of bin2c.
Fixes: 4c0f032d4963 ("s390/purgatory: Omit use of bin2c")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20220613170902.1775211-1-masahiroy@kernel.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- a small cleanup removing "export" of an __init function
- a small series adding a new infrastructure for platform flags
- a series adding generic virtio support for Xen guests (frontend side)
* tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: unexport __init-annotated xen_xlate_map_ballooned_pages()
arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices
xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices
xen/grant-dma-iommu: Introduce stub IOMMU driver
dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops
xen/virtio: Enable restricted memory access using Xen grant mappings
xen/grant-dma-ops: Add option to restrict memory access under Xen
xen/grants: support allocating consecutive grants
arm/xen: Introduce xen_setup_dma_ops()
virtio: replace arch_has_restricted_virtio_memory_access()
kernel: add platform_has() infrastructure
|
|
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people
disabled the '-Warray-bounds' warning for gcc-12, because the new logic
in gcc would cause warnings for their use of the S390_lowcore macro,
which accesses absolute pointers.
It turns out gcc-12 has many other issues in this area, so this takes
that s390 warning disable logic, and turns it into a kernel build config
entry instead.
Part of the intent is that we can make this all much more targeted, and
use this conflig flag to disable it in only particular configurations
that cause problems, with the s390 case as an example:
select GCC12_NO_ARRAY_BOUNDS
and we could do that for other configuration cases that cause issues.
Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more
targeted way, and disable the warning only for particular uses: again
the s390 case as an example:
KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)
but this ends up just doing it globally in the top-level Makefile, since
the current issues are spread fairly widely all over:
KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
We'll try to limit this later, since the gcc-12 problems are rare enough
that *much* of the kernel can be built with it without disabling this
warning.
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Instead of using arch_has_restricted_virtio_memory_access() together
with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those
with platform_has() and a new platform feature
PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
"Just a couple of small improvements, bug fixes and cleanups:
- Add Eric Farman as maintainer for s390 virtio drivers.
- Improve machine check handling, and avoid incorrectly injecting a
machine check into a kvm guest.
- Add cond_resched() call to gmap page table walker in order to avoid
possible huge latencies. Also use non-quiesing sske instruction to
speed up storage key handling.
- Add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP so s390 behaves
similar like common code.
- Get sie control block address from correct stack slot in perf event
code. This fixes potential random memory accesses.
- Change uaccess code so that the exception handler sets the result
of get_user() and __get_kernel_nofault() to zero in case of a
fault. Until now this was done via input parameters for inline
assemblies. Doing it via fault handling is what most or even all
other architectures are doing.
- Couple of other small cleanups and fixes"
* tag 's390-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/stack: add union to reflect kvm stack slot usages
s390/stack: merge empty stack frame slots
s390/uaccess: whitespace cleanup
s390/uaccess: use __noreturn instead of __attribute__((noreturn))
s390/uaccess: use exception handler to zero result on get_user() failure
s390/uaccess: use symbolic names for inline assembler operands
s390/mcck: isolate SIE instruction when setting CIF_MCCK_GUEST flag
s390/mm: use non-quiescing sske for KVM switch to keyed guest
s390/gmap: voluntarily schedule during key setting
MAINTAINERS: Update s390 virtio-ccw
s390/kexec: add __GFP_NORETRY to KEXEC_CONTROL_MEMORY_GFP
s390/Kconfig.debug: fix indentation
s390/Kconfig: fix indentation
s390/perf: obtain sie_block from the right address
s390: generate register offsets into pt_regs automatically
s390: simplify early program check handler
s390/crypto: fix scatterwalk_unmap() callers in AES-GCM
|
|
The convention for indentation seems to be a single tab. Help text is
further indented by an additional two whitespaces. Fix the lines that
violate these rules.
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Link: https://lore.kernel.org/r/20220525120140.39534-1-juerg.haefliger@canonical.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the Svpbmt extension, which allows memory attributes to
be encoded in pages
- Support for the Allwinner D1's implementation of page-based memory
attributes
- Support for running rv32 binaries on rv64 systems, via the compat
subsystem
- Support for kexec_file()
- Support for the new generic ticket-based spinlocks, which allows us
to also move to qrwlock. These should have already gone in through
the asm-geneic tree as well
- A handful of cleanups and fixes, include some larger ones around
atomics and XIP
* tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
riscv: compat: Using seperated vdso_maps for compat_vdso_info
RISC-V: Fix the XIP build
RISC-V: Split out the XIP fixups into their own file
RISC-V: ignore xipImage
RISC-V: Avoid empty create_*_mapping definitions
riscv: Don't output a bogus mmu-type on a no MMU kernel
riscv: atomic: Add custom conditional atomic operation implementation
riscv: atomic: Optimize dec_if_positive functions
riscv: atomic: Cleanup unnecessary definition
RISC-V: Load purgatory in kexec_file
RISC-V: Add purgatory
RISC-V: Support for kexec_file on panic
RISC-V: Add kexec_file support
RISC-V: use memcpy for kexec_file mode
kexec_file: Fix kexec_file.c build error for riscv platform
riscv: compat: Add COMPAT Kbuild skeletal support
riscv: compat: ptrace: Add compat_arch_ptrace implement
riscv: compat: signal: Add rt_frame implementation
riscv: add memory-type errata for T-Head
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add HOSTPKG_CONFIG env variable to allow users to override pkg-config
- Support W=e as a shorthand for KCFLAGS=-Werror
- Fix CONFIG_IKHEADERS build to support toybox cpio
- Add scripts/dummy-tools/pahole to ease distro packagers' life
- Suppress false-positive warnings from checksyscalls.sh for W=2 build
- Factor out the common code of arch/*/boot/install.sh into
scripts/install.sh
- Support 'kernel-install' tool in scripts/prune-kernel
- Refactor module-versioning to link the symbol versions at the final
link of vmlinux and modules
- Remove CONFIG_MODULE_REL_CRCS because module-versioning now works in
an arch-agnostic way
- Refactor modpost, Makefiles
* tag 'kbuild-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (56 commits)
genksyms: adjust the output format to modpost
kbuild: stop merging *.symversions
kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS
modpost: extract symbol versions from *.cmd files
modpost: add sym_find_with_module() helper
modpost: change the license of EXPORT_SYMBOL to bool type
modpost: remove left-over cross_compile declaration
kbuild: record symbol versions in *.cmd files
kbuild: generate a list of objects in vmlinux
modpost: move *.mod.c generation to write_mod_c_files()
modpost: merge add_{intree_flag,retpoline,staging_flag} to add_header
scripts/prune-kernel: Use kernel-install if available
kbuild: factor out the common installation code into scripts/install.sh
modpost: split new_symbol() to symbol allocation and hash table addition
modpost: make sym_add_exported() always allocate a new symbol
modpost: make multiple export error
modpost: dump Module.symvers in the same order of modules.order
modpost: traverse the namespace_list in order
modpost: use doubly linked list for dump_lists
modpost: traverse unresolved symbols in order
...
|
|
include/{linux,asm-generic}/export.h defines a weak symbol, __crc_*
as a placeholder.
Genksyms writes the version CRCs into the linker script, which will be
used for filling the __crc_* symbols. The linker script format depends
on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset
to the reference of CRC.
It is time to get rid of this complexity.
Now that modpost parses text files (.*.cmd) to collect all the CRCs,
it can generate C code that will be linked to the vmlinux or modules.
Generate a new C file, .vmlinux.export.c, which contains the CRCs of
symbols exported by vmlinux. It is compiled and linked to vmlinux in
scripts/link-vmlinux.sh.
Put the CRCs of symbols exported by modules into the existing *.mod.c
files. No additional build step is needed for modules. As before,
*.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal.
No linker magic is used here. The new C implementation works in the
same way, whether CONFIG_RELOCATABLE is enabled or not.
CONFIG_MODULE_REL_CRCS is no longer needed.
Previously, Kbuild invoked additional $(LD) to update the CRCs in
objects, but this step is unneeded too.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
|
|
The existing per-arch definitions are pretty much historic cruft.
Move SYSVIPC_COMPAT into init/Kconfig.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Link: https://lore.kernel.org/r/20220405071314.3225832-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add config and compile options which allow to compile with z16
optimizations if the compiler supports it.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Currently when unwinding starts from pt_regs or encounters pt_regs along
the way unwinder tries to yield 2 unwinding entries:
1. (reliable) ip1: pt_regs->psw.addr, sp1: regs->gprs[15]
2. (non-reliable) ip2: sp1->gprs[8] (r14), sp2: regs->gprs[15]
In case of kretprobes those are identical and serves no other purpose
than causing confusion over duplicated entries and cause kprobes tests
to fail. So, skip a duplicate non-reliable entry in this case.
With that kretprobes and unwinder implementation now comply with
ARCH_CORRECT_STACKTRACE_ON_KRETPROBE.
Reviewed-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Raise minimum supported machine generation to z10, which comes with
various cleanups and code simplifications (usercopy/spectre
mitigation/etc).
- Rework extables and get rid of anonymous out-of-line fixups.
- Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
Covert pte_val()/pXd_val() macros to functions.
- Optimize kretprobe handling by avoiding extra kprobe on
__kretprobe_trampoline.
- Add support for CEX8 crypto cards.
- Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
- Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
group sections which simplifies kpatch support.
- Always use the packed stack layout and extend kernel unwinder tests.
- Add sanity checks for ftrace code patching.
- Add s390dbf debug log for the vfio_ap device driver.
- Various virtual vs physical address confusion fixes.
- Various small fixes and improvements all over the code.
* tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
s390/test_unwind: add kretprobe tests
s390/kprobes: Avoid additional kprobe in kretprobe handling
s390: convert ".insn" encoding to instruction names
s390: assume stckf is always present
s390/nospec: move to single register thunks
s390: raise minimum supported machine generation to z10
s390/uaccess: Add copy_from/to_user_key functions
s390/nospec: align and size extern thunks
s390/nospec: add an option to use thunk-extern
s390/nospec: generate single register thunks if possible
s390/pci: make zpci_set_irq()/zpci_clear_irq() static
s390: remove unused expoline to BC instructions
s390/irq: use assignment instead of cast
s390/traps: get rid of magic cast for per code
s390/traps: get rid of magic cast for program interruption code
s390/signal: fix typo in comments
s390/asm-offsets: remove unused defines
s390/test_unwind: avoid build warning with W=1
s390: remove .fixup section
s390/bpf: encode register within extable entry
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
|
|
Machine generations up to z9 (released in May 2006) have been officially
out of service for several years now (z9 end of service - January 31, 2019).
No distributions build kernels supporting those old machine generations
anymore, except Debian, which seems to pick the oldest supported
generation. The team supporting Debian on s390 has been notified about
the change.
Raising minimum supported machine generation to z10 helps to reduce
maintenance cost and effectively remove code, which is not getting
enough testing coverage due to lack of older hardware and distributions
support. Besides that this unblocks some optimization opportunities and
allows to use wider instruction set in asm files for future features
implementation. Due to this change spectre mitigation and usercopy
implementations could be drastically simplified and many newer instructions
could be converted from ".insn" encoding to instruction names.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Currently with -mindirect-branch=thunk and -mfunction-return=thunk compiler
options expoline thunks are put into individual COMDAT group sections. s390
is the only architecture which has group sections and it has implications
for kpatch and objtool tools support.
Using -mindirect-branch=thunk-extern and -mfunction-return=thunk-extern
is an alternative, which comes with a need to generate all required
expoline thunks manually. Unfortunately modules area is too far away from
the kernel image, and expolines from the kernel image cannon be used.
But since all new distributions (except Debian) build kernels for machine
generations newer than z10, where "exrl" instruction is available, that
leaves only 16 expolines thunks possible.
Provide an option to build the kernel with
-mindirect-branch=thunk-extern and -mfunction-return=thunk-extern for
z10 or newer. This also requires to postlink expoline thunks into all
modules explicitly. Currently modules already contain most expolines
anyhow.
Unfortunately -mindirect-branch=thunk-extern and
-mfunction-return=thunk-extern options support is broken in gcc <= 11.2.
Additional compile test is required to verify proper gcc support.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Co-developed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
-mpacked-stack option has been supported by both minimum
gcc and clang versions for a while. With commit e2bc3e91d91e
("scripts/min-tool-version.sh: Raise minimum clang version to 13.0.0
for s390") minimum clang version now also supports a combination
of flags -mpacked-stack -mbackchain -pg -mfentry and fulfills
all requirements to always enable the packed stack layout.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
an object that is about to be copied to/from userspace is overlapping
the stack at all. If it is, it performs a number of inexpensive
bounds checks. One of the finer-grained checks is whether an object
crosses stack frames within the stack region. Doing this on x86 with
CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
heavy, and was left out (a while ago), leaving the courser whole-stack
check.
The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
try to exercise these cross-frame cases to validate the defense is
working. They have been failing ever since ORC was added (which was
expected). While Muhammad was investigating various LKDTM failures[1],
he asked me for additional details on them, and I realized that when
exact stack frame boundary checking is not available (i.e. everything
except x86 with FRAME_POINTER), it could check if a stack object is at
least "current depth valid", in the sense that any object within the
stack region but not between start-of-stack and current_stack_pointer
should be considered unavailable (i.e. its lifetime is from a call no
longer present on the stack).
Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
have actually implemented the common global register alias.
Additionally report usercopy bounds checking failures with an offset
from current_stack_pointer, which may assist with diagnosing failures.
The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
(once slightly adjusted in a separate patch) pass again with this fixed.
[1] https://github.com/kernelci/kernelci-project/issues/84
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
v4: - improve commit log (akpm)
|
|
There are many different ways that access_ok() is defined across
architectures, but in the end, they all just compare against the
user_addr_max() value or they accept anything.
Provide one definition that works for most architectures, checking
against TASK_SIZE_MAX for user processes or skipping the check inside
of uaccess_kernel() sections.
For architectures without CONFIG_SET_FS(), this should be the fastest
check, as it comes down to a single comparison of a pointer against a
compile-time constant, while the architecture specific versions tend to
do something more complex for historic reasons or get something wrong.
Type checking for __user annotations is handled inconsistently across
architectures, but this is easily simplified as well by using an inline
function that takes a 'const void __user *' argument. A handful of
callers need an extra __user annotation for this.
Some architectures had trick to use 33-bit or 65-bit arithmetic on the
addresses to calculate the overflow, however this simpler version uses
fewer registers, which means it can produce better object code in the
end despite needing a second (statically predicted) branch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64, asm-generic]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add a test in order to prevent regressions.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
|
|
In 5.12 cycle we enabled GENERIC_FIND_FIRST_BIT config option for ARM64
and MIPS. It increased performance and shrunk .text size; and so far
I didn't receive any negative feedback on the change.
https://lore.kernel.org/linux-arch/20210225135700.1381396-1-yury.norov@gmail.com/
Now I think it's a good time to switch all architectures to use
find_{first,last}_bit() unconditionally, and so remove corresponding
config option.
The patch does't introduce functioal changes for arc, arm, arm64, mips,
m68k, s390 and x86, for other architectures I expect improvement both in
performance and .text size.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Alexander Lobakin <alobakin@pm.me> (mips)
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|