diff options
Diffstat (limited to 'Documentation/arch')
-rw-r--r-- | Documentation/arch/arm64/cpu-hotplug.rst | 79 | ||||
-rw-r--r-- | Documentation/arch/arm64/index.rst | 1 | ||||
-rw-r--r-- | Documentation/arch/arm64/memory.rst | 42 | ||||
-rw-r--r-- | Documentation/arch/arm64/silicon-errata.rst | 16 | ||||
-rw-r--r-- | Documentation/arch/riscv/cmodx.rst | 4 | ||||
-rw-r--r-- | Documentation/arch/riscv/uabi.rst | 4 | ||||
-rw-r--r-- | Documentation/arch/x86/amd-memory-encryption.rst | 29 | ||||
-rw-r--r-- | Documentation/arch/x86/resctrl.rst | 27 |
8 files changed, 175 insertions, 27 deletions
diff --git a/Documentation/arch/arm64/cpu-hotplug.rst b/Documentation/arch/arm64/cpu-hotplug.rst new file mode 100644 index 000000000000..76ba8d932c72 --- /dev/null +++ b/Documentation/arch/arm64/cpu-hotplug.rst @@ -0,0 +1,79 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _cpuhp_index: + +==================== +CPU Hotplug and ACPI +==================== + +CPU hotplug in the arm64 world is commonly used to describe the kernel taking +CPUs online/offline using PSCI. This document is about ACPI firmware allowing +CPUs that were not available during boot to be added to the system later. + +``possible`` and ``present`` refer to the state of the CPU as seen by linux. + + +CPU Hotplug on physical systems - CPUs not present at boot +---------------------------------------------------------- + +Physical systems need to mark a CPU that is ``possible`` but not ``present`` as +being ``present``. An example would be a dual socket machine, where the package +in one of the sockets can be replaced while the system is running. + +This is not supported. + +In the arm64 world CPUs are not a single device but a slice of the system. +There are no systems that support the physical addition (or removal) of CPUs +while the system is running, and ACPI is not able to sufficiently describe +them. + +e.g. New CPUs come with new caches, but the platform's cache toplogy is +described in a static table, the PPTT. How caches are shared between CPUs is +not discoverable, and must be described by firmware. + +e.g. The GIC redistributor for each CPU must be accessed by the driver during +boot to discover the system wide supported features. ACPI's MADT GICC +structures can describe a redistributor associated with a disabled CPU, but +can't describe whether the redistributor is accessible, only that it is not +'always on'. + +arm64's ACPI tables assume that everything described is ``present``. + + +CPU Hotplug on virtual systems - CPUs not enabled at boot +--------------------------------------------------------- + +Virtual systems have the advantage that all the properties the system will +ever have can be described at boot. There are no power-domain considerations +as such devices are emulated. + +CPU Hotplug on virtual systems is supported. It is distinct from physical +CPU Hotplug as all resources are described as ``present``, but CPUs may be +marked as disabled by firmware. Only the CPU's online/offline behaviour is +influenced by firmware. An example is where a virtual machine boots with a +single CPU, and additional CPUs are added once a cloud orchestrator deploys +the workload. + +For a virtual machine, the VMM (e.g. Qemu) plays the part of firmware. + +Virtual hotplug is implemented as a firmware policy affecting which CPUs can be +brought online. Firmware can enforce its policy via PSCI's return codes. e.g. +``DENIED``. + +The ACPI tables must describe all the resources of the virtual machine. CPUs +that firmware wishes to disable either from boot (or later) should not be +``enabled`` in the MADT GICC structures, but should have the ``online capable`` +bit set, to indicate they can be enabled later. The boot CPU must be marked as +``enabled``. The 'always on' GICR structure must be used to describe the +redistributors. + +CPUs described as ``online capable`` but not ``enabled`` can be set to enabled +by the DSDT's Processor object's _STA method. On virtual systems the _STA method +must always report the CPU as ``present``. Changes to the firmware policy can +be notified to the OS via device-check or eject-request. + +CPUs described as ``enabled`` in the static table, should not have their _STA +modified dynamically by firmware. Soft-restart features such as kexec will +re-read the static properties of the system from these static tables, and +may malfunction if these no longer describe the running system. Linux will +re-discover the dynamic properties of the system from the _STA method later +during boot. diff --git a/Documentation/arch/arm64/index.rst b/Documentation/arch/arm64/index.rst index d08e924204bf..78544de0a8a9 100644 --- a/Documentation/arch/arm64/index.rst +++ b/Documentation/arch/arm64/index.rst @@ -13,6 +13,7 @@ ARM64 Architecture asymmetric-32bit booting cpu-feature-registers + cpu-hotplug elf_hwcaps hugetlbpage kdump diff --git a/Documentation/arch/arm64/memory.rst b/Documentation/arch/arm64/memory.rst index 55a55f30eed8..8a658984b8bb 100644 --- a/Documentation/arch/arm64/memory.rst +++ b/Documentation/arch/arm64/memory.rst @@ -18,12 +18,10 @@ ARMv8.2 adds optional support for Large Virtual Address space. This is only available when running with a 64KB page size and expands the number of descriptors in the first level of translation. -User addresses have bits 63:48 set to 0 while the kernel addresses have -the same bits set to 1. TTBRx selection is given by bit 63 of the -virtual address. The swapper_pg_dir contains only kernel (global) -mappings while the user pgd contains only user (non-global) mappings. -The swapper_pg_dir address is written to TTBR1 and never written to -TTBR0. +TTBRx selection is given by bit 55 of the virtual address. The +swapper_pg_dir contains only kernel (global) mappings while the user pgd +contains only user (non-global) mappings. The swapper_pg_dir address is +written to TTBR1 and never written to TTBR0. AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: @@ -65,14 +63,14 @@ Translation table lookup with 4KB pages:: +--------+--------+--------+--------+--------+--------+--------+--------+ |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| +--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | | - | | | | | v - | | | | | [11:0] in-page offset - | | | | +-> [20:12] L3 index - | | | +-----------> [29:21] L2 index - | | +---------------------> [38:30] L1 index - | +-------------------------------> [47:39] L0 index - +-------------------------------------------------> [63] TTBR0/1 + | | | | | | + | | | | | v + | | | | | [11:0] in-page offset + | | | | +-> [20:12] L3 index + | | | +-----------> [29:21] L2 index + | | +---------------------> [38:30] L1 index + | +-------------------------------> [47:39] L0 index + +----------------------------------------> [55] TTBR0/1 Translation table lookup with 64KB pages:: @@ -80,14 +78,14 @@ Translation table lookup with 64KB pages:: +--------+--------+--------+--------+--------+--------+--------+--------+ |63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| +--------+--------+--------+--------+--------+--------+--------+--------+ - | | | | | - | | | | v - | | | | [15:0] in-page offset - | | | +----------> [28:16] L3 index - | | +--------------------------> [41:29] L2 index - | +-------------------------------> [47:42] L1 index (48-bit) - | [51:42] L1 index (52-bit) - +-------------------------------------------------> [63] TTBR0/1 + | | | | | + | | | | v + | | | | [15:0] in-page offset + | | | +----------> [28:16] L3 index + | | +--------------------------> [41:29] L2 index + | +-------------------------------> [47:42] L1 index (48-bit) + | [51:42] L1 index (52-bit) + +----------------------------------------> [55] TTBR0/1 When using KVM without the Virtualization Host Extensions, the diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index eb8af8032c31..bb83c5d8c675 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -132,16 +132,26 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #3324338 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A715 | #2645198 | ARM64_ERRATUM_2645198 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A720 | #3456091 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-X1 | #1502854 | N/A | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #3324338 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X3 | #3324335 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-X4 | #3194386 | ARM64_ERRATUM_3194386 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X925 | #3324334 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1349291 | N/A | @@ -156,9 +166,13 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #3324339 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-V1 | #1619801 | N/A | +----------------+-----------------+-----------------+-----------------------------+ -| ARM | Neoverse-V3 | #3312417 | ARM64_ERRATUM_3312417 | +| ARM | Neoverse-V2 | #3324336 | ARM64_ERRATUM_3194386 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-V3 | #3312417 | ARM64_ERRATUM_3194386 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-500 | #841119,826419 | N/A | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arch/riscv/cmodx.rst b/Documentation/arch/riscv/cmodx.rst index 1c0ca06b6c97..8c48bcff3df9 100644 --- a/Documentation/arch/riscv/cmodx.rst +++ b/Documentation/arch/riscv/cmodx.rst @@ -62,10 +62,10 @@ cmodx.c:: printf("Value before cmodx: %d\n", value); // Call prctl before first fence.i is called inside modify_instruction - prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX_ON, PR_RISCV_CTX_SW_FENCEI, PR_RISCV_SCOPE_PER_PROCESS); + prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_ON, PR_RISCV_SCOPE_PER_PROCESS); modify_instruction(); // Call prctl after final fence.i is called in process - prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX_OFF, PR_RISCV_CTX_SW_FENCEI, PR_RISCV_SCOPE_PER_PROCESS); + prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_OFF, PR_RISCV_SCOPE_PER_PROCESS); value = get_value(); printf("Value after cmodx: %d\n", value); diff --git a/Documentation/arch/riscv/uabi.rst b/Documentation/arch/riscv/uabi.rst index 54d199dce78b..2b420bab0527 100644 --- a/Documentation/arch/riscv/uabi.rst +++ b/Documentation/arch/riscv/uabi.rst @@ -65,4 +65,6 @@ the extension, or may have deliberately removed it from the listing. Misaligned accesses ------------------- -Misaligned accesses are supported in userspace, but they may perform poorly. +Misaligned scalar accesses are supported in userspace, but they may perform +poorly. Misaligned vector accesses are only supported if the Zicclsm extension +is supported. diff --git a/Documentation/arch/x86/amd-memory-encryption.rst b/Documentation/arch/x86/amd-memory-encryption.rst index 414bc7402ae7..6df3264f23b9 100644 --- a/Documentation/arch/x86/amd-memory-encryption.rst +++ b/Documentation/arch/x86/amd-memory-encryption.rst @@ -130,4 +130,31 @@ SNP feature support. More details in AMD64 APM[1] Vol 2: 15.34.10 SEV_STATUS MSR -[1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf +Secure VM Service Module (SVSM) +=============================== +SNP provides a feature called Virtual Machine Privilege Levels (VMPL) which +defines four privilege levels at which guest software can run. The most +privileged level is 0 and numerically higher numbers have lesser privileges. +More details in the AMD64 APM Vol 2, section "15.35.7 Virtual Machine +Privilege Levels", docID: 24593. + +When using that feature, different services can run at different protection +levels, apart from the guest OS but still within the secure SNP environment. +They can provide services to the guest, like a vTPM, for example. + +When a guest is not running at VMPL0, it needs to communicate with the software +running at VMPL0 to perform privileged operations or to interact with secure +services. An example fur such a privileged operation is PVALIDATE which is +*required* to be executed at VMPL0. + +In this scenario, the software running at VMPL0 is usually called a Secure VM +Service Module (SVSM). Discovery of an SVSM and the API used to communicate +with it is documented in "Secure VM Service Module for SEV-SNP Guests", docID: +58019. + +(Latest versions of the above-mentioned documents can be found by using +a search engine like duckduckgo.com and typing in: + + site:amd.com "Secure VM Service Module for SEV-SNP Guests", docID: 58019 + +for example.) diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst index 627e23869bca..a824affd741d 100644 --- a/Documentation/arch/x86/resctrl.rst +++ b/Documentation/arch/x86/resctrl.rst @@ -375,6 +375,10 @@ When monitoring is enabled all MON groups will also contain: all tasks in the group. In CTRL_MON groups these files provide the sum for all tasks in the CTRL_MON group and all tasks in MON groups. Please see example section for more details on usage. + On systems with Sub-NUMA Cluster (SNC) enabled there are extra + directories for each node (located within the "mon_L3_XX" directory + for the L3 cache they occupy). These are named "mon_sub_L3_YY" + where "YY" is the node number. "mon_hw_id": Available only with debug option. The identifier used by hardware @@ -484,6 +488,29 @@ if non-contiguous 1s value is supported. On a system with a 20-bit mask each bit represents 5% of the capacity of the cache. You could partition the cache into four equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000. +Notes on Sub-NUMA Cluster mode +============================== +When SNC mode is enabled, Linux may load balance tasks between Sub-NUMA +nodes much more readily than between regular NUMA nodes since the CPUs +on Sub-NUMA nodes share the same L3 cache and the system may report +the NUMA distance between Sub-NUMA nodes with a lower value than used +for regular NUMA nodes. + +The top-level monitoring files in each "mon_L3_XX" directory provide +the sum of data across all SNC nodes sharing an L3 cache instance. +Users who bind tasks to the CPUs of a specific Sub-NUMA node can read +the "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes" in the +"mon_sub_L3_YY" directories to get node local data. + +Memory bandwidth allocation is still performed at the L3 cache +level. I.e. throttling controls are applied to all SNC nodes. + +L3 cache allocation bitmaps also apply to all SNC nodes. But note that +the amount of L3 cache represented by each bit is divided by the number +of SNC nodes per L3 cache. E.g. with a 100MB cache on a system with 10-bit +allocation masks each bit normally represents 10MB. With SNC mode enabled +with two SNC nodes per L3 cache, each bit only represents 5MB. + Memory bandwidth Allocation and monitoring ========================================== |