Age | Commit message (Collapse) | Author |
|
Commit fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each
CFMWS not in SRAT") did not account for the case where the BIOS
only partially describes a CFMWS Window in the SRAT. That means
the omitted address ranges, of a partially described CFMWS Window,
do not get assigned to a NUMA node.
Replace the call to phys_to_target_node() with numa_add_memblks().
Numa_add_memblks() searches an HPA range for existing memblk(s)
and extends those memblk(s) to fill the entire CFMWS Window.
Extending the existing memblks is a simple strategy that reuses
SRAT defined proximity domains from part of a window to fill out
the entire window, based on the knowledge* that all of a CFMWS
window is of a similar performance class.
*Note that this heuristic will evolve when CFMWS Windows present
a wider range of characteristics. The extension of the proximity
domain, implemented here, is likely a step in developing a more
sophisticated performance profile in the future.
There is no change in behavior when the SRAT does not describe
the CFMWS Window at all. In that case, a new NUMA node with a
single memblk covering the entire CFMWS Window is created.
Fixes: fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
Reported-by: Derick Marks <derick.w.marks@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Derick Marks <derick.w.marks@intel.com>
Link: https://lore.kernel.org/all/eaa0b7cffb0951a126223eef3cbe7b55b8300ad9.1689018477.git.alison.schofield%40intel.com
|
|
The ACPI CEDT.CFMWS indicates a range of possible address where new CXL
regions can appear. Each range is associated with a QTG id (QoS
Throttling Group id). For each range + QTG pair that is not covered by a proximity
domain in the SRAT, Linux creates a new NUMA node. However, the commit
that added the new ranges missed updating the node_possible mask which
causes memory_group_register() to fail. Add the new nodes to the
nodes_possible mask.
Cc: <stable@vger.kernel.org>
Fixes: fd49f99c1809 ("ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT")
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/166631003537.1167078.9373680312035292395.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"The highlight is initial support for CXL memory hotplug. The static
NUMA node (ACPI SRAT Physical Address to Proximity Domain) information
known to platform firmware is extended to support the potential
performance-class / memory-target nodes dynamically created from
available CXL memory device capacity.
New unit test infrastructure is added for validating health
information payloads.
Fixes to module reload stress and stack usage from exposure in -next
are included. A symbol rename and some other miscellaneous fixups are
included as well.
Summary:
- Rework ACPI sub-table infrastructure to optionally be used outside
of __init scenarios and use it for CEDT.CFMWS sub-table parsing.
- Add support for extending num_possible_nodes by the potential
hotplug CXL memory ranges
- Extend tools/testing/cxl with mock memory device health information
- Fix a module-reload workqueue race
- Fix excessive stack-frame usage
- Rename the driver context data structure from "cxl_mem" since that
name collides with a proposed driver name
- Use EXPORT_SYMBOL_NS_GPL instead of -DDEFAULT_SYMBOL_NAMESPACE at
build time"
* tag 'cxl-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core: Remove cxld_const_init in cxl_decoder_alloc()
cxl/pmem: Fix module reload vs workqueue state
ACPI: NUMA: Add a node and memblk for each CFMWS not in SRAT
cxl/test: Mock acpi_table_parse_cedt()
cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers
ACPI: Add a context argument for table parsing handlers
ACPI: Teach ACPI table parsing about the CEDT header format
ACPI: Keep sub-table parsing infrastructure available for modules
tools/testing/cxl: add mock output for the GET_HEALTH_INFO command
cxl/memdev: Remove unused cxlmd field
cxl/core: Convert to EXPORT_SYMBOL_NS_GPL
cxl/memdev: Change cxl_mem to a more descriptive name
cxl/mbox: Remove bad comment
cxl/pmem: Fix reference counting for delayed work
|
|
Some systems (e.g. Hyper-V guests) have all their memory marked as
hotpluggable in SRAT. acpi_numa_memory_affinity_init(), however,
ignores all such regions when !CONFIG_MEMORY_HOTPLUG and this is
unfortunate as memory affinity (NUMA) information gets lost.
'Hot Pluggable' flag in SRAT only means that "system hardware supports
hot-add and hot-remove of this memory region", it doesn't prevent
memory from being cold-plugged there.
Ignore 'Hot Pluggable' bit instead of skipping the whole memory
affinity information when !CONFIG_MEMORY_HOTPLUG.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
During NUMA init, CXL memory defined in the SRAT Memory Affinity
subtable may be assigned to a NUMA node. Since there is no
requirement that the SRAT be comprehensive for CXL memory another
mechanism is needed to assign NUMA nodes to CXL memory not identified
in the SRAT.
Use the CXL Fixed Memory Window Structure (CFMWS) of the ACPI CXL
Early Discovery Table (CEDT) to find all CXL memory ranges.
Create a NUMA node for each CFMWS that is not already assigned to
a NUMA node. Add a memblk attaching its host physical address
range to the node.
Note that these ranges may not actually map any memory at boot time.
They may describe persistent capacity or may be present to enable
hot-plug.
Consumers can use phys_to_target_node() to discover the NUMA node.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/163553711933.2509508.2203471175679990.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
We are preparing to add new Loongson (based on LoongArch, not MIPS)
support. LoongArch use ACPI other than DT as its boot protocol, so
add its support for ACPI_PROCESSOR/ACPI_NUMA.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These add support for generic initiator-only proximity domains to the
ACPI NUMA code and the architectures using it, clean up some
non-ACPICA code referring to debug facilities from ACPICA, reduce the
overhead related to accessing GPE registers, add a new DPTF (Dynamic
Power and Thermal Framework) participant driver, update the ACPICA
code in the kernel to upstream revision 20200925, add a new ACPI
backlight whitelist entry, fix a few assorted issues and clean up some
code.
Specifics:
- Add support for generic initiator-only proximity domains to the
ACPI NUMA code and the architectures using it (Jonathan Cameron)
- Clean up some non-ACPICA code referring to debug facilities from
ACPICA that are not actually used in there (Hanjun Guo)
- Add new DPTF driver for the PCH FIVR participant (Srinivas
Pandruvada)
- Reduce overhead related to accessing GPE registers in ACPICA and
the OS interface layer and make it possible to access GPE registers
using logical addresses if they are memory-mapped (Rafael Wysocki)
- Update the ACPICA code in the kernel to upstream revision 20200925
including changes as follows:
+ Add predefined names from the SMBus sepcification (Bob Moore)
+ Update acpi_help UUID list (Bob Moore)
+ Return exceptions for string-to-integer conversions in iASL (Bob
Moore)
+ Add a new "ALL <NameSeg>" debugger command (Bob Moore)
+ Add support for 64 bit risc-v compilation (Colin Ian King)
+ Do assorted cleanups (Bob Moore, Colin Ian King, Randy Dunlap)
- Add new ACPI backlight whitelist entry for HP 635 Notebook (Alex
Hung)
- Move TPS68470 OpRegion driver to drivers/acpi/pmic/ and split out
Kconfig and Makefile specific for ACPI PMIC (Andy Shevchenko)
- Clean up the ACPI SoC driver for AMD SoCs (Hanjun Guo)
- Add missing config_item_put() to fix refcount leak (Hanjun Guo)
- Drop lefrover field from struct acpi_memory_device (Hanjun Guo)
- Make the ACPI extlog driver check for RDMSR failures (Ben
Hutchings)
- Fix handling of lid state changes in the ACPI button driver when
input device is closed (Dmitry Torokhov)
- Fix several assorted build issues (Barnabás Pőcze, John Garry,
Nathan Chancellor, Tian Tao)
- Drop unused inline functions and reduce code duplication by using
kobj_to_dev() in the NFIT parsing code (YueHaibing, Wang Qing)
- Serialize tools/power/acpi Makefile (Thomas Renninger)"
* tag 'acpi-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
ACPICA: Update version to 20200925 Version 20200925
ACPICA: Remove unnecessary semicolon
ACPICA: Debugger: Add a new command: "ALL <NameSeg>"
ACPICA: iASL: Return exceptions for string-to-integer conversions
ACPICA: acpi_help: Update UUID list
ACPICA: Add predefined names found in the SMBus sepcification
ACPICA: Tree-wide: fix various typos and spelling mistakes
ACPICA: Drop the repeated word "an" in a comment
ACPICA: Add support for 64 bit risc-v compilation
ACPI: button: fix handling lid state changes when input device closed
tools/power/acpi: Serialize Makefile
ACPI: scan: Replace ACPI_DEBUG_PRINT() with pr_debug()
ACPI: memhotplug: Remove 'state' from struct acpi_memory_device
ACPI / extlog: Check for RDMSR failure
ACPI: Make acpi_evaluate_dsm() prototype consistent
docs: mm: numaperf.rst Add brief description for access class 1.
node: Add access1 class to represent CPU to memory characteristics
ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
ACPI: Let ACPI know we support Generic Initiator Affinity Structures
x86: Support Generic Initiator only proximity domains
...
|
|
Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5.
The device-dax facility allows an address range to be directly mapped
through a chardev, or optionally hotplugged to the core kernel page
allocator as System-RAM. It is the mechanism for converting persistent
memory (pmem) to be used as another volatile memory pool i.e. the current
Memory Tiering hot topic on linux-mm.
In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
it, but that labeling mechanism is not available / applicable to
soft-reserved ("EFI specific purpose") memory [3]. This series provides a
sysfs-mechanism for the daxctl utility to enable provisioning of
volatile-soft-reserved memory ranges.
The motivations for this facility are:
1/ Allow performance differentiated memory ranges to be split between
kernel-managed and directly-accessed use cases.
2/ Allow physical memory to be provisioned along performance relevant
address boundaries. For example, divide a memory-side cache [4] along
cache-color boundaries.
3/ Parcel out soft-reserved memory to VMs using device-dax as a security
/ permissions boundary [5]. Specifically I have seen people (ab)using
memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
device-dax interface on custom address ranges. A follow-on for the VM
use case is to teach device-dax to dynamically allocate 'struct page' at
runtime to reduce the duplication of 'struct page' space in both the
guest and the host kernel for the same physical pages.
[2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
[3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
[4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
[5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com
This patch (of 23):
In preparation for adding a new numa= option clean up the existing ones to
avoid ifdefs in numa_setup(), and provide feedback when the option is
numa=fake= option is invalid due to kernel config. The same does not need
to be done for numa=noacpi, since the capability is already hard disabled
at compile-time.
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Generic Initiators are a new ACPI concept that allows for the
description of proximity domains that contain a device which
performs memory access (such as a network card) but neither
host CPU nor Memory.
This patch has the parsing code and provides the infrastructure
for an architecture to associate these new domains with their
nearest memory processing node.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_get_node() calls acpi_get_pxm() to evaluate the _PXM AML method
for entries found in DSDT/SSDT. ACPI 6.3 sec 6.2.14 states
"_PXM evaluates to an integer that identifies a device as belonging to
a Proximity Domain defined in the System Resource Affinity Table (SRAT)."
Hence a _PXM method should not result in creation of a new NUMA node.
Before this patch, _PXM could result in partial instantiation of
NUMA node, missing elements such as zone lists. A call to
devm_kzalloc(), for example, results in a NULL pointer dereference.
This patch therefore replaces the acpi_map_pxm_to_node() with a call
to pxm_to_node().
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The function should check the validity of the pxm value before using
it to index the pxm_to_node_map[] array.
Whilst hardening this code may be good in general, the main intent
here is to enable following patches that use this function to replace
acpi_map_pxm_to_node() for non SRAT usecases which should return
NO_NUMA_NODE for PXM entries not matching with those in SRAT.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
acpi_map_pxm_to_node() will never return a NUMA node greater than
MAX_NUMNODES, so the 'node >= MAX_NUMNODES' check is not needed.
Remove it.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In acpi_parse_entries_array(), the subtable entries (entry.hdr)
will never be NULL, so for ACPI subtable handler in struct
acpi_subtable_proc, will never handle NULL subtable entries.
Remove those useless subtable pointer checks in the callback
handlers.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
We want to allow to specify (similar as for a DIMM), to which node a
virtio-mem device (and, therefore, its memory) belongs. Add a new
virtio-mem feature flag and export pxm_to_node, so it can be used in kernel
module context.
Acked-by: Michal Hocko <mhocko@suse.com> # for the export
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export
Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The acpi_map_pxm_to_online_node() helper is used to find the closest
online node to a given proximity domain. This is used to map devices in
a proximity domain with no online memory or cpus to the closest online
node and populate a device's 'numa_node' property. The numa_node
property allows applications to be migrated "close" to a resource.
In preparation for providing a generic facility to optionally map an
address range to its closest online node, or the node the range would
represent were it to be onlined (target_node), up-level the core of
acpi_map_pxm_to_online_node() to a generic mm/numa helper.
Cc: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/158188324802.894464.13128795207831894206.stgit@dwillia2-desk3.amr.corp.intel.com
|
|
Currently hmat.c lives under an "hmat" directory which does not enhance
the description of the file. The initial motivation for giving hmat.c
its own directory was to delineate it as mm functionality in contrast to
ACPI device driver functionality.
As ACPI continues to play an increasing role in conveying
memory location and performance topology information to the OS take the
opportunity to co-locate these NUMA relevant tables in a combined
directory.
numa.c is renamed to srat.c and moved to drivers/acpi/numa/ along with
hmat.c.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|