summaryrefslogtreecommitdiff
path: root/arch/x86/events
AgeCommit message (Collapse)Author
2024-07-16Merge tag 'perf-core-2024-07-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes * tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf/x86/intel: Add a distinct name for Granite Rapids perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR perf: Split __perf_pending_irq() out of perf_pending_irq() perf: Don't disable preemption in perf_pending_task(). perf: Move swevent_htable::recursion into task_struct. perf: Shrink the size of the recursion counter. perf: Enqueue SIGTRAP always via task_work. task_work: Add TWA_NMI_CURRENT as an additional notify mode. perf: Move irq_work_queue() where the event is prepared. perf: Fix event leak upon exec and file release perf: Fix event leak upon exit task_work: Introduce task_work_cancel() again task_work: s/task_work_cancel()/task_work_cancel_func()/ perf/x86/amd/uncore: Fix DF and UMC domain identification perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable perf/x86/intel: Support Perfmon MSRs aliasing perf/x86/intel: Support PERFEVTSEL extension perf/x86: Add config_mask to represent EVENTSEL bitmask ...
2024-07-16Merge tag 'locking-core-2024-07-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Jump label fixes, including a perf events fix that originally manifested as jump label failures, but was a serialization bug at the usage site - Mark down_write*() helpers as __always_inline, to improve WCHAN debuggability - Misc cleanups and fixes * tag 'locking-core-2024-07-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Add __always_inline annotation to __down_write_common() and inlined callers jump_label: Simplify and clarify static_key_fast_inc_cpus_locked() jump_label: Clarify condition in static_key_fast_inc_not_disabled() jump_label: Fix concurrency issues in static_key_slow_dec() perf/x86: Serialize set_attr_rdpmc() cleanup: Standardize the header guard define's name
2024-07-15Merge tag 'x86_cpu_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu model updates from Borislav Petkov: - Flip the logic to add feature names to /proc/cpuinfo to having to explicitly specify the flag if there's a valid reason to show it in /proc/cpuinfo - Switch a bunch of Intel x86 model checking code to the new CPU model defines - Fixes and cleanups * tag 'x86_cpu_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/intel: Drop stray FAM6 check with new Intel CPU model defines x86/cpufeatures: Flip the /proc/cpuinfo appearance logic x86/CPU/AMD: Always inline amd_clear_divider() x86/mce/inject: Add missing MODULE_DESCRIPTION() line perf/x86/rapl: Switch to new Intel CPU model defines x86/boot: Switch to new Intel CPU model defines x86/cpu: Switch to new Intel CPU model defines perf/x86/intel: Switch to new Intel CPU model defines x86/virt/tdx: Switch to new Intel CPU model defines x86/PCI: Switch to new Intel CPU model defines x86/cpu/intel: Switch to new Intel CPU model defines x86/platform/intel-mid: Switch to new Intel CPU model defines x86/pconfig: Remove unused MKTME pconfig code x86/cpu: Remove useless work in detect_tme_early()
2024-07-09perf/x86/intel: Add a distinct name for Granite RapidsKan Liang
Currently, the Sapphire Rapids and Granite Rapids share the same PMU name, sapphire_rapids. Because from the kernel’s perspective, GNR is similar to SPR. The only key difference is that they support different extra MSRs. The code path and the PMU name are shared. However, from end users' perspective, they are quite different. Besides the extra MSRs, GNR has a newer PEBS format, supports Retire Latency, supports new CPUID enumeration architecture, doesn't required the load-latency AUX event, has additional TMA Level 1 Architectural Events, etc. The differences can be enumerated by CPUID or the PERF_CAPABILITIES MSR. They weren't reflected in the model-specific kernel setup. But it is worth to have a distinct PMU name for GNR. Fixes: a6742cb90b56 ("perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL") Suggested-by: Ahmad Yasin <ahmad.yasin@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708193336.1192217-3-kan.liang@linux.intel.com
2024-07-09perf/x86/intel/ds: Fix non 0 retire latency on RaptorlakeKan Liang
A non-0 retire latency can be observed on a Raptorlake which doesn't support the retire latency feature. By design, the retire latency shares the PERF_SAMPLE_WEIGHT_STRUCT sample type with other types of latency. That could avoid adding too many different sample types to support all kinds of latency. For the machine which doesn't support some kind of latency, 0 should be returned. Perf doesn’t clear/init all the fields of a sample data for the sake of performance. It expects the later perf_{prepare,output}_sample() to update the uninitialized field. However, the current implementation doesn't touch the field of the retire latency if the feature is not supported. The memory garbage is dumped into the perf data. Clear the retire latency if the feature is not supported. Fixes: c87a31093c70 ("perf/x86: Support Retire Latency") Reported-by: "Bayduraev, Alexey V" <alexey.v.bayduraev@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: "Bayduraev, Alexey V" <alexey.v.bayduraev@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708193336.1192217-4-kan.liang@linux.intel.com
2024-07-09perf/x86/intel: Hide Topdown metrics events if the feature is not enumeratedKan Liang
The below error is observed on Ice Lake VM. $ perf stat Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (slots). /bin/dmesg | grep -i perf may provide additional information. In a virtualization env, the Topdown metrics and the slots event haven't been supported yet. The guest CPUID doesn't enumerate them. However, the current kernel unconditionally exposes the slots event and the Topdown metrics events to sysfs, which misleads the perf tool and triggers the error. Hide the perf-metrics topdown events and the slots event if the perf-metrics feature is not enumerated. The big core of a hybrid platform can also supports the perf-metrics feature. Fix the hybrid platform as well. Closes: https://lore.kernel.org/lkml/CAM9d7cj8z+ryyzUHR+P1Dcpot2jjW+Qcc4CPQpfafTXN=LEU0Q@mail.gmail.com/ Reported-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Dongli Zhang <dongli.zhang@oracle.com> Link: https://lkml.kernel.org/r/20240708193336.1192217-2-kan.liang@linux.intel.com
2024-07-09perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPRKan Liang
The perf stat errors out with UNC_CHA_TOR_INSERTS.IA_HIT_CXL_ACC_LOCAL event. $perf stat -e uncore_cha_55/event=0x35,umask=0x10c0008101/ -a -- ls event syntax error: '..0x35,umask=0x10c0008101/' \___ Bad event or PMU The definition of the CHA umask is config:8-15,32-55, which is 32bit. However, the umask of the event is bigger than 32bit. This is an error in the original uncore spec. Add a new umask_ext5 for the new CHA umask range. Fixes: 949b11381f81 ("perf/x86/intel/uncore: Add Sapphire Rapids server CHA support") Closes: https://lore.kernel.org/linux-perf-users/alpine.LRH.2.20.2401300733310.11354@Diego/ Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240708185524.1185505-1-kan.liang@linux.intel.com
2024-07-04perf/x86/amd/uncore: Fix DF and UMC domain identificationSandipan Das
For uncore PMUs, a single context is shared across all CPUs in a domain. The domain can be a CCX, like in the case of the L3 PMU, or a socket, like in the case of DF and UMC PMUs. This information is available via the PMU's cpumask. For contexts shared across a socket, the domain is currently determined from topology_die_id() which is incorrect after the introduction of commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") as it now returns a CCX identifier on Zen 4 and later systems which support CPUID leaf 0x80000026. Use topology_logical_package_id() instead as it always returns a socket identifier irrespective of the availability of CPUID leaf 0x80000026. Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240626074942.1044818-1-sandipan.das@amd.com
2024-07-04perf/x86/amd/uncore: Avoid PMU registration if counters are unavailableSandipan Das
X86_FEATURE_PERFCTR_NB and X86_FEATURE_PERFCTR_LLC are derived from CPUID leaf 0x80000001 ECX bits 24 and 28 respectively and denote the availability of DF and L3 counters. When these bits are not set, the corresponding PMUs have no counters and hence, should not be registered. Fixes: 07888daa056e ("perf/x86/amd/uncore: Move discovery and registration") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240626074404.1044230-1-sandipan.das@amd.com
2024-07-04perf/x86/intel: Support Perfmon MSRs aliasingKan Liang
The architectural performance monitoring V6 supports a new range of counters' MSRs in the 19xxH address range. They include all the GP counter MSRs, the GP control MSRs, and the fixed counter MSRs. The step between each sibling counter is 4. Add intel_pmu_addr_offset() to calculate the correct offset. Add fixedctr in struct x86_pmu to store the address of the fixed counter 0. It can be used to calculate the rest of the fixed counters. The MSR address of the fixed counter control is not changed. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-9-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Support PERFEVTSEL extensionKan Liang
Two new fields (the unit mask2, and the equal flag) are added in the IA32_PERFEVTSELx MSRs. They can be enumerated by the CPUID.23H.0.EBX. Update the config_mask in x86_pmu and x86_hybrid_pmu for the true layout of the PERFEVTSEL. Expose the new formats into sysfs if they are available. The umask extension reuses the same format attr name "umask" as the previous umask. Add umask2_show to determine/display the correct format for the current machine. Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-8-kan.liang@linux.intel.com
2024-07-04perf/x86: Add config_mask to represent EVENTSEL bitmaskKan Liang
Different vendors may support different fields in EVENTSEL MSR, such as Intel would introduce new fields umask2 and eq bits in EVENTSEL MSR since Perfmon version 6. However, a fixed mask X86_RAW_EVENT_MASK is used to filter the attr.config. Introduce a new config_mask to record the real supported EVENTSEL bitmask. Only apply it to the existing code now. No functional change. Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-7-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Support new data source for Lunar LakeKan Liang
A new PEBS data source format is introduced for the p-core of Lunar Lake. The data source field is extended to 8 bits with new encodings. A new layout is introduced into the union intel_x86_pebs_dse. Introduce the lnl_latency_data() to parse the new format. Enlarge the pebs_data_source[] accordingly to include new encodings. Only the mem load and the mem store events can generate the data source. Introduce INTEL_HYBRID_LDLAT_CONSTRAINT and INTEL_HYBRID_STLAT_CONSTRAINT to mark them. Add two new bits for the new cache-related data src, L2_MHB and MSC. The L2_MHB is short for L2 Miss Handling Buffer, which is similar to LFB (Line Fill Buffer), but to track the L2 Cache misses. The MSC stands for the memory-side cache. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-6-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Rename model-specific pebs_latency_data functionsKan Liang
The model-specific pebs_latency_data functions of ADL and MTL use the "small" as a postfix to indicate the e-core. The postfix is too generic for a model-specific function. It cannot provide useful information that can directly map it to a specific uarch, which can facilitate the development and maintenance. Use the abbr of the uarch to rename the model-specific functions. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-5-kan.liang@linux.intel.com
2024-07-04perf/x86: Add Lunar Lake and Arrow Lake supportKan Liang
From PMU's perspective, Lunar Lake and Arrow Lake are similar to the previous generation Meteor Lake. Both are hybrid platforms, with e-core and p-core. The key differences include: - The e-core supports 3 new fixed counters - The p-core supports an updated PEBS Data Source format - More GP counters (Updated event constraint table) - New Architectural performance monitoring V6 (New Perfmon MSRs aliasing, umask2, eq). - New PEBS format V6 (Counters Snapshotting group) - New RDPMC metrics clear mode The legacy features, the 3 new fixed counters and updated event constraint table are enabled in this patch. The new PEBS data source format, the architectural performance monitoring V6, the PEBS format V6, and the new RDPMC metrics clear mode are supported in the following patches. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-4-kan.liang@linux.intel.com
2024-07-04perf/x86: Support counter maskKan Liang
The current perf assumes that both GP and fixed counters are contiguous. But it's not guaranteed on newer Intel platforms or in a virtualization environment. Use the counter mask to replace the number of counters for both GP and the fixed counters. For the other ARCHs or old platforms which don't support a counter mask, using GENMASK_ULL(num_counter - 1, 0) to replace. There is no functional change for them. The interface to KVM is not changed. The number of counters still be passed to KVM. It can be updated later separately. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-3-kan.liang@linux.intel.com
2024-07-04perf/x86/intel: Support the PEBS event maskKan Liang
The current perf assumes that the counters that support PEBS are contiguous. But it's not guaranteed with the new leaf 0x23 introduced. The counters are enumerated with a counter mask. There may be holes in the counter mask for future platforms or in a virtualization environment. Store the PEBS event mask rather than the maximum number of PEBS counters in the x86 PMU structures. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Link: https://lkml.kernel.org/r/20240626143545.480761-2-kan.liang@linux.intel.com
2024-07-04perf/x86/intel/cstate: Add Lunarlake supportZhang Rui
Compared with previous client platforms, PC8 is removed from Lunarlake. It supports CC1/CC6/CC7 and PC2/PC3/PC6/PC10 residency counters. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20240628031758.43103-4-rui.zhang@intel.com
2024-07-04perf/x86/intel/cstate: Add Arrowlake supportZhang Rui
Like Alderlake, Arrowlake supports CC1/CC6/CC7 and PC2/PC3/PC6/PC8/PC10. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20240628031758.43103-3-rui.zhang@intel.com
2024-07-04perf/x86/intel/cstate: Fix Alderlake/Raptorlake/MeteorlakeZhang Rui
For Alderlake, the spec changes after the patch submitted and PC7/PC9 are removed. Raptorlake and Meteorlake, which copy the Alderlake cstate PMU, also don't have PC7/PC9. Remove PC7/PC9 support for Alderlake/Raptorlake/Meteorlake. Fixes: d0ca946bcf84 ("perf/x86/cstate: Add Alder Lake CPU support") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20240628031758.43103-2-rui.zhang@intel.com
2024-07-04Merge branch 'tip/x86/cpu'Peter Zijlstra
The Lunarlake patches rely on the new VFM stuff. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2024-07-04perf/x86/intel/pt: Fix pt_topa_entry_for_page() address calculationAdrian Hunter
Currently, perf allocates an array of page pointers which is limited in size by MAX_PAGE_ORDER. That in turn limits the maximum Intel PT buffer size to 2GiB. Should that limitation be lifted, the Intel PT driver can support larger sizes, except for one calculation in pt_topa_entry_for_page(), which is limited to 32-bits. Fix pt_topa_entry_for_page() address calculation by adding a cast. Fixes: 39152ee51b77 ("perf/x86/intel/pt: Get rid of reverse lookup table for ToPA") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240624201101.60186-4-adrian.hunter@intel.com
2024-07-04perf/x86/intel/pt: Fix a topa_entry base address calculationAdrian Hunter
topa_entry->base is a bit-field. Bit-fields are not promoted to a 64-bit type, even if the underlying type is 64-bit, and so, if necessary, must be cast to a larger type when calculations are done. Fix a topa_entry->base address calculation by adding a cast. Without the cast, the address was limited to 36-bits i.e. 64GiB. The address calculation is used on systems that do not support Multiple Entry ToPA (only Broadwell), and affects physical addresses on or above 64GiB. Instead of writing to the correct address, the address comprising the first 36 bits would be written to. Intel PT snapshot and sampling modes are not affected. Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver") Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240624201101.60186-3-adrian.hunter@intel.com
2024-07-04perf/x86/intel/pt: Fix topa_entry base lengthMarco Cavenati
topa_entry->base needs to store a pfn. It obviously needs to be large enough to store the largest possible x86 pfn which is MAXPHYADDR-PAGE_SIZE (52-12). So it is 4 bits too small. Increase the size of topa_entry->base from 36 bits to 40 bits. Note, systems where physical addresses can be 256TiB or more are affected. [ Adrian: Amend commit message as suggested by Dave Hansen ] Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver") Signed-off-by: Marco Cavenati <cavenati.marco@gmail.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240624201101.60186-2-adrian.hunter@intel.com
2024-06-17perf/x86/intel/uncore: Support HBM and CXL PMON countersKan Liang
Unknown uncore PMON types can be found in both SPR and EMR with HBM or CXL. $ls /sys/devices/ | grep type uncore_type_12_16 uncore_type_12_18 uncore_type_12_2 uncore_type_12_4 uncore_type_12_6 uncore_type_12_8 uncore_type_13_17 uncore_type_13_19 uncore_type_13_3 uncore_type_13_5 uncore_type_13_7 uncore_type_13_9 The unknown PMON types are HBM and CXL PMON. Except for the name, the other information regarding the HBM and CXL PMON counters can be retrieved via the discovery table. Add them into the uncores tables for SPR and EMR. The event config registers for all CXL related units are 8-byte apart. Add SPR_UNCORE_MMIO_OFFS8_COMMON_FORMAT to specially handle it. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-9-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Cleanup unused unit structureKan Liang
The unit control and ID information are retrieved from the unit control RB tree. No one uses the old structure anymore. Remove them. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-8-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Apply the unit control RB tree to PCI uncore unitsKan Liang
The unit control RB tree has the unit control and unit ID information for all the PCI units. Use them to replace the box_ctls/pci_offsets to get an accurate unit control address for PCI uncore units. The UPI/M3UPI units in the discovery table are ignored. Please see the commit 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR"). Manually allocate a unit control RB tree for UPI/M3UPI. Add cleanup_extra_boxes to release such manual allocation. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-7-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Apply the unit control RB tree to MSR uncore unitsKan Liang
The unit control RB tree has the unit control and unit ID information for all the MSR units. Use them to replace the box_ctl and uncore_msr_box_ctl() to get an accurate unit control address for MSR uncore units. Add intel_generic_uncore_assign_hw_event(), which utilizes the accurate unit control address from the unit control RB tree to calculate the config_base and event_base. The unit id related information should be retrieved from the unit control RB tree as well. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-6-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Apply the unit control RB tree to MMIO uncore unitsKan Liang
The unit control RB tree has the unit control and unit ID information for all the units. Use it to replace the box_ctls/mmio_offsets to get an accurate unit control address for MMIO uncore units. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-5-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Retrieve the unit ID from the unit control RB treeKan Liang
The box_ids only save the unit ID for the first die. If a unit, e.g., a CXL unit, doesn't exist in the first die. The unit ID cannot be retrieved. The unit control RB tree also stores the unit ID information. Retrieve the unit ID from the unit control RB tree Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-4-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Support per PMU cpumaskKan Liang
The cpumask of some uncore units, e.g., CXL uncore units, may be wrong under some configurations. Perf may access an uncore counter of a non-existent uncore unit. The uncore driver assumes that all uncore units are symmetric among dies. A global cpumask is shared among all uncore PMUs. However, some CXL uncore units may only be available on some dies. A per PMU cpumask is introduced to track the CPU mask of this PMU. The driver searches the unit control RB tree to check whether the PMU is available on a given die, and updates the per PMU cpumask accordingly. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Yunying Sun <yunying.sun@intel.com> Link: https://lore.kernel.org/r/20240614134631.1092359-3-kan.liang@linux.intel.com
2024-06-17perf/x86/uncore: Save the unit control address of all unitsKan Liang
The unit control address of some CXL units may be wrongly calculated under some configuration on a EMR machine. The current implementation only saves the unit control address of the units from the first die, and the first unit of the rest of dies. Perf assumed that the units from the other dies have the same offset as the first die. So the unit control address of the rest of the units can be calculated. However, the assumption is wrong, especially for the CXL units. Introduce an RB tree for each uncore type to save the unit control address and three kinds of ID information (unit ID, PMU ID, and die ID) for all units. The unit ID is a physical ID of a unit. The PMU ID is a logical ID assigned to a unit. The logical IDs start from 0 and must be contiguous. The physical ID and the logical ID are 1:1 mapping. The units with the same physical ID in different dies share the same PMU. The die ID indicates which die a unit belongs to. The RB tree can be searched by two different keys (unit ID or PMU ID + die ID). During the RB tree setup, the unit ID is used as a key to look up the RB tree. The perf can create/assign a proper PMU ID to the unit. Later, after the RB tree is setup, PMU ID + die ID is used as a key to look up the RB tree to fill the cpumask of a PMU. It's used more frequently, so PMU ID + die ID is compared in the unit_less(). The uncore_find_unit() has to be O(N). But the RB tree setup only occurs once during the driver load time. It should be acceptable. Compared with the current implementation, more space is required to save the information of all units. The extra size should be acceptable. For example, on EMR, there are 221 units at most. For a 2-socket machine, the extra space is ~6KB at most. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20240614134631.1092359-2-kan.liang@linux.intel.com
2024-06-11perf/x86: Serialize set_attr_rdpmc()Thomas Gleixner
Yue and Xingwei reported a jump label failure. It's caused by the lack of serialization in set_attr_rdpmc(): CPU0 CPU1 Assume: x86_pmu.attr_rdpmc == 0 if (val != x86_pmu.attr_rdpmc) { if (val == 0) ... else if (x86_pmu.attr_rdpmc == 0) static_branch_dec(&rdpmc_never_available_key); if (val != x86_pmu.attr_rdpmc) { if (val == 0) ... else if (x86_pmu.attr_rdpmc == 0) FAIL, due to imbalance ---> static_branch_dec(&rdpmc_never_available_key); The reported BUG() is a consequence of the above and of another bug in the jump label core code. The core code needs a separate fix, but that cannot prevent the imbalance problem caused by set_attr_rdpmc(). Prevent this by serializing set_attr_rdpmc() locally. Fixes: a66734297f78 ("perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks") Closes: https://lore.kernel.org/r/CAEkJfYNzfW1vG=ZTMdz_Weoo=RXY1NDunbxnDaLyj8R4kEoE_w@mail.gmail.com Reported-by: Yue Sun <samsun1006219@gmail.com> Reported-by: Xingwei Lee <xrivendell7@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20240610124406.359476013@linutronix.de
2024-05-31perf/x86/intel: Add missing MODULE_DESCRIPTION() linesJeff Johnson
Fix the 'make W=1 C=1' warnings: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-uncore.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-cstate.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-intel-v1-1-8252194ed20a@quicinc.com
2024-05-31perf/x86/rapl: Add missing MODULE_DESCRIPTION() lineJeff Johnson
Fix the warning from 'make C=1 W=1': WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/rapl.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-v1-1-e45ffa8af99f@quicinc.com
2024-05-28perf/x86/rapl: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240520224620.9480-44-tony.luck%40intel.com
2024-05-28x86/cpu: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Update INTEL_CPU_DESC() to work with vendor/family/model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240520224620.9480-34-tony.luck%40intel.com
2024-05-28perf/x86/intel: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240520224620.9480-32-tony.luck%40intel.com
2024-05-22Merge tag 'driver-core-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the small set of driver core and kernfs changes for 6.10-rc1. Nothing major here at all, just a small set of changes for some driver core apis, and minor fixups. Included in here are: - sysfs_bin_attr_simple_read() helper added and used - device_show_string() helper added and used All usages of these were acked by the various maintainers. Also in here are: - kernfs minor cleanup - removed unused functions - typo fix in documentation - pay attention to sysfs_create_link() failures in module.c finally All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: device property: Fix a typo in the description of device_get_child_node_count() kernfs: mount: Remove unnecessary ‘NULL’ values from knparent scsi: Use device_show_string() helper for sysfs attributes platform/x86: Use device_show_string() helper for sysfs attributes perf: Use device_show_string() helper for sysfs attributes IB/qib: Use device_show_string() helper for sysfs attributes hwmon: Use device_show_string() helper for sysfs attributes driver core: Add device_show_string() helper for sysfs attributes treewide: Use sysfs_bin_attr_simple_read() helper sysfs: Add sysfs_bin_attr_simple_read() helper module: don't ignore sysfs_create_link() failures driver core: Remove unused platform_notify, platform_notify_remove
2024-05-18perf/x86/amd: Use try_cmpxchg() in events/amd/{un,}core.cUros Bizjak
Replace this pattern in events/amd/{un,}core.c: cmpxchg(*ptr, old, new) == old ... with the simpler and faster: try_cmpxchg(*ptr, &old, new) The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after the CMPXCHG. No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240425101708.5025-1-ubizjak@gmail.com
2024-05-08perf/x86/cstate: Remove unused 'struct perf_cstate_msr'Ingo Molnar
Use of this structure was removed in: 8f2a28c5859b ("perf/x86/cstate: Use new probe function") Remove the now stale type as well. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-04perf: Use device_show_string() helper for sysfs attributesLukas Wunner
Deduplicate sysfs ->show() callbacks which expose a string at a static memory location. Use the newly introduced device_show_string() helper in the driver core instead by declaring those sysfs attributes with DEVICE_STRING_ATTR_RO(). No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Link: https://lore.kernel.org/r/3a297850312b4ecb62d6872121de04496900f502.1713608122.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-02perf/x86/rapl: Rename 'maxdie' to nr_rapl_pmu and 'dieid' to rapl_pmu_idxDhananjay Ugwekar
AMD CPUs have the scope of RAPL energy-pkg event as package, whereas Intel Cascade Lake CPUs have the scope as die. To account for the difference in the energy-pkg event scope between AMD and Intel CPUs, give more generic and semantically correct names to the maxdie and dieid variables. No functional change. Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240502095115.177713-2-Dhananjay.Ugwekar@amd.com
2024-05-02Merge branch 'x86/cpu' into perf/core, to pick up dependent commitsIngo Molnar
We are going to fix perf-events fallout of changes in tip:x86/cpu, so merge in that branch first. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-05-02Merge tag 'v6.9-rc6' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-04-29perf/x86/msr: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181503.41614-1-tony.luck%40intel.com
2024-04-29perf/x86/intel/uncore: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. [ bp: Squash *three* uncore patches into one. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/all/20240424181501.41557-1-tony.luck%40intel.com
2024-04-25perf/x86/intel/pt: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181500.41538-1-tony.luck%40intel.com
2024-04-25perf/x86/lbr: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181500.41519-1-tony.luck%40intel.com
2024-04-25perf/x86/intel/cstate: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20240424181459.41500-1-tony.luck%40intel.com