summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2019-05-21treewide: Add SPDX license identifier for more missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have MODULE_LICENCE("GPL*") inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16Merge tag 'edac_fixes_for_5.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fixes from Borislav Petkov: - Do not build mpc85_edac as a module (Michael Ellerman) - Correct edac_mc_find()'s return value on error (Robert Richter) * tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC/mc: Fix edac_mc_find() in case no device is found EDAC/mpc85xx: Prevent building as a module
2019-05-14EDAC/mc: Fix edac_mc_find() in case no device is foundRobert Richter
The function should return NULL in case no device is found, but it always returns the last checked mc device from the list even if the index did not match. Fix that. I did some analysis why this did not raise any issues for about 3 years and the reason is that edac_mc_find() is mostly used to search for existing devices. Thus, the bug is not triggered. [ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ] Fixes: c73e8833bec5 ("EDAC, mc: Fix locking around mc_devices list") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
2019-05-10EDAC/mpc85xx: Prevent building as a moduleMichael Ellerman
The mpc85xx EDAC driver can be configured as a module but then fails to build because it uses two unexported symbols: ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined! ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined! We don't want to export those symbols just for this driver, so make the driver only configurable as a built-in. This seems to have been broken since at least c92132f59806 ("edac/85xx: Add PCIe error interrupt edac support") (Nov 2013). [ bp: make it depend on EDAC=y so that the EDAC core doesn't get built as a module. ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Johannes Thumshirn <jth@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@ozlabs.org Cc: morbidrsa@gmail.com Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
2019-05-06Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Support for varying MCA bank numbers per CPU: this is in preparation for future CPU enablement (Yazen Ghannam) - MCA banks read race fix (Tony Luck) - Facility to filter MCEs which should not be logged (Yazen Ghannam) - The usual round of cleanups and fixes * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models x86/MCE: Add an MCE-record filtering function RAS/CEC: Increment cec_entered under the mutex lock x86/mce: Fix debugfs_simple_attr.cocci warnings x86/mce: Remove mce_report_event() x86/mce: Handle varying MCA bank counts x86/mce: Fix machine_check_poll() tests for error types MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE x86/MCE: Group AMD function prototypes in <asm/mce.h>
2019-04-25Revert "EDAC/amd64: Support more than two controllers for chip select handling"Borislav Petkov
This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea. Unfortunately, this commit caused wrong detection of chip select sizes on some F17h client machines: --- 00-rc6+ 2019-02-14 14:28:03.126622904 +0100 +++ 01-rc4+ 2019-04-14 21:06:16.060614790 +0200 EDAC amd64: MC: 0: 0MB 1: 0MB -EDAC amd64: MC: 2: 16383MB 3: 16383MB +EDAC amd64: MC: 2: 0MB 3: 2097151MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0MB EDAC MC: UMC1 chip selects: EDAC amd64: MC: 0: 0MB 1: 0MB -EDAC amd64: MC: 2: 16383MB 3: 16383MB +EDAC amd64: MC: 2: 0MB 3: 2097151MB EDAC amd64: MC: 4: 0MB 5: 0MB EDAC amd64: MC: 6: 0MB 7: 0M Revert it for now until it has been solved properly. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2019-04-23x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h modelsYazen Ghannam
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA errors under certain conditions. The errors are benign and can safely be ignored. However, the high error rate may cause the MCA threshold counter to overflow causing a high rate of thresholding interrupts. In addition, users may see the errors reported through the AMD MCE decoder module, even with the interrupt disabled, due to MCA polling. Clear the "Counter Present" bit in the Instruction Fetch bank's MCA_MISC0 register. This will prevent enabling MCA thresholding on this bank which will prevent the high interrupt rate due to this error. Define an AMD-specific function to filter these errors from the MCE event pool so that they don't get reported during early boot. Rename filter function in EDAC/mce_amd to avoid a naming conflict, while at it. [ bp: Move function prototype to the internal header and massage/cleanup, fix typos. ] Reported-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "clemej@gmail.com" <clemej@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Cc: <stable@vger.kernel.org> # 5.0.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd407455: x86/MCE: Group AMD function prototypes in <asm/mce.h> Cc: <stable@vger.kernel.org> # 5.0.x Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
2019-04-02EDAC/altera, firmware/intel: Add Stratix10 ECC DBE SMC callThor Thayer
Reserve ECC Double Bit Error SMC call to alert U-Boot that a DBE has occurred. Move the call from local EDAC header file to a common header. [ bp: Merge the two patches. ] Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Richard Gong <richard.gong@intel.com> Reviewed-by: Alan Tull <atull@kernel.org> # firmware Cc: Greg KH <greg@kroah.com> Cc: James Morse <james.morse@arm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@kernel.org Link: https://lkml.kernel.org/r/1553870639-23895-1-git-send-email-thor.thayer@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2019-03-29EDAC/altera: Initialize peripheral FIFOs in probe()Thor Thayer
The FIFO memory and ECC initialization doesn't need to be done as a separate operation early in the startup. Improve the Arria10 and Stratix10 peripheral FIFO init by initializing memory and enabling ECC as part of the device driver initialization. Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/1553635771-32693-2-git-send-email-thor.thayer@linux.intel.com
2019-03-29EDAC/altera: Do less intrusive error injectionThor Thayer
Improve the Arria10 and Stratix10 error injection routine by reading the data and changing just 1 bit before writing back out. Previous routine would overwrite the first bytes to 0 then change 1 bit but this method is less intrusive. Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/1553635771-32693-1-git-send-email-thor.thayer@linux.intel.com
2019-03-27EDAC/amd64: Adjust printed chip select sizes when interleavedYazen Ghannam
AMD systems may support chip select interleaving. However, on family 17h+ this was not taken into account when printing the chip select sizes. Add support to detect if chip selects are interleaved on family 17h+, and adjust the sizes accordingly. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com
2019-03-27EDAC/amd64: Support more than two controllers for chip select handlingYazen Ghannam
The struct chip_select array that's used for saving chip select bases and masks is fixed at length of two. There should be one struct chip_select for each controller, so this array should be increased to support systems that may have more than two controllers. Increase the size of the struct chip_select array to eight, which is the largest number of controllers per die currently supported on AMD systems. Also, carve out the Family 17h+ reading of the bases/masks into a separate function. This effectively reverts the original bases/masks reading code to before Family 17h support was added. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
2019-03-27EDAC/amd64: Recognize x16 symbol sizeYazen Ghannam
Future AMD systems may support x16 symbol sizes. Recognize if a system is using x16 symbol size. Also, simplify the print statement. Note that a x16 syndrome vector table is not necessary like with x4 or x8 syndromes. This is because systems that support x16 symbol sizes are SMCA systems and in that case, the syndrome can be directly extracted from the MCA_SYND[Syndrome] field. [ bp: massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
2019-03-27EDAC/amd64: Set maximum channel layer size depending on familyYazen Ghannam
The AMD64 EDAC module currently hardcodes the EDAC channel layer size count to two. Future AMD systems may have more channels than this. Set the EDAC channel layer size equal to the maximum number of channels possible for the system. On Family 17h and later, this is set in the num_umcs variable. Older systems will continue to use two as the default. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com
2019-03-27EDAC/amd64: Support more than two Unified Memory ControllersYazen Ghannam
The first few models of Family 17h all had 2 Unified Memory Controllers per Die, so this was treated as a fixed value. However, future systems may have more Unified Memory Controllers per Die. Related to this, the channel number and base address of a Unified Memory Controller were found by matching on fixed, known values. However, current and future systems follow this pattern for the channel number and base address of a Unified Memory Controller: 0xYXXXXX, where Y is the channel number. So matching on hardcoded values is not necessary. Set the number of Unified Memory Controllers at driver init time based on the family/model. Also, update the functions that find the channel number and base address of a Unified Memory Controller to support more than two. [ bp: Move num_umcs into the .c file and simplify comment. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
2019-03-27EDAC/amd64: Use a macro for iterating over Unified Memory ControllersYazen Ghannam
Define and use a macro for looping over the number of Unified Memory Controllers. No functional change. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com
2019-03-27EDAC/amd64: Add Family 17h Model 30h PCI IDsYazen Ghannam
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module. This also fixes a probe failure that appeared when some other PCI IDs for Family 17h Model 30h were added to the AMD NB code. Fixes: be3518a16ef2 (x86/amd_nb: Add PCI device IDs for family 17h, model 30h) Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
2019-03-23EDAC, altera: Fix S10 Double Bit Error NotificationThor Thayer
Stratix10 Double Bit Error Address was always read from SDRAM Address register instead of each device's Address register. To determine which device had the DBE, cycle through the EDAC devices comparing the DBE value to the db_irq value. Once found, report the DBE Address from the device registers as well as the device name. Finally, notify the system via an SMC call and indicate the panic should result in a system reboot. Change a run-time check to a Stratix10 compile-time check for a clean SMC notification. Fixes: d5fc9125566c ("EDAC, altera: Combine Stratix10 and Arria10 probe functions") Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/1552490842-25440-1-git-send-email-thor.thayer@linux.intel.com
2019-03-23EDAC, skx, i10nm: Make skx_common.c a pure libraryQiuxu Zhuo
The following Kconfig constellations fail randconfig builds: CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=m CONFIG_EDAC_I10NM=y or CONFIG_ACPI_NFIT=y CONFIG_EDAC_DEBUG=y CONFIG_EDAC_SKX=y CONFIG_EDAC_I10NM=m with: ... CC [M] drivers/edac/skx_common.o ... .../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module' That is because if one of the two drivers - skx_edac or i10nm_edac - is built-in and the other one is a module, the shared file skx_common.c gets linked into a module object by kbuild. Therefore, when linking that same file into vmlinux, the '__this_module' symbol used in debugfs isn't defined, leading to the above error. Fix it by moving all debugfs code from skx_common.c to both skx_base.c and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to the '__this_module' symbol anymore. Clarify skx_common.c's purpose at the top of the file for future reference, while at it. [ bp: Make text more readable. ] Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk
2019-03-08Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: "This time around we have in store: - Disable MC4_MISC thresholding banks on all AMD family 0x15 models (Shirish S) - AMD MCE error descriptions update and error decode improvements (Yazen Ghannam) - The usual smaller conversions and fixes" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Improve error message when kernel cannot recover, p2 EDAC/mce_amd: Decode MCA_STATUS in bit definition order EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit EDAC, mce_amd: Print ExtErrorCode and description on a single line EDAC, mce_amd: Match error descriptions to latest documentation x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types RAS: Add a MAINTAINERS entry RAS: Use consistent types for UUIDs x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models x86/MCE: Switch to use the new generic UUID API
2019-03-08Merge tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: - A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler) - New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck) - Altera SDRAM functionality carveout for separate enablement of RAS and SDRAM capabilities on some Altera chips. (Thor Thayer) - The usual round of cleanups and fixes And last but not least: recruit James Morse as a reviewer for the ARM side. * tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC/altera: Add separate SDRAM EDAC config EDAC, altera: Add missing of_node_put() EDAC, skx_common: Add code to recognise new compound error code EDAC, i10nm: Fix randconfig builds EDAC, i10nm: Add a driver for Intel 10nm server processors EDAC, skx_edac: Delete duplicated code EDAC, skx_common: Separate common code out from skx_edac EDAC: Do not check return value of debugfs_create() functions EDAC: Add James Morse as a reviewer dt-bindings, EDAC: Add Aspeed AST2500 EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
2019-02-26EDAC/altera: Add separate SDRAM EDAC configThor Thayer
The CONFIG_ALTERA_EDAC Kconfig symbol always enables the SDRAM EDAC functionality. On the newer architectures, however, there are cases where the peripheral EDAC functionality is enabled but SDRAM needs to be disabled. Move SDRAM functions so they can be contained inside the conditional CONFIG. Create new CONFIG option just for SDRAM. [ bp: Massage commit message. ] Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: dinguyen@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linux@armlinux.org.uk Link: https://lkml.kernel.org/r/1551121006-4657-2-git-send-email-thor.thayer@linux.intel.com
2019-02-15EDAC/mce_amd: Decode MCA_STATUS in bit definition orderYazen Ghannam
Sort the MCA_STATUS bits in decode output to follow how they are defined in the register. The order is as follows: Bit | Decode ------------ 62 | Over 61 | UC 59 | MiscV 58 | AddrV 57 | PCC 55 | TCC 53 | SyndV 46 | CECC 45 | UECC 44 | Deferred 43 | Poison 40 | Scrub [ bp: Massage a bit. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190212212417.107049-2-Yazen.Ghannam@amd.com
2019-02-15EDAC/mce_amd: Decode MCA_STATUS[Scrub] bitYazen Ghannam
Previous AMD systems have had a bit in MCA_STATUS to indicate that an error was detected on a scrub operation. However, this bit was defined differently within different banks and families/models. Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero or defined as "Scrub", for all MCA banks and CPU models. Therefore, this bit can be defined as the "Scrub" bit. Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding module for Family 17h and newer systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com
2019-02-15EDAC, altera: Add missing of_node_put()Huang Zijiang
The call to of_parse_phandle() returns a node pointer with refcount incremented thus it must be explicitly decremented here after the last usage. Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: wang.yi59@zte.com.cn Link: https://lkml.kernel.org/r/1550126347-27984-1-git-send-email-huang.zijiang@zte.com.cn
2019-02-06EDAC, skx_common: Add code to recognise new compound error codeTony Luck
A new error code for systems that use DRAM as an extra level of cache looks like: 000F 0010 1MMM CCCC where the MMM and CCCC bits are used for the same purpose as the original code. For this new class of errors the ADXL translation will provide details of both the DIMM used as cache for the error location and the component that is being cached. Note: This new error code is first supported in Skylake. Older EDAC drivers do not need to be updated. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aristeu Rozanski <aris@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190205182109.27828-1-tony.luck@intel.com
2019-02-06EDAC, i10nm: Fix randconfig buildsTony Luck
I10NM_EDAC depends on CONFIG_ACPI so make that dependency explicit. Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aristeu Rozanski <aris@redhat.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190205180200.26865-1-tony.luck@intel.com
2019-02-04EDAC, mce_amd: Print ExtErrorCode and description on a single lineYazen Ghannam
Save a log line by printing the extended error code and the description on a single line. This is similar to how errors are printed in other subsystems, e.g. "#, description". If we don't have a valid description then only the number/code is printed. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190201225534.8177-6-Yazen.Ghannam@amd.com
2019-02-03EDAC, mce_amd: Match error descriptions to latest documentationYazen Ghannam
Update the error descriptions to match the latest documentation for easier searching. In some cases the changes are small and in other cases the changes may be total rewording of the description. No functional changes. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190201225534.8177-5-Yazen.Ghannam@amd.com
2019-02-03x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank typesYazen Ghannam
Some SMCA bank types on future systems will report new error types even though the bank type is not treated as a new version. These new error types will reported by bits that are reserved in past systems. Add the new error descriptions to the lists in edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
2019-02-03x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU unitsYazen Ghannam
The existing CS, PSP, and SMU SMCA bank types will see new versions (as indicated by their McaTypes) in future SMCA systems. Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the same names as the older versions, since they are logically the same to the user. SMCA systems won't mix and match IP blocks with different McaType versions in the same system, so there isn't a need to distinguish them. The MCA_IPID register is saved when logging an MCA error, and that can be used to triage the error. Also, add the new error descriptions to edac_mce_amd. Some error types (positions in the list) are overloaded compared to the previous McaTypes. Therefore, just create new lists of the error descriptions to keep things simple even if some of the error descriptions are the same between versions. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
2019-02-03x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank typesYazen Ghannam
Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and PCIE SMCA bank types. Also, add their respective error descriptions to the MCE decoding module edac_mce_amd. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Shirish S <Shirish.S@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
2019-02-02EDAC, i10nm: Add a driver for Intel 10nm server processorsQiuxu Zhuo
This driver supports the Intel 10nm series server integrated memory controller. It gets the memory capacity and topology information by reading the registers in PCI configuration space and memory-mapped I/O. It decodes the memory error address to the platform specific address by using the ACPI Address Translation (ADXL) Device Specific Method (DSM). Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190130191519.15393-5-tony.luck@intel.com
2019-02-02EDAC, skx_edac: Delete duplicated codeQiuxu Zhuo
Delete the duplicated code from skx_edac.c and rename skx_edac.c to skx_base.c. Update the Makefile to build the skx_edac driver from skx_base.c and skx_common.c. Add SPDX to skx_base.c and clean out unnecessary #include lines. [ bp: Drop the license boilerplate - there's an SPDX identifier now. ] Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190130191519.15393-4-tony.luck@intel.com
2019-02-02EDAC, skx_common: Separate common code out from skx_edacQiuxu Zhuo
Parts of skx_edac can be shared with the Intel 10nm server EDAC driver. Carve out the common parts from skx_edac in preparation to support both skx_edac driver and i10nm_edac drivers. Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190130191519.15393-3-tony.luck@intel.com
2019-01-24EDAC, altera: Fix S10 persistent register offsetThor Thayer
Correct the persistent register offset where address and status are stored. Fixes: 08f08bfb7b4c ("EDAC, altera: Merge Stratix10 into the Arria10 SDRAM probe routine") Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: devicetree@vger.kernel.org Cc: dinguyen@kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: robh+dt@kernel.org Cc: stable <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/1548179287-21760-2-git-send-email-thor.thayer@linux.intel.com
2019-01-23EDAC: Do not check return value of debugfs_create() functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. [ bp: Make edac_debugfs_init() return void too, while at it. ] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20190122152151.16139-17-gregkh@linuxfoundation.org
2019-01-18EDAC, aspeed: Add an Aspeed AST2500 EDAC driverStefan M Schaeckeler
Add support for the Aspeed AST2500 SoC. Signed-off-by: Stefan M Schaeckeler <sschaeck@cisco.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Jeffery <andrew@aj.id.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: devicetree@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-aspeed@lists.ozlabs.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/1547743097-5236-2-git-send-email-schaecsn@gmx.net
2018-12-19EDAC, fsl_ddr: Add LS1021A to the list of supported hardwarePatrick Havelange
The Freescale ddr driver also works on the LS1021A board. Signed-off-by: Patrick Havelange <patrick.havelange@essensium.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: York Sun <york.sun@nxp.com> Cc: arnout.vandecappelle@essensium.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: matthew.weber@rockwellcollins.com Cc: patrick.havelange@essensium.com Link: https://lkml.kernel.org/r/20181219104323.10324-1-patrick.havelange@essensium.com
2018-12-11EDAC, i5000: Remove set but not used local variablesYueHaibing
Remove unused local variables as reported by gcc's -Wunused-but-set-variable option. [ bp: simplify commit message. ] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20181211095207.25936-1-yuehaibing@huawei.com
2018-11-20EDAC, i82975x: Fix spelling mistake "reserverd" -> "reserved"Alexandre Belloni
Fix a spelling mistake in a register layout description. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "Arvind R." <arvino55@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20181120153304.1218-1-alexandre.belloni@bootlin.com
2018-11-20EDAC, fsl: Move error injection under CONFIG_EDAC_DEBUGYork Sun
Gate error injection feature with CONFIG_EDAC_DEBUG so that it is not visible in production setups. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: York Sun <york.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/20181119225303.13265-1-york.sun@nxp.com
2018-11-16EDAC, skx: Let EDAC core show the decoded result for debugfsQiuxu Zhuo
Current debugfs shows the decoded result in its own print format which is inconvenient for analysis/statistics. Use skx_mce_check_error() instead of skx_decode() for debugfs, then the decoded result is showed via EDAC core in a more readable format like "CPU_SrcID#[0-9]_MC#[0-9]_Chan#[0-9]_DIMM#[0-9]". Print a warning the first time this interface is used so the administrator can see the console log that error(s) have been faked. Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: arozansk@redhat.com CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1542353705-13531-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16EDAC, skx: Move debugfs node under EDAC's hierarchyQiuxu Zhuo
The debugfs node is /sys/kernel/debug/skx_edac_test. Rename it and move under EDAC debugfs root directory. Remove the unused 'skx_fake_addr' and remove the 'skx_test' on error. Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: arozansk@redhat.com CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1542353684-13496-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16EDAC, skx: Prepend hex formatting with '0x'Qiuxu Zhuo
Some debug/error strings in hex formatting do not have the '0x' prefix. Prepend hex formatting with '0x' for them, but with one exception: "Couldn't enable %04x:%04x", instead of putting '0x' in this line, add the word 'device'. We commonly use 8086:1234 without the leading '0x' (e.g. as '-d' argument to lspci(8) and setpci(8) commands). Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Tony Luck <tony.luck@intel.com> CC: arozansk@redhat.com CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1542353660-13458-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-16EDAC, skx: Fix function calling order in skx_exit()Qiuxu Zhuo
The order of function calling in skx_exit() is not the reversed order in skx_init(). Fix it. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Tony Luck <tony.luck@intel.com> CC: arozansk@redhat.com CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1542353616-13421-1-git-send-email-qiuxu.zhuo@intel.com
2018-11-13EDAC: Drop per-memory controller busesBorislav Petkov
... and use the single edac_subsys object returned from subsys_system_register(). The idea is to have a single bus and multiple devices on it. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> CC: Aristeu Rozanski Filho <arozansk@redhat.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Justin Ernst <justin.ernst@hpe.com> CC: linux-edac <linux-edac@vger.kernel.org> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Russ Anderson <rja@hpe.com> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20180926152752.GG5584@zn.tnic
2018-11-13EDAC: Don't add devices under /sys/bus/edacTony Luck
Nobody(*) uses them. Dropping this will allow us to make the total number of memory controllers configurable (as we won't have to worry about duplicated device names under this directory). (*) https://lkml.kernel.org/r/20180927221054.580220e5@coco.lan Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> CC: Aristeu Rozanski Filho <arozansk@redhat.com> CC: Greg KH <gregkh@linuxfoundation.org> CC: Justin Ernst <justin.ernst@hpe.com> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Russ Anderson <rja@hpe.com> CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20181001224313.GA9487@agluck-desk
2018-11-10EDAC: Fix indentation issues in several EDAC driversColin Ian King
Replace spaces with tabs and insert missing indentation. [ bp: Rewrite commit message. ] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: "Arvind R." <arvino55@gmail.com> CC: Mark Gross <mark.gross@intel.com> CC: Mauro Carvalho Chehab <mchehab@kernel.org> CC: Ranganathan Desikan <ravi@jetztechnologies.com> CC: kernel-janitors@vger.kernel.org CC: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20181109133757.21471-1-colin.king@canonical.com