summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits) gitignore: fix wording mfd: ab8500-debugfs: fix "between" in printk memstick: trivial fix of spelling mistake on management cpupowerutils: bench: fix "average" treewide: Fix typos in printk IB/mlx4: printk fix pinctrl: sirf/atlas7: fix printk spelling serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/ w1: comment spelling s/minmum/minimum/ Blackfin: comment spelling s/divsor/divisor/ metag: Fix misspellings in comments. ia64: Fix misspellings in comments. hexagon: Fix misspellings in comments. tools/perf: Fix misspellings in comments. cris: Fix misspellings in comments. c6x: Fix misspellings in comments. blackfin: Fix misspelling of 'register' in comment. avr32: Fix misspelling of 'definitions' in comment. treewide: Fix typos in printk Doc: treewide : Fix typos in DocBook/filesystem.xml ...
2016-05-16Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: "It was pretty busy in EDAC land this time: - Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer) - Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac (Tony Luck) - Do not register sb_edac with pci_register_driver() (Tony Luck) - Add support for Skylake to ie31200_edac (Jason Baron) - Do not register amd64_edac with pci_register_driver() (Borislav Petkov) ... plus the usual round of cleanups and fixes all over the place" * tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits) EDAC, amd64_edac: Drop pci_register_driver() use EDAC, ie31200_edac: Add Skylake support EDAC, sb_edac: Use cpu family/model in driver detection EDAC, i7core: Remove double buffering of error records EDAC, amd64_edac: Issue driver banner only on success ARM: socfpga: Initialize Arria10 OCRAM ECC on startup EDAC: Increment correct counter in edac_inc_ue_error() EDAC, sb_edac: Remove double buffering of error records EDAC: Fix used after kfree() error in edac_unregister_sysfs() EDAC, altera: Avoid unused function warnings EDAC, altera: Remove useless casts ARM: socfpga: Enable Arria10 OCRAM ECC on startup EDAC, altera: Add Arria10 OCRAM ECC support Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding EDAC, altera: Make OCRAM ECC dependency check generic EDAC, altera: Add register offset for ECC Enable EDAC, altera: Extract error inject operations to a struct fops ARM: socfpga: Enable Arria10 L2 cache ECC on startup EDAC, altera: Add Arria10 L2 Cache ECC handling Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding ...
2016-05-12EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCAYazen Ghannam
Use X86_FEATURE_SMCA when detecting if SMCA is available instead of directly using CPUID 0x80000007_EBX. Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-09EDAC, amd64_edac: Drop pci_register_driver() useBorislav Petkov
- remove homegrown instances counting. - take F3 PCI device from amd_nb caching instead of F2 which was used with the PCI core. With those changes, the driver doesn't need to register a PCI driver and relies on the northbridges caching which we do anyway on AMD. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>
2016-05-06EDAC, ie31200_edac: Add Skylake supportJason Baron
Skylake adjusts some register locations, but otherwise follows the existing model quite closely. I was able to verify that the 'ce_count' increments when 'bad dimms' are used. The accounting of 'ce_count' and 'ue_count' is the primary functionality of interest for us. Tested on Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1462547927-22679-1-git-send-email-jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-05-02EDAC, sb_edac: Use cpu family/model in driver detectionTony Luck
Instead of picking a random PCI ID from the dozen or so we need to access, just use x86_match_cpu() to pick based on CPU model number. The choosing of PCI devices has been problematic in the past, see 11249e739929 ("sb_edac: Fix detection on SNB machines") which fixed problems introduced by d0585cd815fa ("sb_edac: Claim a different PCI device"). This is especially ugly if future hardware might not even have EDAC-relevant registers in PCI config space and we would still be required to choose some "random" PCI devices to scan for just so our driver loads. Is this cleaner/clearer? It deletes much more code than it adds. Only tested on Broadwell. The driver loads/unloads and loads again. Still decodes errors too. Signed-off-by: Tony Luck <tony.luck@intel.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-29EDAC, i7core: Remove double buffering of error recordsTony Luck
In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in f29a7aff4bd60 ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as i7core_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/29ab2c370915c6e132fc5d88e7b72cb834bedbfe.1461855008.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-29EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callbackTony Luck
Both of these drivers can return NOTIFY_BAD, but this terminates processing other callbacks that were registered later on the chain. Since the driver did nothing to log the error it seems wrong to prevent other interested parties from seeing it. E.g. neither of them had even bothered to check the type of the error to see if it was a memory error before the return NOTIFY_BAD. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-27EDAC, amd64_edac: Issue driver banner only on successBorislav Petkov
... and don't mislead users into thinking that the driver has loaded successfully. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23EDAC: Increment correct counter in edac_inc_ue_error()Emmanouil Maroudas
Fix typo in edac_inc_ue_error() to increment ue_noinfo_count instead of ce_noinfo_count. Signed-off-by: Emmanouil Maroudas <emmanouil.maroudas@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 4275be635597 ("edac: Change internal representation to work with layers") Link: http://lkml.kernel.org/r/1461425580-5898-1-git-send-email-emmanouil.maroudas@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23EDAC, sb_edac: Remove double buffering of error recordsTony Luck
In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in f29a7aff4bd60 ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as sbridge_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: patrickg@supermicro.com Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23EDAC: Fix used after kfree() error in edac_unregister_sysfs()Tony Luck
Code flow looks like this: device_unregister(&mci->dev); -> kobject_put+0x25/0x50 -> kobject_cleanup+0x77/0x190 -> device_release+0x32/0xa0 -> mci_attr_release+0x36/0x70 -> kfree(mci); bus_unregister(mci->bus); Fix is to grab a local copy of "mci->bus" and use that when we call bus_unregister(). Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23EDAC, altera: Avoid unused function warningsArnd Bergmann
The recently added Arria10 OCRAM ECC support caused some new harmless warnings about unused functions when it is disabled: drivers/edac/altera_edac.c:1067:20: error: 'altr_edac_a10_ecc_irq' defined but not used [-Werror=unused-function] drivers/edac/altera_edac.c:658:12: error: 'altr_check_ecc_deps' defined but not used [-Werror=unused-function] This rearranges the code slightly to have those two functions inside of the same #ifdef that hides their callers. It also manages to avoid a forward declaration of the IRQ handler in the process. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Alan Tull <atull@opensource.altera.com> Cc: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: c7b4be8db8bc ("EDAC, altera: Add Arria10 OCRAM ECC support") Link: http://lkml.kernel.org/r/1460837650-1237650-2-git-send-email-arnd@arndb.de Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-23EDAC, altera: Remove useless castsArnd Bergmann
The altera EDAC driver refers to its per-device data using a cast to '(void *)', which makes the pointer non-const, though both the source and destination are actually const. Removing the annotation makes the reference (almost) fit into a single line for improved readability, and ensures that it is actually defined as const. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Alan Tull <atull@opensource.altera.com> Cc: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1460837650-1237650-1-git-send-email-arnd@arndb.de Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-22x86 EDAC, sb_edac.c: Take account of channel hashing when neededTony Luck
Haswell and Broadwell can be configured to hash the channel interleave function using bits [27:12] of the physical address. On those processor models we must check to see if hashing is enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and act accordingly. Based on a patch by patrickg <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel addressTony Luck
In commit: eb1af3b71f9d ("Fix computation of channel address") I switched the "sck_way" variable from holding the log2 value read from the h/w to instead be the actual number. Unfortunately it is needed in log2 form when used to shift the address. Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: eb1af3b71f9d ("Fix computation of channel address") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-18treewide: Fix typos in printkMasanari Iida
This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-07EDAC, altera: Add Arria10 OCRAM ECC supportThor Thayer
Add Arria10 On-Chip RAM ECC handling. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459992174-8015-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-02EDAC, altera: Make OCRAM ECC dependency check genericThor Thayer
In preparation for the Arria10 peripheral ECCs, move the OCRAM ECC dependency check into the general ECC area since this same function can be used by other memories. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459450087-24792-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-02EDAC, altera: Add register offset for ECC EnableThor Thayer
In preparation for the Arria10 peripheral ECCs, add a register offset from the ECC base to index to the ECC enable register. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459450087-24792-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-02EDAC, altera: Extract error inject operations to a struct fopsThor Thayer
In preparation for the Arria10 peripheral ECCs, extract the inject file operations because the Arria10 IRQ trigger mechanism is different than Cyclone5/Arria5 and Arria10 L2 cache. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1459450087-24792-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29EDAC, altera: Add Arria10 L2 Cache ECC handlingThor Thayer
Add a private data structure for Arria10 L2 cache ECC and the probe function for it. The Arria10 ECC device IRQs are in a shared register so the ECC Manager parent/child relationship requires a different probe function. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-8-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29EDAC, altera: Add register offset for ECC Error InjectThor Thayer
In preparation for the Arria10 peripheral ECCs, add a register offset from the ECC base to the private data structure to index to the error injection register. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-6-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29EDAC, altera: Abstract ECC Enable Mask in check_deps()Thor Thayer
In preparation for the Arria10 peripheral ECCs, use the ECC Enable mask in place of hard coded masks in the check dependency functions. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-5-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29EDAC, altera: Remove platform device from check_deps()Thor Thayer
In preparation for the Arria10 peripheral ECCs, remove the platform device parameter from the check_deps() functions because it is not needed and makes the Arria10 check_deps() cleaner. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29EDAC, altera: Move device structs and defines to the headerThor Thayer
Move the device structs and defines to altera_edac.h in preparation for adding the Arria10 L2 cache ECC. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-29EDAC, altera: Make L2C depend on L2x0 cache controllerThor Thayer
Make L2 cache depend instead of forcibly select the L2 cache support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1458576106-24505-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-03-16Merge tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: - Altera: L2 cache and On-Chip RAM support (Thor Thayer). - EDAC: Workqueue handling cleanups (Borislav Petkov). - Xgene: Register bus error handling (Loc Ho). - Misc small fixes. * tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: ARM: socfpga: Enable OCRAM ECC on startup ARM: socfpga: Enable L2 cache ECC on startup ARM: dts: Add Altera L2 Cache and OCRAM EDAC entries EDAC, altera: Add Altera L2 cache and OCRAM support EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() EDAC, mpc85xx: Silence unused variable warning EDAC: Cleanup/sync workqueue functions EDAC: Kill workqueue setup/teardown functions EDAC: Balance workqueue setup and teardown arm64: Update the APM X-Gene EDAC node with the RB register resource EDAC, xgene: Add missing SoC register bus error handling Documentation, EDAC: Update xgene binding for missing register bus EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()
2016-03-14Merge branch 'ras-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call
2016-03-10EDAC/sb_edac: Fix computation of channel addressLuck, Tony
Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems: 1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3. 3=4). 2) Actual use of the socket interleave value didn't interpret as 2^X 3) Conversion of address to channel address was complicated, and wrong. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errorsAravind Gopalakrishnan
For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-07EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon PhiHubert Chrzaniuk
Correct a typo introduced by d0cdf9003140 ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support") As a result under some configurations DIMMs were not correctly recognized. Problem affects only Xeon Phi architecture. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-11EDAC, altera: Add Altera L2 cache and OCRAM supportThor Thayer
Add L2 Cache and On-Chip RAM EDAC support for the Altera SoCs. The SDRAM controller is using the Memory Controller model. Each type of ECC is individually configurable. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-doc@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1455132384-17108-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-10EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit()Thor Thayer
debugfs_remove() is used to remove a file or a directory from the debugfs filesystem on an EDAC device exit. However edac_debugfs might not be empty. This is similar to 30f84a891bf6 ("EDAC: Use edac_debugfs_remove_recursive()") which changed the EDAC MCI code to use edac_debugfs_remove_recursive(). Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1455064165-3816-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC, mpc85xx: Silence unused variable warningSudip Mukherjee
We were getting this build warning: drivers/edac/mpc85xx_edac.c:1247:6: warning: unused variable 'pvr' pvr is only used if CONFIG_FSL_SOC_BOOKE is defined. Declare it __maybe_unused. Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1454427573-7994-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC: Cleanup/sync workqueue functionsBorislav Petkov
They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC: Kill workqueue setup/teardown functionsBorislav Petkov
We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: Borislav Petkov <bp@suse.de>
2016-02-02EDAC: Balance workqueue setup and teardownBorislav Petkov
We use the ->edac_check function pointers to determine whether we need to setup a polling workqueue. However, the destroy path is not balanced and we might try to teardown an unitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE *before* destroying anything. Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-25EDAC, xgene: Add missing SoC register bus error handlingLoc Ho
Add missing register bus error handling for APM X-Gene EDAC SoC and fix a checking condition for CE error promoted to UE. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Link: http://lkml.kernel.org/r/1453495625-28006-3-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-25EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()Dan Carpenter
dct_sel_base_off is declared as a u64 but we're only using the lower 32 bits because of a shift wrapping bug. This can possibly truncate the upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS row. Fixes: c8e518d5673d ('amd64_edac: Sanitize f10_get_base_addr_offset') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>
2016-01-01EDAC, i5100: Use to_delayed_work()Geliang Tang
Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/58c0e319c7263a10b692100c657c06c42814aecf.1451659910.git.geliangtang@163.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC, sb_edac: Set fixed DIMM width on Xeon Knights LandingHubert Chrzaniuk
Knights Landing does not come with register that could be used to fetch DIMM width. However the value is fixed for this architecture so it can be hardcoded. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Rework workqueue handlingBorislav Petkov
Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with 91b99041c1d5 ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Make edac_device workqueue setup/teardown functions staticBorislav Petkov
They're not used anywhere else. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Remove edac_get_sysfs_subsys() error handlingBorislav Petkov
It cannot fail now. We either load EDAC core after having successfully initialized edac_subsys or we don't. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Unexport and make edac_subsys staticBorislav Petkov
... and use the accessor instead. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Rip out the edac_subsys reference countingBorislav Petkov
This was really dumb - reference counting for the main EDAC sysfs object. While we could've simply registered it as the first thing in the module init path and then hand it around to what needs it. Do that and rip out all the code around it, thus simplifying the whole handling significantly. Move the edac_subsys node back to edac_module.c. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Robustify workqueues destructionBorislav Petkov
EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work *synchronously* too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>
2015-12-11EDAC, mc_sysfs: Fix freeing bus' nameBorislav Petkov
I get the splat below when modprobing/rmmoding EDAC drivers. It happens because bus->name is invalid after bus_unregister() has run. The Code: section below corresponds to: .loc 1 1108 0 movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus .loc 1 1109 0 popq %rbx # .loc 1 1108 0 movq (%rax), %rdi # _7->name, jmp kfree # and %rax has some funky stuff 2030203020312030 which looks a lot like something walked over it. Fix that by saving the name ptr before doing stuff to string it points to. general protection fault: 0000 [#1] SMP Modules linked in: ... CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48 Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011 task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000 RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292 RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286 RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110 R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68 R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000 FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0 Stack: Call Trace: i7core_unregister_mci.isra.9 i7core_remove pci_device_remove __device_release_driver driver_detach bus_remove_driver pci_unregister_driver i7core_exit SyS_delete_module system_call_fastpath 0x7fc9bf426536 Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP <ffff88030da3fe28> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: <stable@vger.kernel.org> # v3.6.. Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
2015-12-11EDAC, mpc85xx: Make mpc85xx-pci-edac a platform deviceScott Wood
Originally the mpc85xx-pci-edac driver bound directly to the PCI controller node. Commit 905e75c46dba ("powerpc/fsl-pci: Unify pci/pcie initialization code") turned the PCI controller code into a platform device. Since we can't have two drivers binding to the same device, the EDAC code was changed to be called into as a library-style submodule. However, this doesn't work if the EDAC driver is built as a module. Commit 8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting") exposed another problem with this approach -- mpc85xx_pci_err_probe() was being called in the same early boot phase that the PCI controller is initialized, rather than in the device_initcall phase that the EDAC layer expects. This caused a crash on boot. To fix this, the PCI controller code now creates a child platform device specifically for EDAC, which the mpc85xx-pci-edac driver binds to. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daniel Axtens <dja@axtens.net> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Jia Hongtao <B38951@freescale.com> Cc: Jiri Kosina <jkosina@suse.com> Cc: Kim Phillips <kim.phillips@freescale.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Masanari Iida <standby24x7@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com Signed-off-by: Borislav Petkov <bp@suse.de>