<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/edac, branch linux-6.9.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2024-06-16T11:51:13Z</updated>
<entry>
<title>EDAC/igen6: Convert PCIBIOS_* return codes to errnos</title>
<updated>2024-06-16T11:51:13Z</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ilpo.jarvinen@linux.intel.com</email>
</author>
<published>2024-05-27T13:22:35Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=419dde0dc88404d2610b03858e97dc2ce7f25a8d'/>
<id>urn:sha1:419dde0dc88404d2610b03858e97dc2ce7f25a8d</id>
<content type='text'>
commit f8367a74aebf88dc8b58a0db6a6c90b4cb8fc9d3 upstream.

errcmd_enable_error_reporting() uses pci_{read,write}_config_word()
that return PCIBIOS_* codes. The return code is then returned all the
way into the probe function igen6_probe() that returns it as is. The
probe functions, however, should return normal errnos.

Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from errcmd_enable_error_reporting().

Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Ilpo Järvinen &lt;ilpo.jarvinen@linux.intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>EDAC/amd64: Convert PCIBIOS_* return codes to errnos</title>
<updated>2024-06-16T11:51:13Z</updated>
<author>
<name>Ilpo Järvinen</name>
<email>ilpo.jarvinen@linux.intel.com</email>
</author>
<published>2024-05-27T13:22:34Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=7fcabad0e41a9971e2e3f466c05ad27351f90722'/>
<id>urn:sha1:7fcabad0e41a9971e2e3f466c05ad27351f90722</id>
<content type='text'>
commit 3ec8ebd8a5b782d56347ae884de880af26f93996 upstream.

gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_*
codes. The return code is then returned all the way into the module
init function amd64_edac_init() that returns it as is. The module init
functions, however, should return normal errnos.

Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from gpu_get_node_map().

For consistency, convert also the other similar cases which return
PCIBIOS_* codes even if they do not have any bugs at the moment.

Fixes: 4251566ebc1c ("EDAC/amd64: Cache and use GPU node map")
Signed-off-by: Ilpo Järvinen &lt;ilpo.jarvinen@linux.intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>EDAC/skx_common: Allow decoding of SGX addresses</title>
<updated>2024-05-30T07:44:18Z</updated>
<author>
<name>Qiuxu Zhuo</name>
<email>qiuxu.zhuo@intel.com</email>
</author>
<published>2024-04-08T12:04:19Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5f87c3011120d5120a96c6223b183eb63801b5c7'/>
<id>urn:sha1:5f87c3011120d5120a96c6223b183eb63801b5c7</id>
<content type='text'>
[ Upstream commit e0d335077831196bffe6a634ffe385fc684192ca ]

There are no "struct page" associations with SGX pages, causing the check
pfn_to_online_page() to fail. This results in the inability to decode the
SGX addresses and warning messages like:

  Invalid address 0x34cc9a98840 in IA32_MC17_ADDR

Add an additional check to allow the decoding of the error address and to
skip the warning message, if the error address is an SGX address.

Fixes: 1e92af09fab1 ("EDAC/skx_common: Filter out the invalid address")
Signed-off-by: Qiuxu Zhuo &lt;qiuxu.zhuo@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/r/20240408120419.50234-1-qiuxu.zhuo@intel.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>EDAC/synopsys: Fix ECC status and IRQ control race condition</title>
<updated>2024-05-06T12:19:07Z</updated>
<author>
<name>Serge Semin</name>
<email>fancer.lancer@gmail.com</email>
</author>
<published>2024-02-22T18:12:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=591c946675d88dcc0ae9ff54be9d5caaee8ce1e3'/>
<id>urn:sha1:591c946675d88dcc0ae9ff54be9d5caaee8ce1e3</id>
<content type='text'>
The race condition around the ECCCLR register access happens in the IRQ
disable method called in the device remove() procedure and in the ECC IRQ
handler:

  1. Enable IRQ:
     a. ECCCLR = EN_CE | EN_UE
  2. Disable IRQ:
     a. ECCCLR = 0
  3. IRQ handler:
     a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT
     b. ECCCLR = 0
     c. ECCCLR = EN_CE | EN_UE

So if the IRQ disabling procedure is called concurrently with the IRQ
handler method the IRQ might be actually left enabled due to the
statement 3c.

The root cause of the problem is that ECCCLR register (which since
v3.10a has been called as ECCCTL) has intermixed ECC status data clear
flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN
flags) and handling (write 1 to clear ECC status data) procedures must
be serialised around the ECCCTL register modification to prevent the
race.

So fix the problem described above by adding the spin-lock around the
ECCCLR modifications and preventing the IRQ-handler from modifying the
IRQs enable flags (there is no point in disabling the IRQ and then
re-enabling it again within a single IRQ handler call, see the
statements 3a/3b and 3c above).

Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR")
Signed-off-by: Serge Semin &lt;fancer.lancer@gmail.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
</content>
</entry>
<entry>
<title>EDAC/versal: Do not log total error counts</title>
<updated>2024-04-25T16:08:05Z</updated>
<author>
<name>Shubhrajyoti Datta</name>
<email>shubhrajyoti.datta@amd.com</email>
</author>
<published>2024-04-25T12:19:42Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1a24733e80771d8eef656e515306a560519856a9'/>
<id>urn:sha1:1a24733e80771d8eef656e515306a560519856a9</id>
<content type='text'>
When logging errors, the driver currently logs the total error count.
However, it should log the current error only. Fix it.

  [ bp: Rewrite text. ]

Fixes: 6f15b178cd63 ("EDAC/versal: Add a Xilinx Versal memory controller driver")
Signed-off-by: Shubhrajyoti Datta &lt;shubhrajyoti.datta@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240425121942.26378-4-shubhrajyoti.datta@amd.com
</content>
</entry>
<entry>
<title>EDAC/versal: Check user-supplied data before injecting an error</title>
<updated>2024-04-25T16:04:47Z</updated>
<author>
<name>Shubhrajyoti Datta</name>
<email>shubhrajyoti.datta@amd.com</email>
</author>
<published>2024-04-25T12:19:41Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=de87ba848d5e4c861b7357dd7a91698aed7a5a18'/>
<id>urn:sha1:de87ba848d5e4c861b7357dd7a91698aed7a5a18</id>
<content type='text'>
The function inject_data_ue_store() lacks a NULL check for the user
passed values. To prevent below kernel crash include a NULL check.

Call trace:

  kstrtoull
  kstrtou8
  inject_data_ue_store
  full_proxy_write
  vfs_write
  ksys_write
  __arm64_sys_write
  invoke_syscall
  el0_svc_common.constprop.0
  do_el0_svc
  el0_svc
  el0t_64_sync_handler
  el0t_64_sync

Fixes: 83bf24051a60 ("EDAC/versal: Make the bit position of injected errors configurable")
Signed-off-by: Shubhrajyoti Datta &lt;shubhrajyoti.datta@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240425121942.26378-3-shubhrajyoti.datta@amd.com
</content>
</entry>
<entry>
<title>EDAC/versal: Do not register for NOC errors</title>
<updated>2024-04-25T16:04:14Z</updated>
<author>
<name>Shubhrajyoti Datta</name>
<email>shubhrajyoti.datta@amd.com</email>
</author>
<published>2024-04-25T12:19:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=edbe59428eb0da09958769326a6566d4c9242ae7'/>
<id>urn:sha1:edbe59428eb0da09958769326a6566d4c9242ae7</id>
<content type='text'>
The NOC errors are not handled in the driver. Remove the request for
registration.

Signed-off-by: Shubhrajyoti Datta &lt;shubhrajyoti.datta@amd.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Link: https://lore.kernel.org/r/20240425121942.26378-2-shubhrajyoti.datta@amd.com
</content>
</entry>
<entry>
<title>Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras</title>
<updated>2024-03-12T01:14:06Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-12T01:14:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b0402403e54ae9eb94ce1cbb53c7def776e97426'/>
<id>urn:sha1:b0402403e54ae9eb94ce1cbb53c7def776e97426</id>
<content type='text'>
Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()
</content>
</entry>
<entry>
<title>Merge remote-tracking branches 'ras/edac-drivers', 'ras/edac-misc' and 'ras/edac-amd-atl' into edac-updates-for-v6.9</title>
<updated>2024-03-11T15:24:20Z</updated>
<author>
<name>Borislav Petkov (AMD)</name>
<email>bp@alien8.de</email>
</author>
<published>2024-03-11T15:24:20Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=af65545a0f82d7336f62e34f69d3c644806f5f95'/>
<id>urn:sha1:af65545a0f82d7336f62e34f69d3c644806f5f95</id>
<content type='text'>
* ras/edac-drivers:
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support

* ras/edac-misc:
  EDAC/versal: Convert to platform remove callback returning void
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()

* ras/edac-amd-atl:
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library

Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
</content>
</entry>
<entry>
<title>EDAC/versal: Convert to platform remove callback returning void</title>
<updated>2024-03-08T12:57:49Z</updated>
<author>
<name>Uwe Kleine-König</name>
<email>u.kleine-koenig@pengutronix.de</email>
</author>
<published>2024-03-08T08:51:06Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4527a2194e7c2f88e940f9071084daa307ce08af'/>
<id>urn:sha1:4527a2194e7c2f88e940f9071084daa307ce08af</id>
<content type='text'>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve this, there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shubhrajyoti Datta &lt;shubhrajyoti.datta@amd.com&gt;
Link: https://lore.kernel.org/r/83deca1ce260f7e17ff3cb106c9a6946d4ca4505.1709886922.git.u.kleine-koenig@pengutronix.de
</content>
</entry>
</feed>
