@AichunShi

This PR fixes the EDAC driver for the Intel GNR platform on the 6.6-velinux kernel.

Upstream commits from v6.13:
a36667037a0c0e36c59407f8ae636295390239a5 EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
2397f795735219caa9c2fe61e7bcdd0652e670d3 EDAC/skx_common: Differentiate memory error sources

Upstream commit from v6.11:
8b93582 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common

Upstream commit from v6.11 already merged:
123b158 EDAC, i10nm: make skx_common.o a separate module

Test

Built and ran the kernel successfully.
The EDAC test passes on the GNR platform.

Configs

No Change.

qzhuo2 added 3 commits March 27, 2025 18:25
commit 8b93582 upstream.

Commit

  afdb82fd763c ("EDAC, i10nm: make skx_common.o a separate module")

made skx_common.o a separate module. With skx_common.o now a separate
module, move the common debug code setup_{skx,i10nm}_debug() and
teardown_{skx,i10nm}_debug() in {skx,i10nm}_base.c to skx_common.c to
reduce code duplication. Additionally, prefix these function names with
'skx' to maintain consistency with other names in the file.
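
The consolidation described above can be pictured with a small userspace model. This is only a sketch under assumptions: the function names, the single shared `debug_dir` slot, the "refuse a second setup" behaviour, and the directory names used in the usage note are all hypothetical and are not taken from the driver source.

```c
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL model: name of the active debug directory, "" when none. */
static char debug_dir[64];

/*
 * Shared setup living in the common module. Each driver passes its own
 * directory name instead of carrying a private copy of this function.
 */
static bool common_setup_debug(const char *dir_name)
{
	if (debug_dir[0] != '\0')
		return false; /* modelling choice: only one owner at a time */
	snprintf(debug_dir, sizeof(debug_dir), "%s", dir_name);
	return true;
}

/* Shared teardown: safe to call even if setup never ran. */
static void common_teardown_debug(void)
{
	debug_dir[0] = '\0';
}
```

In this model, one driver would call `common_setup_debug("skx_test")` and the other `common_setup_debug("i10nm_test")`, so the only per-driver difference left is the name argument.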

Intel-SIG: commit 8b93582 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common
Backport to fix EDAC driver for GNR

Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <[email protected]>

commit 2397f795735219caa9c2fe61e7bcdd0652e670d3 upstream.

The current skx_common determines whether the memory error source is the
near memory of the 2LM system and then retrieves the decoded error results
from the ADXL components (near-memory vs. far-memory) accordingly.

However, some memory controllers may have limitations in correctly
reporting the memory error source, leading to the retrieval of incorrect
decoded parts from the ADXL.

To address these limitations, instead of simply determining whether the
memory error is from the near memory of the 2LM system, it is necessary to
distinguish the memory error source details as follows:

  - Memory error from the near memory of the 2LM system.
  - Memory error from the far memory of the 2LM system.
  - Memory error from the 1LM system.
  - Not a memory error.

This will enable the i10nm_edac driver to take appropriate actions for
those memory controllers that have limitations in reporting the memory
error source.
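
As a rough illustration of the four-way distinction, the classification could look like the sketch below. It is a standalone userspace approximation, not the driver code: the function name, the `mem_cfg_2lm` parameter, and the MCACOD mask and code values are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four cases listed in the commit message. */
enum error_source {
	ERR_SRC_1LM,        /* memory error from a 1LM system */
	ERR_SRC_2LM_NM,     /* memory error from the near memory of a 2LM system */
	ERR_SRC_2LM_FM,     /* memory error from the far memory of a 2LM system */
	ERR_SRC_NOT_MEMORY, /* not a memory error */
};

/* ASSUMPTION: illustrative error-code encodings, not copied from the driver. */
#define MCACOD_MEM_ERR_MASK 0xef80u
#define MCACOD_MEM_CTL_ERR  0x0080u /* memory controller error */
#define MCACOD_EXT_MEM_ERR  0x0280u /* extended (near) memory error */

/*
 * Classify an error from the low 16 bits of a machine-check status word.
 * mem_cfg_2lm tells whether the platform runs in 2LM mode.
 */
static enum error_source classify_error_source(uint64_t mci_status,
					       bool mem_cfg_2lm)
{
	uint32_t errcode = (uint32_t)(mci_status & 0xffffu) & MCACOD_MEM_ERR_MASK;

	if (errcode != MCACOD_MEM_CTL_ERR && errcode != MCACOD_EXT_MEM_ERR)
		return ERR_SRC_NOT_MEMORY;
	if (!mem_cfg_2lm)
		return ERR_SRC_1LM;
	if (errcode == MCACOD_EXT_MEM_ERR)
		return ERR_SRC_2LM_NM;
	return ERR_SRC_2LM_FM;
}
```

Returning an enum rather than a single near-memory boolean leaves room for callers to apply platform-specific corrections to the reported source before choosing which decoded components to read.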

Intel-SIG: commit 2397f7957352 EDAC/skx_common: Differentiate memory error sources
Backport to fix EDAC driver for GNR

Fixes: ba987ea ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Tested-by: Diego Garcia Rodriguez <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <[email protected]>

commit a36667037a0c0e36c59407f8ae636295390239a5 upstream.

The Granite Rapids CPUs with Flat2LM memory configurations may
mistakenly report near-memory errors as far-memory errors, resulting
in the invalid decoded ADXL results:

  EDAC skx: Bad imc -1

Fix this incorrect far-memory error source indicator by prefetching the
decoded far-memory controller ID, and adjust the error source indicator
to near-memory if the far-memory controller ID is invalid.
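
The fix-up described above might be sketched as follows. This is a simplified userspace model under stated assumptions: `struct adxl_result`, `fixup_error_source()`, and the `quirk_platform` flag are hypothetical names introduced for illustration, and -1 is used as the invalid controller ID to match the "Bad imc -1" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum error_source {
	ERR_SRC_1LM,
	ERR_SRC_2LM_NM,
	ERR_SRC_2LM_FM,
	ERR_SRC_NOT_MEMORY,
};

/* HYPOTHETICAL stand-in for the decoded ADXL components of one error. */
struct adxl_result {
	int nm_imc; /* near-memory controller ID, -1 if invalid */
	int fm_imc; /* far-memory controller ID, -1 if invalid */
};

/*
 * Peek at the decoded far-memory controller ID before committing to the
 * far-memory components. On an affected platform, an invalid ID means the
 * error was really a near-memory error, so the source is corrected.
 */
static enum error_source fixup_error_source(enum error_source src,
					    const struct adxl_result *adxl,
					    bool quirk_platform)
{
	assert(adxl != NULL);

	if (quirk_platform && src == ERR_SRC_2LM_FM && adxl->fm_imc == -1)
		return ERR_SRC_2LM_NM;
	return src;
}
```

With a result such as `{ .nm_imc = 3, .fm_imc = -1 }` and a reported far-memory source, the sketch returns `ERR_SRC_2LM_NM`, so a caller would go on to read the near-memory components instead of failing on the invalid far-memory controller ID.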

Intel-SIG: commit a36667037a0c EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicator
Backport to fix EDAC driver for GNR

Fixes: ba987ea ("EDAC/i10nm: Add Intel Granite Rapids server support")
Signed-off-by: Qiuxu Zhuo <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
Tested-by: Diego Garcia Rodriguez <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Aichun Shi: amend commit log ]
Signed-off-by: Aichun Shi <[email protected]>

AichunShi marked this pull request as ready for review March 27, 2025 10:34
@qiruibd
Collaborator

qiruibd commented Apr 8, 2025

Acked

jackYoung0915 pushed a commit to jackYoung0915/kernel that referenced this pull request Jul 5, 2025
[Backport][USB]xhci: Limit time spent with xHC interrupts disabled during bus resume
x56Jason added a commit to openvelinux/kernel-intel that referenced this pull request Nov 10, 2025
…x-edac-fix-gnr' into intel-6.6-velinux

 Conflicts:
	drivers/edac/skx_common.c
	drivers/edac/skx_common.h
[jz: stable already merged part of the commits]