NO MERGE: TEST ONLY: 6.6-velinux combine all PRs. #84
Draft: bhe4 wants to merge 146 commits into openvelinux:6.6-velinux from bhe4:6.6-velinux_all
Conversation
commit 2e89345 upstream. Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct prm_module_info. Intel-SIG: commit 2e89345 ACPI: PRM: Annotate struct prm_module_info with __counted_by. Backport PRM update and bugfixes up to v6.14. Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci # [1] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> [ Aubrey Li: amend commit log ] Signed-off-by: Aubrey Li <[email protected]>
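For readers unfamiliar with the annotation, here is a minimal, self-contained C sketch of the __counted_by pattern being applied; the struct and field names are illustrative, not the actual prm_module_info layout:

#include <stdlib.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(counted_by)
#define __counted_by(member) __attribute__((counted_by(member)))
#else
#define __counted_by(member)        /* older compilers: annotation is a no-op */
#endif

/* Illustrative layout, not the real struct prm_module_info. */
struct handler_table {
        unsigned int handler_count;                   /* the count the array is bounded by */
        void *handlers[] __counted_by(handler_count); /* flexible array member */
};

static struct handler_table *alloc_table(unsigned int n)
{
        struct handler_table *t = calloc(1, sizeof(*t) + n * sizeof(t->handlers[0]));

        if (t)
                t->handler_count = n;   /* UBSAN_BOUNDS/FORTIFY use this as the limit */
        return t;
}

int main(void)
{
        struct handler_table *t = alloc_table(4);

        if (t)
                t->handlers[3] = 0;     /* in bounds; index 4 would trip the check */
        free(t);
        return 0;
}

With the annotation in place, an instrumented build can flag out-of-bounds indexing and oversized memcpy()/strcpy() into the flexible array at run time.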
commit f0fcdd2 upstream. Platform Runtime Mechanism (PRM) handlers can be invoked from either the AML interpreter or directly by an OS driver. Implement the latter. [ bp: Massage commit message. ] Intel-SIG: commit f0fcdd2 PRM: Add PRM handler direct call support. Backport PRM update and bugfixes up to v6.14. Signed-off-by: John Allen <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Yazen Ghannam <[email protected]> Reviewed-by: Ard Biesheuvel <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Aubrey Li: amend commit log ] Signed-off-by: Aubrey Li <[email protected]>
commit 090e3be upstream. Server product based on the Atom Darkmont core. Intel-SIG: commit 090e3be x86/cpu: Add model number for Intel Clearwater Forest processor. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Quanxian Wang: amend commit log ] Signed-off-by: Quanxian Wang <[email protected]>
commit a0423af92cb31e6fc4f53ef9b6e19fdf08ad4395 upstream. The latest Intel platform, Clearwater Forest, has introduced new instructions enumerated by the CPUIDs of SHA512, SM3, SM4 and AVX-VNNI-INT16. Advertise these CPUIDs to userspace so that guests can query them directly. SHA512, SM3 and SM4 are on an expected-dense CPUID leaf and some other bits on this leaf have kernel usages. Since they themselves have no real kernel usage, hide them in /proc/cpuinfo. These new instructions only operate in xmm, ymm registers and have no new VMX controls, so there is no additional host enabling required for guests to use these instructions, i.e. advertising these CPUIDs to userspace is safe. Intel-SIG: commit a0423af92cb3 x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest. Tested-by: Jiaan Lu <[email protected]> Tested-by: Xuelian Guo <[email protected]> Signed-off-by: Tao Su <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> [ Quanxian Wang: amend commit log ] Signed-off-by: Quanxian Wang <[email protected]>
commit f91f2a9 upstream. A new DSA device ID, 0x11fb, is introduced for the Granite Rapids-D platform. Add the device ID to the IDXD driver. Since a potential security issue has been fixed on the new device, it's secure to assign the device to virtual machines, and therefore, the new device ID will not be added to the VFIO denylist. Additionally, the new device ID may be useful in identifying and addressing any other potential issues with this specific device in the future. The same is also applied to any other new DSA/IAA devices with new device IDs. Intel-SIG: commit f91f2a9 dmaengine: idxd: Add a new DSA device ID for Granite Rapids-D platform Add GNR new idxd id support. Signed-off-by: Fenghua Yu <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]> (cherry picked from commit f91f2a9) Signed-off-by: Ethan Zhao <[email protected]>
commit b628cb5 upstream. Use the GuestPhysBits field in CPUID.0x80000008 to communicate the max mappable GPA to userspace, i.e. the max GPA that is addressable by the CPU itself. Typically this is identical to the max effective GPA, except in the case where the CPU supports MAXPHYADDR > 48 but does not support 5-level TDP (the CPU consults bits 51:48 of the GPA only when walking the fifth level TDP page table entry). Enumerating the max mappable GPA via CPUID will allow guest firmware to map resources like PCI bars in the highest possible address space, while ensuring that the GPA is addressable by the CPU. Without precise knowledge about the max mappable GPA, the guest must assume that 5-level paging is unsupported and thus restrict its mappings to the lower 48 bits. Advertise the max mappable GPA via KVM_GET_SUPPORTED_CPUID as userspace doesn't have easy access to whether or not 5-level paging is supported, and to play nice with userspace VMMs that reflect the supported CPUID directly into the guest. AMD's APM (3.35) defines GuestPhysBits (EAX[23:16]) as: Maximum guest physical address size in bits. This number applies only to guests using nested paging. When this field is zero, refer to the PhysAddrSize field for the maximum guest physical address size. Tom Lendacky confirmed that the purpose of GuestPhysBits is software use and KVM can use it as described above. Real hardware always returns zero. Leave GuestPhysBits as '0' when TDP is disabled in order to comply with the APM's statement that GuestPhysBits "applies only to guest using nested paging". As above, guest firmware will likely create suboptimal mappings, but that is a very minor issue and not a functional concern. Intel-SIG: commit b628cb5 KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits Signed-off-by: Gerd Hoffmann <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: massage changelog] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
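A user-space sketch of how a consumer could follow the rule described above (prefer GuestPhysBits, EAX[23:16], when non-zero, otherwise fall back to PhysAddrSize, EAX[7:0]); this is illustrative code, not part of the patch:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
                return 1;

        unsigned int phys_bits       = eax & 0xff;         /* EAX[7:0]   PhysAddrSize  */
        unsigned int guest_phys_bits = (eax >> 16) & 0xff; /* EAX[23:16] GuestPhysBits */

        /* Per the APM, a zero GuestPhysBits means "use PhysAddrSize". */
        printf("max mappable GPA bits: %u\n",
               guest_phys_bits ? guest_phys_bits : phys_bits);
        return 0;
}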
commit 980b8bc upstream. Use the max mappable GPA via GuestPhysBits advertised by KVM to calculate max_gfn. Currently some selftests (e.g. access_tracking_perf_test, dirty_log_test...) add RAM regions close to max_gfn, so guest may access GPA beyond its mappable range and cause infinite loop. Adjust max_gfn in vm_compute_max_gfn() since x86 selftests already overrides vm_compute_max_gfn() specifically to deal with goofy edge cases. Intel-SIG: commit 980b8bc KVM: selftests: x86: Prioritize getting max_gfn from GuestPhysBits Reported-by: Yi Lai <[email protected]> Signed-off-by: Tao Su <[email protected]> Tested-by: Yi Lai <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: tweak name, add comment and sanity check] Signed-off-by: Sean Christopherson <[email protected]> Conflicts: tools/testing/selftests/kvm/include/x86_64/processor.h [jz: resolve simple context conflict] Signed-off-by: Jason Zeng <[email protected]>
commit 8b93582 upstream. Commit afdb82fd763c ("EDAC, i10nm: make skx_common.o a separate module") made skx_common.o a separate module. With skx_common.o now a separate module, move the common debug code setup_{skx,i10nm}_debug() and teardown_{skx,i10nm}_debug() in {skx,i10nm}_base.c to skx_common.c to reduce code duplication. Additionally, prefix these function names with 'skx' to maintain consistency with other names in the file. Intel-SIG: commit 8b93582 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common Backport to fix EDAC driver for GNR Signed-off-by: Qiuxu Zhuo <[email protected]> Signed-off-by: Tony Luck <[email protected]> Link: https://lore.kernel.org/all/[email protected] [ Aichun Shi: amend commit log ] Signed-off-by: Aichun Shi <[email protected]>
commit 7efb4d8 upstream. When SGX EDECCSSA support was added to KVM in commit 16a7fe3 ("KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest"), it forgot to clear the X86_FEATURE_SGX_EDECCSSA bit in KVM CPU caps when KVM SGX is disabled. Fix it. Intel-SIG: commit 7efb4d8 KVM: VMX: Also clear SGX EDECCSSA in KVM CPU caps when SGX is disabled Backport a fix for the KVM exposing the SGX EDECCSSA capability. Fixes: 16a7fe3 ("KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest") Signed-off-by: Kai Huang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]> [ Zhiquan Li: amend commit log ] Signed-off-by: Zhiquan Li <[email protected]>
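The shape of the fix is a single missing capability clear in the SGX-disabled path; a minimal sketch, assuming the existing enable_sgx guard in vmx.c (not copied from the patch):

/* vmx_set_cpu_caps(), SGX disabled: hide every SGX-derived capability */
if (!enable_sgx) {
        kvm_cpu_cap_clear(X86_FEATURE_SGX_LC);
        kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA);    /* the previously missing clear */
}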
…omain commit bb9a9bf upstream. The scope of uncore control is per power domain with TPMI. There are two types of processor topologies that can be presented by the CPUID extended topology leaf, irrespective of the hardware architecture: 1. A die is not enumerated in CPUID. In this case only one die per package is visible, and there can be multiple power domains in a single die. 2. A power domain in a package is enumerated as a die in CPUID. So there is one power domain per die. To allow die level controls, the current implementation creates a root domain and aggregates all information from power domains in it. This is well suited for configuration 1 above. But for configuration 2 above, the root domain will present the same information as presented by the power domain, so there is no use in aggregating. To check the configuration, call topology_max_dies_per_package(). If it is more than one, avoid creating the root domain. Intel-SIG: commit bb9a9bf platform/x86/intel-uncore-freq: Do not present separate package-die domain. Backport Intel uncore-freq driver elc support and update Signed-off-by: Srinivas Pandruvada <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]> [ Yingbao Jia: amend commit log ] Signed-off-by: Yingbao Jia <[email protected]>
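Conceptually the probe-time decision reduces to a single topology check; a hedged sketch (the helper name uncore_create_root_domain is hypothetical, only topology_max_dies_per_package() is a real kernel API):

/* Configuration 2: each power domain is already enumerated as a die, so an
 * aggregate root domain would just mirror the per-domain data - skip it. */
if (topology_max_dies_per_package() > 1)
        return 0;

/* Configuration 1: several power domains share one CPUID die, so create the
 * aggregate (root) domain for die-level control. */
return uncore_create_root_domain();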
…ntrol commit bb516dc upstream. Add efficiency latency control support to the TPMI uncore driver. This defines two new threshold values for controlling uncore frequency, low threshold and high threshold. When CPU utilization is below low threshold, the user configurable floor latency control frequency can be used by the system. When CPU utilization is above high threshold, the uncore frequency is increased in 100MHz steps until power limit is reached. Intel-SIG: commit bb516dc platform/x86/intel-uncore-freq: Add support for efficiency latency control. Backport Intel uncore-freq driver elc support and update Signed-off-by: Tero Kristo <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]> [ Yingbao Jia: amend commit log ] Signed-off-by: Yingbao Jia <[email protected]>
…fs interface commit 24b6616 upstream. Add the TPMI efficiency latency control fields to the sysfs interface. The sysfs files are mapped to the TPMI uncore driver via the registered uncore_read and uncore_write driver callbacks. These fields are not populated on older non TPMI hardware. Intel-SIG: commit 24b6616 platform/x86/intel-uncore-freq: Add efficiency latency control to sysfs interface. Backport Intel uncore-freq driver elc support and update Signed-off-by: Tero Kristo <[email protected]> Reviewed-by: Ilpo Järvinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]> [ Yingbao Jia: amend commit log ] Signed-off-by: Yingbao Jia <[email protected]>
commit f557e0d1c2e6eb6af6d4468ed2c0ee91829370e2 upstream. Add Granite Rapids Xeon D C-states support: C1, C1E, C6, and C6P. The C-states are basically the same as in Granite Rapids Xeon SP/AP, but characteristics (latency, target residency) are a bit different. Intel-SIG: commit f557e0d1c2e6 intel_idle: add Granite Rapids Xeon D support. Backport Intel idle GNR-D support. Signed-off-by: Artem Bityutskiy <[email protected]> Link: https://patch.msgid.link/[email protected] [ rjw: Changelog edit ] Signed-off-by: Rafael J. Wysocki <[email protected]> [ Yingbao Jia: amend commit log ] Signed-off-by: Yingbao Jia <[email protected]>
commit eeed4bfbe9b96214162a09a7fbb7570fa9522ca4 upstream. Clearwater Forest (CWF) SoC has the same C-states as Sierra Forest (SRF) SoC. Add CWF support by re-using the SRF C-states table. Note: it is expected that CWF C-states will have same or very similar characteristics as SRF C-states (latency and target residency). However, there is a possibility that the characteristics will end up being different enough when the CWF platform development is finished. In that case, a separate CWF C-states table will be created and populated with the CWF-specific characteristics (latency and target residency). Intel-SIG: commit eeed4bfbe9b9 intel_idle: add Clearwater Forest SoC support. Backport Intel idle CWF support. Signed-off-by: Artem Bityutskiy <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]> [ Yingbao Jia: amend commit log ] Signed-off-by: Yingbao Jia <[email protected]>
commit 1c450ff upstream. Advertise AVX10.1 related CPUIDs, i.e. report AVX10 support bit via CPUID.(EAX=07H, ECX=01H):EDX[bit 19] and new CPUID leaf 0x24H so that guest OS and applications can query the AVX10.1 CPUIDs directly. Intel AVX10 represents the first major new vector ISA since the introduction of Intel AVX512, which will establish a common, converged vector instruction set across all Intel architectures[1]. AVX10.1 is an early version of AVX10, that enumerates the Intel AVX512 instruction set at 128, 256, and 512 bits which is enabled on Granite Rapids. I.e., AVX10.1 is only a new CPUID enumeration with no new functionality. New features, e.g. Embedded Rounding and Suppress All Exceptions (SAE) will be introduced in AVX10.2. Advertising AVX10.1 is safe because there is nothing to enable for AVX10.1, i.e. it's purely a new way to enumerate support, thus there will never be anything for the kernel to enable. Note just the CPUID checking is changed when using AVX512 related instructions, e.g. if using one AVX512 instruction needs to check (AVX512 AND AVX512DQ), it can check ((AVX512 AND AVX512DQ) OR AVX10.1) after checking XCR0[7:5]. The versions of AVX10 are expected to be inclusive, e.g. version N+1 is a superset of version N. Per the spec, the version can never be 0, just advertise AVX10.1 if it's supported in hardware. Moreover, advertising AVX10_{128,256,512} needs to land in the same commit as advertising basic AVX10.1 support, otherwise KVM would advertise an impossible CPU model. E.g. a CPU with AVX512 but not AVX10.1/512 is impossible per the SDM. As more and more AVX related CPUIDs are added (it would have resulted in around 40-50 CPUID flags when developing AVX10), the versioning approach is introduced. But incrementing version numbers are bad for virtualization. E.g. if AVX10.2 has a feature that shouldn't be enumerated to guests for whatever reason, then KVM can't enumerate any "later" features either, because the only way to hide the problematic AVX10.2 feature is to set the version to AVX10.1 or lower[2]. But most AVX features are just passed through and don't have virtualization controls, so AVX10 should not be problematic in practice, so long as Intel honors their promise that future versions will be supersets of past versions. [1] https://cdrdv2.intel.com/v1/dl/getContent/784267 [2] https://lore.kernel.org/all/[email protected]/ Intel-SIG: commit 1c450ff KVM: x86: Advertise AVX10.1 CPUID to userspace. GNR AVX10.1 backporting Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Tao Su <[email protected]> Link: https://lore.kernel.org/r/[email protected] [sean: minor changelog tweaks] Signed-off-by: Sean Christopherson <[email protected]> [ Quanxian Wang: amend commit log ] Signed-off-by: Quanxian Wang <[email protected]>
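A user-space sketch of the relaxed feature check described above, i.e. "(AVX512F && AVX512DQ) || AVX10.1" after confirming XCR0[7:5]; bit positions follow the SDM, but treat this as illustrative rather than anything from the patch (it also skips the leaf 0x24 version and vector-length details):

#include <stdbool.h>
#include <cpuid.h>
#include <immintrin.h>                  /* _xgetbv(); build with -mxsave */

static bool avx512_state_enabled(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
                return false;           /* no OSXSAVE: XGETBV would fault */
        /* XCR0[7:5] = Hi16_ZMM, ZMM_Hi256, opmask */
        return (_xgetbv(0) & 0xe0) == 0xe0;
}

static bool can_use_avx512dq(void)
{
        unsigned int eax, ebx, ecx, edx;
        bool avx512f, avx512dq, avx10 = false;

        if (!avx512_state_enabled())
                return false;

        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
                return false;
        avx512f  = ebx & (1u << 16);
        avx512dq = ebx & (1u << 17);

        if (__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
                avx10 = edx & (1u << 19);       /* AVX10 enumerated (version in leaf 0x24) */

        return (avx512f && avx512dq) || avx10;
}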
commit 090e3be upstream. Server product based on the Atom Darkmont core. Intel-SIG: commit 090e3be x86/cpu: Add model number for Intel Clearwater Forest processor. BACKPORTING NEW CPU IFM Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Quanxian Wang: amend commit log ] Signed-off-by: Quanxian Wang <[email protected]>
commit 664596bd98bb251dd417dfd3f9b615b661e1e44a upstream. Hide the Intel Birch Stream SoC TCO WDT feature since it was removed. On platforms with PCH TCO WDT, this redundant device might be rendering errors like this: [ 28.144542] sysfs: cannot create duplicate filename '/bus/platform/devices/iTCO_wdt' Intel-SIG: commit 664596bd98bb i2c: i801: Hide Intel Birch Stream SoC TCO WDT Fixes: 8c56f9e ("i2c: i801: Add support for Intel Birch Stream SoC") Link: https://bugzilla.kernel.org/show_bug.cgi?id=220320 Signed-off-by: Chiasheng Lee <[email protected]> Cc: <[email protected]> # v6.7+ Reviewed-by: Mika Westerberg <[email protected]> Reviewed-by: Jarkko Nikula <[email protected]> Signed-off-by: Andi Shyti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Conflicts: drivers/i2c/busses/i2c-i801.c [jz: resolve context conflicts] Signed-off-by: Jason Zeng <[email protected]>
…the kernel commit 76db7aa upstream. Sync the new sample type for the branch counters feature. Signed-off-by: Kan Liang <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexey Bayduraev <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Tinghao Zhang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 72b8b94 upstream. Sort header files alphabetically. Intel-SIG: commit 72b8b94 powercap: intel_rapl: Sort header files Backport TPMI based RAPL PMU support for GNR and future Xeons. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 575024a upstream. Introduce two new APIs rapl_package_add_pmu()/rapl_package_remove_pmu(). RAPL driver can invoke these APIs to expose its supported energy counters via perf PMU. The new RAPL PMU is fully compatible with current MSR RAPL PMU, including using the same PMU name and events name/id/unit/scale, etc. For example, use below command perf stat -e power/energy-pkg/ -e power/energy-ram/ FOO to get the energy consumption if power/energy-pkg/ and power/energy-ram/ events are available in the "perf list" output. This does not introduce any conflict because TPMI RAPL is the only user of these APIs currently, and it never co-exists with MSR RAPL. Note that RAPL Packages can be probed/removed dynamically, and the events supported by each TPMI RAPL device can be different. Thus the RAPL PMU support is done on demand, which means 1. PMU is registered only if it is needed by a RAPL Package. PMU events for unsupported counters are not exposed. 2. PMU is unregistered and registered when a new RAPL Package is probed and supports new counters that are not supported by current PMU. For example, on a dual-package system using TPMI RAPL, it is possible that Package 1 behaves as TPMI domain root and supports Psys domain. In this case, register PMU without Psys event when probing Package 0, and re-register the PMU with Psys event when probing Package 1. 3. PMU is unregistered when all registered RAPL Packages don't need PMU. Intel-SIG: commit 575024a powercap: intel_rapl: Introduce APIs for PMU support Backport TPMI based RAPL PMU support for GNR and future Xeons. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 963a9ad upstream. Enable RAPL PMU support for TPMI RAPL driver. Intel-SIG: commit 963a9ad powercap: intel_rapl_tpmi: Enable PMU support Backport TPMI based RAPL PMU support for GNR and future Xeons. Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 0007f39 upstream. The unit control address of some CXL units may be wrongly calculated under some configurations on an EMR machine. The current implementation only saves the unit control address of the units from the first die, and the first unit of the rest of dies. Perf assumed that the units from the other dies have the same offset as the first die. So the unit control address of the rest of the units can be calculated. However, the assumption is wrong, especially for the CXL units. Introduce an RB tree for each uncore type to save the unit control address and three kinds of ID information (unit ID, PMU ID, and die ID) for all units. The unit ID is a physical ID of a unit. The PMU ID is a logical ID assigned to a unit. The logical IDs start from 0 and must be contiguous. The physical ID and the logical ID are a 1:1 mapping. The units with the same physical ID in different dies share the same PMU. The die ID indicates which die a unit belongs to. The RB tree can be searched by two different keys (unit ID or PMU ID + die ID). During the RB tree setup, the unit ID is used as a key to look up the RB tree. The perf can create/assign a proper PMU ID to the unit. Later, after the RB tree is set up, PMU ID + die ID is used as a key to look up the RB tree to fill the cpumask of a PMU. It's used more frequently, so PMU ID + die ID is compared in the unit_less(). The uncore_find_unit() has to be O(N). But the RB tree setup only occurs once during the driver load time. It should be acceptable. Compared with the current implementation, more space is required to save the information of all units. The extra size should be acceptable. For example, on EMR, there are 221 units at most. For a 2-socket machine, the extra space is ~6KB at most. Intel-SIG: commit 0007f39 perf/x86/uncore: Save the unit control address of all units Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
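The two lookup keys can be pictured with a small comparator sketch; the structure and function below are purely illustrative, not the driver's code:

/* Illustrative node: one per uncore unit, keyed two ways. */
struct unit_node {
        unsigned int unit_id;           /* physical ID, the key during RB-tree setup  */
        unsigned int pmu_id;            /* logical ID, contiguous from 0              */
        unsigned int die_id;            /* which die the unit lives on                */
        unsigned long long unit_ctl;    /* unit control address                       */
};

/* Key used after setup (the hot path that fills each PMU's cpumask):
 * compare PMU ID first, then die ID, mirroring the unit_less() ordering. */
static int cmp_pmu_die(const struct unit_node *a, const struct unit_node *b)
{
        if (a->pmu_id != b->pmu_id)
                return a->pmu_id < b->pmu_id ? -1 : 1;
        if (a->die_id != b->die_id)
                return a->die_id < b->die_id ? -1 : 1;
        return 0;
}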
commit c74443d upstream. The cpumask of some uncore units, e.g., CXL uncore units, may be wrong under some configurations. Perf may access an uncore counter of a non-existent uncore unit. The uncore driver assumes that all uncore units are symmetric among dies. A global cpumask is shared among all uncore PMUs. However, some CXL uncore units may only be available on some dies. A per PMU cpumask is introduced to track the CPU mask of this PMU. The driver searches the unit control RB tree to check whether the PMU is available on a given die, and updates the per PMU cpumask accordingly. Intel-SIG: commit c74443d perf/x86/uncore: Support per PMU cpumask Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 585463f upstream. The box_ids only save the unit ID for the first die. If a unit, e.g., a CXL unit, doesn't exist in the first die, its unit ID cannot be retrieved. The unit control RB tree also stores the unit ID information. Retrieve the unit ID from the unit control RB tree. Intel-SIG: commit 585463f perf/x86/uncore: Retrieve the unit ID from the unit control RB tree Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 80580da upstream. The unit control RB tree has the unit control and unit ID information for all the units. Use it to replace the box_ctls/mmio_offsets to get an accurate unit control address for MMIO uncore units. Intel-SIG: commit 80580da perf/x86/uncore: Apply the unit control RB tree to MMIO uncore units Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit b1d9ea2 upstream. The unit control RB tree has the unit control and unit ID information for all the MSR units. Use them to replace the box_ctl and uncore_msr_box_ctl() to get an accurate unit control address for MSR uncore units. Add intel_generic_uncore_assign_hw_event(), which utilizes the accurate unit control address from the unit control RB tree to calculate the config_base and event_base. The unit id related information should be retrieved from the unit control RB tree as well. Intel-SIG: commit b1d9ea2 perf/x86/uncore: Apply the unit control RB tree to MSR uncore units Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit f76a842 upstream. The unit control RB tree has the unit control and unit ID information for all the PCI units. Use them to replace the box_ctls/pci_offsets to get an accurate unit control address for PCI uncore units. The UPI/M3UPI units in the discovery table are ignored. Please see the commit 65248a9 ("perf/x86/uncore: Add a quirk for UPI on SPR"). Manually allocate a unit control RB tree for UPI/M3UPI. Add cleanup_extra_boxes to release such manual allocation. Intel-SIG: commit f76a842 perf/x86/uncore: Apply the unit control RB tree to PCI uncore units Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 15a4bd5 upstream. The unit control and ID information are retrieved from the unit control RB tree. No one uses the old structure anymore. Remove them. Intel-SIG: commit 15a4bd5 perf/x86/uncore: Cleanup unused unit structure Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit f8a86a9 upstream. Unknown uncore PMON types can be found in both SPR and EMR with HBM or CXL. $ls /sys/devices/ | grep type uncore_type_12_16 uncore_type_12_18 uncore_type_12_2 uncore_type_12_4 uncore_type_12_6 uncore_type_12_8 uncore_type_13_17 uncore_type_13_19 uncore_type_13_3 uncore_type_13_5 uncore_type_13_7 uncore_type_13_9 The unknown PMON types are HBM and CXL PMON. Except for the name, the other information regarding the HBM and CXL PMON counters can be retrieved via the discovery table. Add them into the uncores tables for SPR and EMR. The event config registers for all CXL related units are 8-byte apart. Add SPR_UNCORE_MMIO_OFFS8_COMMON_FORMAT to specially handle it. Intel-SIG: commit f8a86a9 perf/x86/intel/uncore: Support HBM and CXL PMON counters Backport SPR/EMR HBM and CXL PMON support to kernel v6.6 Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Yunying Sun <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit d4b5694 upstream. From PMU's perspective, the SPR/GNR server has a similar uarch to the ADL/MTL client p-core. Many functions are shared. However, the shared function name uses the abbreviation of the server product code name, rather than the common uarch code name. Rename these internal shared functions by the common uarch name. Intel-SIG: commit d4b5694 perf/x86/intel: Use the common uarch name for the shared functions Backport as a dependency needed by the GNR distinct pmu name fix Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ Yunying Sun: amend commit log ] Signed-off-by: Yunying Sun <[email protected]> Signed-off-by: Jason Zeng <[email protected]>
commit 208d8c7 upstream. Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to initialize FRED. However if FRED is unavailable or disabled, it falls back to set up TSS IST and initialize IDT. Intel-SIG: commit 208d8c7 x86/fred: Invoke FRED initialization code to enable FRED Backport FRED support. Co-developed-by: Xin Li <[email protected]> Signed-off-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: Shan Kang <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit 208d8c7) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
…ng to inline properly commit cba9ff3 upstream. Change array_index_mask_nospec() to __always_inline because "inline" does not guarantee inlining, as explained at https://www.kernel.org/doc/local/inline.html. Intel-SIG: commit cba9ff3 x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly Backport FRED support. Fixes: 6786137bf8fd ("x86/fred: FRED entry/exit and dispatch code") Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit cba9ff3) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit e138419 upstream. Add H. Peter Anvin and myself as FRED maintainers. Intel-SIG: commit e138419 MAINTAINERS: Add a maintainer entry for FRED Backport FRED support. Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: H. Peter Anvin (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit e138419) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit c416b5b upstream. As TOP_OF_KERNEL_STACK_PADDING was defined as 0 on x86_64, it went unnoticed that the initialization of the .sp field in INIT_THREAD and some calculations in the low level startup code do not take the padding into account. FRED enabled kernels require a 16 byte padding, which means that the init task initialization and the low level startup code use the wrong stack offset. Subtract TOP_OF_KERNEL_STACK_PADDING in all affected places to adjust for this. Intel-SIG: commit c416b5b x86/fred: Fix init_task thread stack pointer initialization Backport FRED support. Fixes: 65c9cc9 ("x86/fred: Reserve space for the FRED stack frame") Fixes: 3adee77 ("x86/smpboot: Remove initial_stack on 64-bit") Reported-by: kernel test robot <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit c416b5b) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit 989b5cf upstream. Depending on whether FRED is enabled, sysvec_install() installs a system interrupt handler into either FRED's system vector dispatch table or the IDT. However FRED can be disabled later in trap_init(), after sysvec_install() has been invoked already; e.g., the HYPERVISOR_CALLBACK_VECTOR handler is registered with sysvec_install() in kvm_guest_init(), which is called in setup_arch() but way before trap_init(). IOW, there is a gap between FRED being available and it being disabled. As a result, when FRED is available but disabled, early sysvec_install() invocations fail to install the IDT handler, resulting in spurious interrupts. Fix it by parsing the cmdline param "fred=" in cpu_parse_early_param() to ensure that FRED is disabled before the first sysvec_install() invocation. Intel-SIG: commit 989b5cf x86/fred: Parse cmdline param "fred=" in cpu_parse_early_param() Backport FRED support. Fixes: 3810da1 ("x86/fred: Add a fred= cmdline param") Reported-by: Hou Wenlong <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit 989b5cf) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit 73270c1 upstream. To enable FRED earlier, move the RSP initialization out of cpu_init_fred_exceptions() into cpu_init_fred_rsps(). This is required as the FRED RSP initialization depends on the availability of the CPU entry areas which are set up late in trap_init(). No functional change intended. Marked with Fixes as it's a dependency for the real fix. Intel-SIG: commit 73270c1 x86/fred: Move FRED RSP initialization into separate function Backport FRED support. Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code") Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit 73270c1) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit a97756c upstream. On 64-bit init_mem_mapping() relies on the minimal page fault handler provided by the early IDT mechanism. The real page fault handler is installed right afterwards into the IDT. This is problematic on CPUs which have X86_FEATURE_FRED set because the real page fault handler retrieves the faulting address from the FRED exception stack frame and not from CR2, but that does obviously not work when FRED is not yet enabled in the CPU. To prevent this enable FRED right after init_mem_mapping() without interrupt stacks. Those are enabled later in trap_init() after the CPU entry area is set up. [ tglx: Encapsulate the FRED details ] Intel-SIG: commit a97756c x86/fred: Enable FRED right after init_mem_mapping() Backport FRED support. Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code") Reported-by: Hou Wenlong <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit a97756c) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit 723edbd2ca5fb4c78ac4a5644511c63895fd1c57 upstream. SS is initialized to NULL during boot time and not explicitly set to __KERNEL_DS. With FRED enabled, if a kernel event is delivered before a CPU goes to user level for the first time, its SS is NULL, thus NULL is pushed into the SS field of the FRED stack frame. But before ERETS is executed, the CPU may context switch to another task and go to user level. Then when the CPU comes back to kernel mode, SS is changed to __KERNEL_DS. Later when ERETS is executed to return from the kernel event handler, a #GP fault is generated because SS doesn't match the SS saved in the FRED stack frame. Initialize SS to __KERNEL_DS when enabling FRED to prevent that. Note, IRET doesn't check if SS matches the SS saved in its stack frame, thus IDT doesn't have this problem. For IDT it doesn't matter whether SS is set to __KERNEL_DS or not, because it's set to NULL upon interrupt or exception delivery and __KERNEL_DS upon SYSCALL. Thus it's pointless to initialize SS for IDT. Intel-SIG: commit 723edbd x86/fred: Set SS to __KERNEL_DS when enabling FRED Backport FRED support. Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit 723edbd) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit 0dfac6f upstream. In most cases, ti_work values passed to arch_exit_to_user_mode_prepare() are zero, e.g., 99% in kernel build tests. So an obvious optimization is to test ti_work for zero before processing individual bits in it. Omit the optimization when FPU debugging is enabled, otherwise the FPU consistency check is never executed. Intel 0day tests did not find a performance regression with this change. Intel-SIG: commit 0dfac6f x86/entry: Test ti_work for zero before processing individual bits Backport FRED support. Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit 0dfac6f) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
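The optimization itself is tiny; a hedged sketch of its shape in arch_exit_to_user_mode_prepare() (the exact guard and the set of bits handled in the real function differ):

static __always_inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
                                                           unsigned long ti_work)
{
        /* ti_work is zero for ~99% of returns in kernel-build tests, so test
         * the whole word once before looking at individual bits.  With FPU
         * debugging enabled the consistency check must still run, so the
         * short-circuit is not applied in that configuration. */
        if (IS_ENABLED(CONFIG_X86_DEBUG_FPU) || unlikely(ti_work)) {
                if (ti_work & _TIF_USER_RETURN_NOTIFY)
                        fire_user_return_notifiers();
                if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
                        switch_fpu_return();
        }
}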
…nism commit efe5088 upstream. Per the discussion about FRED MSR writes with the WRMSRNS instruction [1], use the alternatives mechanism to choose WRMSRNS when it's available, otherwise fall back to WRMSR. Remove the dependency on X86_FEATURE_WRMSRNS as WRMSRNS is no longer dependent on FRED. [1] https://lore.kernel.org/lkml/[email protected]/ Use a DS prefix to pad WRMSR instead of a NOP. The prefix is ignored. At least that's the current information from the hardware folks. Intel-SIG: commit efe5088 x86/msr: Switch between WRMSRNS and WRMSR with the alternatives mechanism Backport FRED support. Signed-off-by: Andrew Cooper <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit efe5088) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
…itch commit fe85ee3 upstream. The FRED RSP0 MSR points to the top of the kernel stack for user level event delivery. As this is the task stack it needs to be updated when a task is scheduled in. The update is done at context switch. That means it's also done when switching to kernel threads, which is pointless as those never go out to user space. For KVM threads this means there are two writes to FRED_RSP0 as KVM has to switch to the guest value before VMENTER. Defer the update to the exit to user space path and cache the per CPU FRED_RSP0 value, so redundant writes can be avoided. Provide fred_sync_rsp0() for KVM to keep the cache in sync with the actual MSR value after returning from guest to host mode. [ tglx: Massage change log ] Intel-SIG: commit fe85ee3 x86/entry: Set FRED RSP0 on return to userspace instead of context switch Backport FRED support. Suggested-by: Sean Christopherson <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected] (cherry picked from commit fe85ee3) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
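The caching pattern can be sketched roughly as below; the per-CPU variable name is illustrative and the wrmsrns() helper is assumed to come from the earlier FRED/MSR patches:

/* Per-CPU cache of the last value written to MSR_IA32_FRED_RSP0. */
static DEFINE_PER_CPU(unsigned long, cached_fred_rsp0);

/* Called on the exit-to-user path instead of at context switch, so kernel
 * threads (which never return to user space) never pay for the MSR write. */
static __always_inline void fred_update_rsp0(void)
{
        unsigned long rsp0 = (unsigned long)task_stack_page(current) + THREAD_SIZE;

        if (__this_cpu_read(cached_fred_rsp0) != rsp0) {
                wrmsrns(MSR_IA32_FRED_RSP0, rsp0);
                __this_cpu_write(cached_fred_rsp0, rsp0);
        }
}

After a VM exit, KVM would call the corresponding fred_sync_rsp0() helper so the cache reflects whatever value was left in the MSR while running the guest.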
commit de31b3cd706347044e1a57d68c3a683d58e8cca4 upstream. The FRED RSP0 MSR is only used for delivering events when running userspace. Linux leverages this property to reduce expensive MSR writes and optimize context switches. The kernel only writes the MSR when about to run userspace *and* when the MSR has actually changed since the last time userspace ran. This optimization is implemented by maintaining a per-CPU cache of FRED RSP0 and then checking that against the value for the top of current task stack before running userspace. However cpu_init_fred_exceptions() writes the MSR without updating the per-CPU cache. This means that the kernel might return to userspace with MSR_IA32_FRED_RSP0==0 when it needed to point to the top of current task stack. This would induce a double fault (#DF), which is bad. A context switch after cpu_init_fred_exceptions() can paper over the issue since it updates the cached value. That evidently happens most of the time explaining how this bug got through. Fix the bug through resynchronizing the FRED RSP0 MSR with its per-CPU cache in cpu_init_fred_exceptions(). Intel-SIG: commit de31b3cd7063 x86/fred: Fix the FRED RSP0 MSR out of sync with its per-CPU cache Backport FRED support. Fixes: fe85ee3 ("x86/entry: Set FRED RSP0 on return to userspace instead of context switch") Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc:[email protected] Link: https://lore.kernel.org/all/20250110174639.1250829-1-xin%40zytor.com (cherry picked from commit de31b3cd706347044e1a57d68c3a683d58e8cca4) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
commit e5f1e8af9c9e151ecd665f6d2e36fb25fec3b110 upstream. Upon a wakeup from S4, the restore kernel starts and initializes the FRED MSRs as needed from its perspective. It then loads a hibernation image, including the image kernel, and attempts to load image pages directly into their original page frames used before hibernation unless those frames are currently in use. Once all pages are moved to their original locations, it jumps to a "trampoline" page in the image kernel. At this point, the image kernel takes control, but the FRED MSRs still contain values set by the restore kernel, which may differ from those set by the image kernel before hibernation. Therefore, the image kernel must ensure the FRED MSRs have the same values as before hibernation. Since these values depend only on the location of the kernel text and data, they can be recomputed from scratch. Intel-SIG: commit e5f1e8af9c9e1 x86/fred: Fix system hang during S4 resume with FRED enabled Backport FRED support. Reported-by: Xi Pardee <[email protected]> Reported-by: Todd Brandt <[email protected]> Tested-by: Todd Brandt <[email protected]> Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Rafael J. Wysocki <[email protected]> Reviewed-by: H. Peter Anvin (Intel) <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit e5f1e8af9c9e151ecd665f6d2e36fb25fec3b110) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
…rn from SIGTRAP handler commit e34dbbc85d64af59176fe59fad7b4122f4330fe2 upstream. Clear the software event flag in the augmented SS to prevent immediate repeat of single step trap on return from SIGTRAP handler if the trap flag (TF) is set without an external debugger attached. Following is a typical single-stepping flow for a user process: 1) The user process is prepared for single-stepping by setting RFLAGS.TF = 1. 2) When any instruction in user space completes, a #DB is triggered. 3) The kernel handles the #DB and returns to user space, invoking the SIGTRAP handler with RFLAGS.TF = 0. 4) After the SIGTRAP handler finishes, the user process performs a sigreturn syscall, restoring the original state, including RFLAGS.TF = 1. 5) Goto step 2. According to the FRED specification: A) Bit 17 in the augmented SS is designated as the software event flag, which is set to 1 for FRED event delivery of SYSCALL, SYSENTER, or INT n. B) If bit 17 of the augmented SS is 1 and ERETU would result in RFLAGS.TF = 1, a single-step trap will be pending upon completion of ERETU. In step 4) above, the software event flag is set upon the sigreturn syscall, and its corresponding ERETU would restore RFLAGS.TF = 1. This combination causes a pending single-step trap upon completion of ERETU. Therefore, another #DB is triggered before any user space instruction is executed, which leads to an infinite loop in which the SIGTRAP handler keeps being invoked on the same user space IP. Intel-SIG: commit e34dbbc85d64a x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler Backport FRED support. Fixes: 14619d9 ("x86/fred: FRED entry/exit and dispatch code") Suggested-by: H. Peter Anvin (Intel) <[email protected]> Signed-off-by: Xin Li (Intel) <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Tested-by: Sohil Mehta <[email protected]> Cc:[email protected] Link: https://lore.kernel.org/all/20250609084054.2083189-2-xin%40zytor.com (cherry picked from commit e34dbbc85d64af59176fe59fad7b4122f4330fe2) [ Ethan Zhao: amend commit log ] Signed-off-by: Ethan Zhao <[email protected]>
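In terms of bits, the fix amounts to clearing bit 17 of the saved augmented SS before ERETU replays the frame; a schematic sketch using the bit position quoted above (the flag name and the exact access path used by the patch are not shown here):

#define FRED_SSX_SWEVENT        (1UL << 17)     /* software event flag, FRED spec */

/* When building the frame that sigreturn's ERETU will consume, drop the
 * software event flag so restoring RFLAGS.TF = 1 does not leave a single-step
 * trap pending before any user-space instruction has executed. */
saved_augmented_ss &= ~FRED_SSX_SWEVENT;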
The call to idxd_free() introduces a duplicate put_device() leading to a
reference count underflow:
refcount_t: underflow; use-after-free.
WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
Call Trace:
<TASK>
idxd_remove+0xe4/0x120 [idxd]
pci_device_remove+0x3f/0xb0
device_release_driver_internal+0x197/0x200
driver_detach+0x48/0x90
bus_remove_driver+0x74/0xf0
pci_unregister_driver+0x2e/0xb0
idxd_exit_module+0x34/0x7a0 [idxd]
__do_sys_delete_module.constprop.0+0x183/0x280
do_syscall_64+0x54/0xd70
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The idxd_unregister_devices(), which is invoked at the very beginning of
idxd_remove(), already takes care of the necessary put_device() through the
following call path:
idxd_unregister_devices() -> device_unregister() -> put_device()
In addition, when CONFIG_DEBUG_KOBJECT_RELEASE is enabled, put_device() may
trigger asynchronous cleanup via schedule_delayed_work(). If idxd_free() is
called immediately after, it can result in a use-after-free.
Remove the improper idxd_free() to avoid both the refcount underflow and
potential memory corruption during module unload.
Fixes: d5449ff1b04d ("dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call")
Signed-off-by: Yi Sun <[email protected]>
Tested-by: Shuai Xue <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
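A toy user-space model of why the extra put underflows the count (kref-style semantics, illustrative only):

#include <assert.h>
#include <stdio.h>

struct toy_device { int refcount; };

static void toy_put(struct toy_device *d)
{
        assert(d->refcount > 0 && "refcount underflow: use-after-free risk");
        if (--d->refcount == 0)
                printf("released\n");   /* stands in for the release() callback */
}

int main(void)
{
        struct toy_device dev = { .refcount = 1 };  /* reference taken at registration */

        toy_put(&dev);  /* device_unregister() already drops this reference */
        toy_put(&dev);  /* an extra idxd_free()-style put trips the assert */
        return 0;
}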
A recent refactor introduced a misplaced put_device() call, resulting in a
reference count underflow during module unload.
There is no need to add additional put_device() calls for idxd groups,
engines, or workqueues. Although the commit claims: "Note, this also
fixes the missing put_device() for idxd groups, engines, and wqs."
It appears no such omission actually existed. The required cleanup is
already handled by the call chain:
idxd_unregister_devices() -> device_unregister() -> put_device()
Extend idxd_cleanup() to handle the remaining necessary cleanup and
remove idxd_cleanup_internals(), which duplicates deallocation logic
for idxd, engines, groups, and workqueues. Memory management is also
properly handled through the Linux device model.
Fixes: a409e919ca32 ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper")
Signed-off-by: Yi Sun <[email protected]>
Tested-by: Shuai Xue <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
The cleanup in idxd_setup_wqs() has had a couple of bugs because the error
handling is a bit subtle. It's simpler to just rewrite it in a cleaner
way. The issues here are:
1) If "idxd->max_wqs" is <= 0 then we call put_device(conf_dev) when
"conf_dev" hasn't been initialized.
2) If kzalloc_node() fails then again "conf_dev" is invalid. It's
either uninitialized or it points to the "conf_dev" from the
previous iteration so it leads to a double free.
It's better to free partial loop iterations within the loop and then
the unwinding at the end can handle whole loop iterations. I also
renamed the labels to describe what the goto does and not where the goto
was located.
Fixes: 3fd2f4bc010c ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs")
Reported-by: Colin Ian King <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
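The pattern described in the commit message, free the partial iteration inside the loop and let the error label unwind only whole iterations, generalizes well beyond idxd; a standalone sketch:

#include <stdlib.h>

struct wq { int id; };

static int setup_wqs(struct wq **wqs, int n)
{
        int i;

        for (i = 0; i < n; i++) {
                struct wq *w = calloc(1, sizeof(*w));

                if (!w)
                        goto err_free_previous;  /* nothing partial to clean up */

                w->id = i;
                /* If a later setup step failed at this point, free 'w' here
                 * and then jump to the same unwind label. */
                wqs[i] = w;
        }
        return 0;

err_free_previous:
        while (--i >= 0) {                       /* whole iterations only */
                free(wqs[i]);
                wqs[i] = NULL;
        }
        return -1;
}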
Clearwater Forest has the same C-state residency counters as Sierra Forest, so this simply adds the CPU model ID for it. Cc: Artem Bityutskiy <[email protected]> Cc: Kan Liang <[email protected]> Reviewed-by: Kan Liang <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
With the introduction of Perfmon v6, PMU counters can be discontiguous. For example, for the fixed counters on CWF, only fixed counters 0-3 and 5-7 are supported; there is no fixed counter 4. To accommodate this change, the archPerfmonExt CPUID (0x23) leaves were introduced to enumerate the true view of the counter bitmaps. Current perf code already supports the archPerfmonExt CPUID and uses the counter bitmaps to enumerate the counters the HW really supports, but x86_pmu_show_pmu_cap() still only dumps the absolute counter number instead of the true-view bitmap; it is outdated and may mislead readers. So dump the counters' true-view bitmap in x86_pmu_show_pmu_cap() and opportunistically change the dump sequence and wording. Signed-off-by: Dapeng Mi <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kan Liang <[email protected]> Link: https://lore.kernel.org/r/[email protected]
…esctrl subsystem commit 594902c986e269660302f09df9ec4bf1cf017b77 upstream. In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain structure representing a NUMA node relies on the cacheinfo interface (rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map) for monitoring. The L3 cache information of a SNC NUMA node determines which domains are summed for the "top level" L3-scoped events. rdt_mon_domain::ci is initialized using the first online CPU of a NUMA node. When this CPU goes offline, its shared_cpu_map is cleared to contain only the offline CPU itself. Subsequently, attempting to read counters via smp_call_on_cpu(offline_cpu) fails (and error ignored), returning zero values for "top-level events" without any error indication. Replace the cacheinfo references in struct rdt_mon_domain and struct rmid_read with the cacheinfo ID (a unique identifier for the L3 cache). rdt_domain_hdr::cpu_mask contains the online CPUs associated with that domain. When reading "top-level events", select a CPU from rdt_domain_hdr::cpu_mask and utilize its L3 shared_cpu_map to determine valid CPUs for reading RMID counter via the MSR interface. Considering all CPUs associated with the L3 cache improves the chances of picking a housekeeping CPU on which the counter reading work can be queued, avoiding an unnecessary IPI. Fixes: 328ea68 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files") Signed-off-by: Qinyun Tan <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Reinette Chatre <[email protected]> Tested-by: Tony Luck <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Kui Wen <[email protected]>
https://lore.kernel.org/all/[email protected]/ Signed-off-by: Kui Wen <[email protected]>
1. Add CONFIG_INTEL_IFS=m
2. Add CONFIG_DMATEST=m
3. Add CONFIG_TCG_TPM=y
4. Make the following changes for EDAC on 25ww43.4:
   CONFIG_EDAC=y
   CONFIG_EDAC_DEBUG=y
   CONFIG_EDAC_DECODE_MCE=y
   CONFIG_ACPI_APEI_ERST_DEBUG=y
   CONFIG_EDAC_IEH=m
5. Make the following changes for the power module:
   CONFIG_INTEL_TPMI=m
   CONFIG_INTEL_VSEC=m
   CONFIG_INTEL_RAPL_TPMI=m
   CONFIG_INTEL_PMT_CLASS=m
   CONFIG_INTEL_PMT_TELEMETRY=m
6. Enable FRED: CONFIG_X86_FRED=y
7. Enable CET: CONFIG_X86_USER_SHADOW_STACK=y
Signed-off-by: Bo He <[email protected]>
This combines the PRs below for 6.6 kernel test purposes:
#70 opened on Sep 24 by zhang-rui
#69 opened on Sep 24 by zhang-rui
#68 opened on Sep 24 by zhang-rui
#67 opened on Sep 24 by zhang-rui
#66 opened on Sep 24 by zhang-rui
#65 opened on Sep 24 by zhang-rui
#64 opened on Sep 24 by x56Jason
#62 opened on Sep 18 by x56Jason
#60 opened on Aug 19 by quanxianwang
#59 opened on Aug 7 by quanxianwang
#49 opened on Jun 10 by jiayingbao
#48 opened on Jun 10 by jiayingbao
#45 opened on Mar 31 by zhiquan1-li
#43 opened on Mar 27 by AichunShi
#41 opened on Mar 27 by x56Jason
#39 opened on Mar 27 by EthanZHF
#36 opened on Mar 27 by quanxianwang
#31 opened on Mar 11 by aubreyli