summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2026-03-19x86/apic: Disable x2apic on resume if the kernel expects soShashank Balaji
commit 8cc7dd77a1466f0ec58c03478b2e735a5b289b96 upstream. When resuming from s2ram, firmware may re-enable x2apic mode, which may have been disabled by the kernel during boot either because it doesn't support IRQ remapping or for other reasons. This causes the kernel to continue using the xapic interface, while the hardware is in x2apic mode, which causes hangs. This happens on defconfig + bare metal + s2ram. Fix this in lapic_resume() by disabling x2apic if the kernel expects it to be disabled, i.e. when x2apic_mode = 0. The ACPI v6.6 spec, Section 16.3 [1] says firmware restores either the pre-sleep configuration or initial boot configuration for each CPU, including MSR state: When executing from the power-on reset vector as a result of waking from an S2 or S3 sleep state, the platform firmware performs only the hardware initialization required to restore the system to either the state the platform was in prior to the initial operating system boot, or to the pre-sleep configuration state. In multiprocessor systems, non-boot processors should be placed in the same state as prior to the initial operating system boot. (further ahead) If this is an S2 or S3 wake, then the platform runtime firmware restores minimum context of the system before jumping to the waking vector. This includes: CPU configuration. Platform runtime firmware restores the pre-sleep configuration or initial boot configuration of each CPU (MSR, MTRR, firmware update, SMBase, and so on). Interrupts must be disabled (for IA-32 processors, disabled by CLI instruction). (and other things) So at least as per the spec, re-enablement of x2apic by the firmware is allowed if "x2apic on" is a part of the initial boot configuration. [1] https://uefi.org/specs/ACPI/6.6/16_Waking_and_Sleeping.html#initialization [ bp: Massage. ] Fixes: 6e1cb38a2aef ("x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping") Co-developed-by: Rahul Bukte <rahul.bukte@sony.com> Signed-off-by: Rahul Bukte <rahul.bukte@sony.com> Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260306-x2apic-fix-v2-1-bee99c12efa3@sony.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19powerpc64/bpf: fix the address returned by bpf_get_func_ipHari Bathini
commit 157820264ac3dadfafffad63184b883eb28f9ae0 upstream. bpf_get_func_ip() helper function returns the address of the traced function. It relies on the IP address stored at ctx - 16 by the bpf trampoline. On 64-bit powerpc, this address is recovered from LR accounting for OOL trampoline. But the address stored here was off by 4-bytes. Ensure the address is the actual start of the traced function. Reported-by: Abhishek Dubey <adubey@linux.ibm.com> Fixes: d243b62b7bd3 ("powerpc64/bpf: Add support for bpf trampolines") Cc: stable@vger.kernel.org Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260303181031.390073-3-hbathini@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19powerpc64/bpf: fix kfunc call supportHari Bathini
commit 01b6ac72729610ae732ca2a66e3a642e23f6cd60 upstream. Commit 61688a82e047 ("powerpc/bpf: enable kfunc call") inadvertently enabled kfunc call support for 32-bit powerpc but that support will not be possible until ABI mismatch between 32-bit powerpc and eBPF is handled in 32-bit powerpc JIT code. Till then, advertise support only for 64-bit powerpc. Also, in powerpc ABI, caller needs to extend the arguments properly based on signedness. The JIT code is responsible for handling this explicitly for kfunc calls as verifier can't handle this for each architecture-specific ABI needs. But this was not taken care of while kfunc call support was enabled for powerpc. Fix it by handling this with bpf_jit_find_kfunc_model() and using zero_extend() & sign_extend() helper functions. Fixes: 61688a82e047 ("powerpc/bpf: enable kfunc call") Cc: stable@vger.kernel.org Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260303181031.390073-7-hbathini@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19powerpc/pseries: Correct MSI allocation trackingNam Cao
commit 35e4f2a17eb40288f9bcdb09549fa04a63a96279 upstream. The per-device MSI allocation calculation in pseries_irq_domain_alloc() is clearly wrong. It can still happen to work when nr_irqs is 1. Correct it. Fixes: c0215e2d72de ("powerpc/pseries: Fix MSI-X allocation failure when quota is exceeded") Cc: stable@vger.kernel.org Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> [maddy: Fixed Nilay's reviewed-by tag] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260302003948.1452016-1-namcao@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19s390/xor: Fix xor_xc_5() inline assemblyVasily Gorbik
commit 5f25805303e201f3afaff0a90f7c7ce257468704 upstream. xor_xc_5() contains a larl 1,2f that is not used by the asm and is not declared as a clobber. This can corrupt a compiler-allocated value in %r1 and lead to miscompilation. Remove the instruction. Fixes: 745600ed6965 ("s390/lib: Use exrl instead of ex in xor functions") Cc: stable@vger.kernel.org Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19s390/xor: Fix xor_xc_2() inline assembly constraintsHeiko Carstens
commit f775276edc0c505dc0f782773796c189f31a1123 upstream. The inline assembly constraints for xor_xc_2() are incorrect. "bytes", "p1", and "p2" are input operands, while all three of them are modified within the inline assembly. Given that the function consists only of this inline assembly it seems unlikely that this may cause any problems, however fix this in any case. Fixes: 2cfc5f9ce7f5 ("s390/xor: optimized xor routing using the XC instruction") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-2-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19s390/stackleak: Fix __stackleak_poison() inline assembly constraintHeiko Carstens
commit 674c5ff0f440a051ebf299d29a4c013133d81a65 upstream. The __stackleak_poison() inline assembly comes with a "count" operand where the "d" constraint is used. "count" is used with the exrl instruction and "d" means that the compiler may allocate any register from 0 to 15. If the compiler would allocate register 0 then the exrl instruction would not or the value of "count" into the executed instruction - resulting in a stackframe which is only partially poisoned. Use the correct "a" constraint, which excludes register 0 from register allocation. Fixes: 2a405f6bb3a5 ("s390/stackleak: provide fast __stackleak_poison() implementation") Cc: stable@vger.kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20260302133500.1560531-4-hca@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19parisc: Check kernel mapping earlier at bootupHelge Deller
commit 17c144f1104bfc29a3ce3f7d0931a1bfb7a3558c upstream. The check if the initial mapping is sufficient needs to happen much earlier during bootup. Move this test directly to the start_parisc() function and use native PDC iodc functions to print the warning, because panic() and printk() are not functional yet. This fixes boot when enabling various KALLSYSMS options which need much more space. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faultsPiotr Jaroszynski
commit 97c5550b763171dbef61e6239cab372b9f9cd4a2 upstream. contpte_ptep_set_access_flags() compared the gathered ptep_get() value against the requested entry to detect no-ops. ptep_get() ORs AF/dirty from all sub-PTEs in the CONT block, so a dirty sibling can make the target appear already-dirty. When the gathered value matches entry, the function returns 0 even though the target sub-PTE still has PTE_RDONLY set in hardware. For a CPU with FEAT_HAFDBS this gathered view is fine, since hardware may set AF/dirty on any sub-PTE and CPU TLB behavior is effectively gathered across the CONT range. But page-table walkers that evaluate each descriptor individually (e.g. a CPU without DBM support, or an SMMU without HTTU, or with HA/HD disabled in CD.TCR) can keep faulting on the unchanged target sub-PTE, causing an infinite fault loop. Gathering can therefore cause false no-ops when only a sibling has been updated: - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared) - read faults: target still lacks PTE_AF Fix by checking each sub-PTE against the requested AF/dirty/write state (the same bits consumed by __ptep_set_access_flags()), using raw per-PTE values rather than the gathered ptep_get() view, before returning no-op. Keep using the raw target PTE for the write-bit unfold decision. Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT range may become the effective cached translation and software must maintain consistent attributes across the range. Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings") Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: James Houghton <jthoughton@google.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Breno Leitao <leitao@debian.org> Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com> Acked-by: Balbir Singh <balbirs@nvidia.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19parisc: Fix initial page table creation for bootHelge Deller
commit 8475d8fe21ec9c7eb2faca555fbc5b68cf0d2597 upstream. The KERNEL_INITIAL_ORDER value defines the initial size (usually 32 or 64 MB) of the page table during bootup. Up until now the whole area was initialized with PTE entries, but there was no check if we filled too many entries. Change the code to fill up with so many entries that the "_end" symbol can be reached by the kernel, but not more entries than actually fit into the initial PTE tables. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19arm64: mm: Add PTE_DIRTY back to PAGE_KERNEL* to fix kexec/hibernationCatalin Marinas
commit c25c4aa3f79a488cc270507935a29c07dc6bddfc upstream. Commit 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()") changed pte_mkwrite_novma() to only clear PTE_RDONLY when PTE_DIRTY is set. This was to allow writable-clean PTEs for swap pages that haven't actually been written. However, this broke kexec and hibernation for some platforms. Both go through trans_pgd_create_copy() -> _copy_pte(), which calls pte_mkwrite_novma() to make the temporary linear-map copy fully writable. With the updated pte_mkwrite_novma(), read-only kernel pages (without PTE_DIRTY) remain read-only in the temporary mapping. While such behaviour is fine for user pages where hardware DBM or trapping will make them writeable, subsequent in-kernel writes by the kexec relocation code will fault. Add PTE_DIRTY back to all _PAGE_KERNEL* protection definitions. This was the case prior to 5.4, commit aa57157be69f ("arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default"). With the kernel linear-map PTEs always having PTE_DIRTY set, pte_mkwrite_novma() correctly clears PTE_RDONLY. Fixes: 143937ca51cc ("arm64, mm: avoid always making PTE dirty in pte_mkwrite()") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@vger.kernel.org Reported-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Link: https://lore.kernel.org/r/20251204062722.3367201-1-jianpeng.chang.cn@windriver.com Cc: Will Deacon <will@kernel.org> Cc: Huang, Ying <ying.huang@linux.alibaba.com> Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19parisc: Increase initial mapping to 64 MB with KALLSYMSHelge Deller
commit 8e732934fb81282be41602550e7e07baf265e972 upstream. The 32MB initial kernel mapping can become too small when CONFIG_KALLSYMS is used. Increase the mapping to 64 MB in this case. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19arm64: gcs: Honour mprotect(PROT_NONE) on shadow stack mappingsCatalin Marinas
commit 47a8aad135ac1aed04b7b0c0a8157fd208075827 upstream. vm_get_page_prot() short-circuits the protection_map[] lookup for a VM_SHADOW_STACK mapping since it uses a different PIE index from the typical read/write/exec permissions. However, the side effect is that it also ignores mprotect(PROT_NONE) by creating an accessible PTE. Special-case the !(vm_flags & VM_ACCESS_FLAGS) flags to use the protection_map[VM_NONE] permissions instead. No GCS attributes are required for an inaccessible PTE. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") Cc: stable@vger.kernel.org Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: David Hildenbrand <david@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19s390/pfault: Fix virtual vs physical address confusionAlexander Gordeev
commit d879ac6756b662a085a743e76023c768c3241579 upstream. When Linux is running as guest, runs a user space process and the user space process accesses a page that the host has paged out, the guest gets a pfault interrupt and schedules a different process. Without this mechanism the host would have to suspend the whole virtual CPU until the page has been paged in. To setup the pfault interrupt the real address of parameter list should be passed to DIAGNOSE 0x258, but a virtual address is passed instead. That has a performance impact, since the pfault setup never succeeds, the interrupt is never delivered to a guest and the whole virtual CPU is suspended as result. Cc: stable@vger.kernel.org Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Reported-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19powerpc, perf: Check that current->mm is alive before getting user callchainViktor Malik
[ Upstream commit e9bbfb4bfa86c6b5515b868d6982ac60505d7e39 ] It may happen that mm is already released, which leads to kernel panic. This adds the NULL check for current->mm, similarly to commit 20afc60f892d ("x86, perf: Check that current->mm is alive before getting user callchain"). I was getting this panic when running a profiling BPF program (profile.py from bcc-tools): [26215.051935] Kernel attempted to read user page (588) - exploit attempt? (uid: 0) [26215.051950] BUG: Kernel NULL pointer dereference on read at 0x00000588 [26215.051952] Faulting instruction address: 0xc00000000020fac0 [26215.051957] Oops: Kernel access of bad area, sig: 11 [#1] [...] [26215.052049] Call Trace: [26215.052050] [c000000061da6d30] [c00000000020fc10] perf_callchain_user_64+0x2d0/0x490 (unreliable) [26215.052054] [c000000061da6dc0] [c00000000020f92c] perf_callchain_user+0x1c/0x30 [26215.052057] [c000000061da6de0] [c0000000005ab2a0] get_perf_callchain+0x100/0x360 [26215.052063] [c000000061da6e70] [c000000000573bc8] bpf_get_stackid+0x88/0xf0 [26215.052067] [c000000061da6ea0] [c008000000042258] bpf_prog_16d4ab9ab662f669_do_perf_event+0xf8/0x274 [...] In addition, move storing the top-level stack entry to generic perf_callchain_user to make sure the top-evel entry is always captured, even if current->mm is NULL. Fixes: 20002ded4d93 ("perf_counter: powerpc: Add callchain support") Signed-off-by: Viktor Malik <vmalik@redhat.com> Tested-by: Qiao Zhao <qzhao@redhat.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> [Maddy: fixed message to avoid checkpatch format style error] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260309144045.169427-1-vmalik@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19KVM: arm64: pkvm: Fallback to level-3 mapping on host stage-2 faultMarc Zyngier
commit 8531d5a83d8eb8affb5c0249b466c28d94192603 upstream. If, for any odd reason, we cannot converge to mapping size that is completely contained in a memblock region, we fail to install a S2 mapping and go back to the faulting instruction. Rince, repeat. This happens when faulting in regions that are smaller than a page or that do not have PAGE_SIZE-aligned boundaries (as witnessed on an O6 board that refuses to boot in protected mode). In this situation, fallback to using a PAGE_SIZE mapping anyway -- it isn't like we can go any lower. Fixes: e728e705802fe ("KVM: arm64: Adjust range correctly during host stage-2 faults") Link: https://lore.kernel.org/r/86wlzr77cn.wl-maz@kernel.org Cc: stable@vger.kernel.org Cc: Quentin Perret <qperret@google.com> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://patch.msgid.link/20260305132751.2928138-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APICSean Christopherson
commit 3989a6d036c8ec82c0de3614bed23a1dacd45de5 upstream. Initialize all per-vCPU AVIC control fields in the VMCB if AVIC is enabled in KVM and the VM has an in-kernel local APIC, i.e. if it's _possible_ the vCPU could activate AVIC at any point in its lifecycle. Configuring the VMCB if and only if AVIC is active "works" purely because of optimizations in kvm_create_lapic() to speculatively set apicv_active if AVIC is enabled *and* to defer updates until the first KVM_RUN. In quotes because KVM likely won't do the right thing if kvm_apicv_activated() is false, i.e. if a vCPU is created while APICv is inhibited at the VM level for whatever reason. E.g. if the inhibit is *removed* before KVM_REQ_APICV_UPDATE is handled in KVM_RUN, then __kvm_vcpu_update_apicv() will elide calls to vendor code due to seeing "apicv_active == activate". Cleaning up the initialization code will also allow fixing a bug where KVM incorrectly leaves CR8 interception enabled when AVIC is activated without creating a mess with respect to whether AVIC is activated or not. Cc: stable@vger.kernel.org Fixes: 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before enabling AVIC") Fixes: 6c3e4422dd20 ("svm: Add support for dynamic APICv") Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Reviewed-by: Jim Mattson <jmattson@google.com> Link: https://patch.msgid.link/20260203190711.458413-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMMJim Mattson
commit e2ffe85b6d2bb7780174b87aa4468a39be17eb81 upstream. Add KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM to allow L1 to set FREEZE_IN_SMM in vmcs12's GUEST_IA32_DEBUGCTL field, as permitted prior to commit 6b1dd26544d0 ("KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest"). Enable the quirk by default for backwards compatibility (like all quirks); userspace can disable it via KVM_CAP_DISABLE_QUIRKS2 for consistency with the constraints on WRMSR(IA32_DEBUGCTL). Note that the quirk only bypasses the consistency check. The vmcs02 bit is still owned by the host, and PMCs are not frozen during virtualized SMM. In particular, if a host administrator decides that PMCs should not be frozen during physical SMM, then L1 has no say in the matter. Fixes: 095686e6fcb4 ("KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter") Cc: stable@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://patch.msgid.link/20260205231537.1278753-1-jmattson@google.com [sean: tag for stable@, clean-up and fix goofs in the comment and docs] Signed-off-by: Sean Christopherson <seanjc@google.com> [Rename quirk. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19KVM: arm64: Fix protected mode handling of pages larger than 4kBMarc Zyngier
commit 08f97454b7fa39bfcf82524955c771d2d693d6fe upstream. Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"), pKVM tracks the memory that has been mapped into a guest in a side data structure. Crucially, it uses it to find out whether a page has already been mapped, and therefore refuses to map it twice. So far, so good. However, this very patch completely breaks non-4kB page support, with guests being unable to boot. The most obvious symptom is that we take the same fault repeatedly, and not making forward progress. A quick investigation shows that this is because of the above rejection code. As it turns out, there are multiple issues at play: - while the HPFAR_EL2 register gives you the faulting IPA minus the bottom 12 bits, it will still give you the extra bits that are part of the page offset for anything larger than 4kB, even for a level-3 mapping - pkvm_pgtable_stage2_map() assumes that the address passed as a parameter is aligned to the size of the intended mapping - the faulting address is only aligned for a non-page mapping When the planets are suitably aligned (pun intended), the guest faults on a page by accessing it past the bottom 4kB, and extra bits get set in the HPFAR_EL2 register. If this results in a page mapping (which is likely with large granule sizes), nothing aligns it further down, and pkvm_mapping_iter_first() finds an intersection that doesn't really exist. We assume this is a spurious fault and return -EAGAIN. And again... This doesn't hit outside of the protected code, as the page table code always aligns the IPA down to a page boundary, hiding the issue for everyone else. Fix it by always forcing the alignment on vma_pagesize, irrespective of the value of vma_pagesize. Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings") Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://https://patch.msgid.link/20260222141000.3084258-1-maz@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-19powerpc: 83xx: km83xx: Fix keymile vendor prefixJ. Neuschäfer
[ Upstream commit 691417ffe7821721e0a28bd25ad8c0dc0d4ae4ad ] When kmeter.c was refactored into km83xx.c in 2011, the "keymile" vendor prefix was changed to upper-case "Keymile". The devicetree at arch/powerpc/boot/dts/kmeter1.dts never underwent the same change, suggesting that this was simply a mistake. Fixes: 93e2b95c81042d ("powerpc/83xx: rename and update kmeter1") Signed-off-by: J. Neuschäfer <j.ne@posteo.net> Reviewed-by: Heiko Schocher <hs@nabladev.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260303-keymile-v1-1-463a11e71702@posteo.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19powerpc/crash: adjust the elfcorehdr sizeSourabh Jain
[ Upstream commit 04e707cb77c272cb0bb2e2e3c5c7f844d804a089 ] With crash hotplug support enabled, additional memory is allocated to the elfcorehdr kexec segment to accommodate resources added during memory hotplug events. However, the kdump FDT is not updated with the same size, which can result in elfcorehdr corruption in the kdump kernel. Update elf_headers_sz (the kimage member representing the size of the elfcorehdr kexec segment) to reflect the total memory allocated for the elfcorehdr segment instead of the elfcorehdr buffer size at the time of kdump load. This allows of_kexec_alloc_and_setup_fdt() to reserve the full elfcorehdr memory in the kdump FDT and prevents elfcorehdr corruption. Fixes: 849599b702ef8 ("powerpc/crash: add crash memory hotplug support") Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260227171801.2238847-1-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19powerpc/kexec/core: use big-endian types for crash variablesSourabh Jain
[ Upstream commit 20197b967a6a29dab81495f25a988515bda84cfe ] Use explicit word-sized big-endian types for kexec and crash related variables. This makes the endianness unambiguous and avoids type mismatches that trigger sparse warnings. The change addresses sparse warnings like below (seen on both 32-bit and 64-bit builds): CHECK ../arch/powerpc/kexec/core.c sparse: expected unsigned int static [addressable] [toplevel] [usertype] crashk_base sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned int static [addressable] [toplevel] [usertype] crashk_size sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned long long static [addressable] [toplevel] mem_limit sparse: got restricted __be32 [usertype] sparse: warning: incorrect type in assignment (different base types) sparse: expected unsigned int static [addressable] [toplevel] [usertype] kernel_end sparse: got restricted __be32 [usertype] No functional change intended. Fixes: ea961a828fe7 ("powerpc: Fix endian issues in kexec and crash dump code") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512221405.VHPKPjnp-lkp@intel.com/ Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20251224151257.28672-1-sourabhjain@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19kexec: Include kernel-end even without crashkernelBen Collins
[ Upstream commit 38c64dfe0af12778953846df5f259e913275cfe5 ] Certain versions of kexec don't even work without kernel-end being added to the device-tree. Add it even if crash-kernel is disabled. Signed-off-by: Ben Collins <bcollins@kernel.org> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/2025042122-inescapable-mandrill-8a5ff2@boujee-and-buff Stable-dep-of: 20197b967a6a ("powerpc/kexec/core: use big-endian types for crash variables") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-19powerpc/uaccess: Fix inline assembly for clang build on PPC32Christophe Leroy (CS GROUP)
[ Upstream commit 0ee95a1d458630272d0415d0ffa9424fcb606c90 ] Test robot reports the following error with clang-16.0.6: In file included from kernel/rseq.c:75: include/linux/rseq_entry.h:141:3: error: invalid operand for instruction unsafe_get_user(offset, &ucs->post_commit_offset, efault); ^ include/linux/uaccess.h:608:2: note: expanded from macro 'unsafe_get_user' arch_unsafe_get_user(x, ptr, local_label); \ ^ arch/powerpc/include/asm/uaccess.h:518:2: note: expanded from macro 'arch_unsafe_get_user' __get_user_size_goto(__gu_val, __gu_addr, sizeof(*(p)), e); \ ^ arch/powerpc/include/asm/uaccess.h:284:2: note: expanded from macro '__get_user_size_goto' __get_user_size_allowed(x, ptr, size, __gus_retval); \ ^ arch/powerpc/include/asm/uaccess.h:275:10: note: expanded from macro '__get_user_size_allowed' case 8: __get_user_asm2(x, (u64 __user *)ptr, retval); break; \ ^ arch/powerpc/include/asm/uaccess.h:258:4: note: expanded from macro '__get_user_asm2' " li %1+1,0\n" \ ^ <inline asm>:7:5: note: instantiated into assembly here li 31+1,0 ^ 1 error generated. On PPC32, for 64 bits vars a pair of registers is used. Usually the lower register in the pair is the high part and the higher register is the low part. GCC uses r3/r4 ... r11/r12 ... r14/r15 ... r30/r31 In older kernel code inline assembly was using %1 and %1+1 to represent 64 bits values. However here it looks like clang uses r31 as high part, allthough r32 doesn't exist hence the error. Allthoug %1+1 should work, most places now use %L1 instead of %1+1, so let's do the same here. With that change, the build doesn't fail anymore and a disassembly shows clang uses r17/r18 and r31/r14 pair when GCC would have used r16/r17 and r30/r31: Disassembly of section .fixup: 00000000 <.fixup>: 0: 38 a0 ff f2 li r5,-14 4: 3a 20 00 00 li r17,0 8: 3a 40 00 00 li r18,0 c: 48 00 00 00 b c <.fixup+0xc> c: R_PPC_REL24 .text+0xbc 10: 38 a0 ff f2 li r5,-14 14: 3b e0 00 00 li r31,0 18: 39 c0 00 00 li r14,0 1c: 48 00 00 00 b 1c <.fixup+0x1c> 1c: R_PPC_REL24 .text+0x144 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202602021825.otcItxGi-lkp@intel.com/ Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/8ca3a657a650e497a96bfe7acde2f637dadab344.1770103646.git.chleroy@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12ARM: clean up the memset64() C wrapperThomas Weißschuh
commit b52343d1cb47bb27ca32a3f4952cc2fd3cd165bf upstream. The current logic to split the 64-bit argument into its 32-bit halves is byte-order specific and a bit clunky. Use a union instead which is easier to read and works in all cases. GCC still generates the same machine code. While at it, rename the arguments of the __memset64() prototype to actually reflect their semantics. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Ben Hutchings <ben@decadent.org.uk> # for -stable Link: https://lore.kernel.org/all/1a11526ae3d8664f705b541b8d6ea57b847b49a8.camel@decadent.org.uk/ Suggested-by: https://lore.kernel.org/all/aZonkWMwpbFhzDJq@casper.infradead.org/ # for -stable Link: https://lore.kernel.org/all/aZonkWMwpbFhzDJq@casper.infradead.org/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12kbuild: Split .modinfo out from ELF_DETAILSNathan Chancellor
commit 8678591b47469fe16357234efef9b260317b8be4 upstream. Commit 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") added .modinfo to ELF_DETAILS while removing it from COMMON_DISCARDS, as it was needed in vmlinux.unstripped and ELF_DETAILS was present in all architecture specific vmlinux linker scripts. While this shuffle is fine for vmlinux, ELF_DETAILS and COMMON_DISCARDS may be used by other linker scripts, such as the s390 and x86 compressed boot images, which may not expect to have a .modinfo section. In certain circumstances, this could result in a bootloader failing to load the compressed kernel [1]. Commit ddc6cbef3ef1 ("s390/boot/vmlinux.lds.S: Ensure bzImage ends with SecureBoot trailer") recently addressed this for the s390 bzImage but the same bug remains for arm, parisc, and x86. The presence of .modinfo in the x86 bzImage was the root cause of the issue worked around with commit d50f21091358 ("kbuild: align modinfo section for Secureboot Authenticode EDK2 compat"). misc.c in arch/x86/boot/compressed includes lib/decompress_unzstd.c, which in turn includes lib/xxhash.c and its MODULE_LICENSE / MODULE_DESCRIPTION macros due to the STATIC definition. Split .modinfo out from ELF_DETAILS into its own macro and handle it in all vmlinux linker scripts. Discard .modinfo in the places where it was previously being discarded from being in COMMON_DISCARDS, as it has never been necessary in those uses. Cc: stable@vger.kernel.org Fixes: 3e86e4d74c04 ("kbuild: keep .modinfo section in vmlinux.unstripped") Reported-by: Ed W <lists@wildgooses.com> Closes: https://lore.kernel.org/587f25e0-a80e-46a5-9f01-87cb40cfa377@wildgooses.com/ [1] Tested-by: Ed W <lists@wildgooses.com> # x86_64 Link: https://patch.msgid.link/20260225-separate-modinfo-from-elf-details-v1-1-387ced6baf4b@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12arm64: gcs: Do not set PTE_SHARED on GCS mappings if FEAT_LPA2 is enabledCatalin Marinas
commit 8a85b3131225a8c8143ba2ae29c0eef8c1f9117f upstream. When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the shareability attribute with bits 50-51 of the output address. The _PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 (this matches the other _PAGE_* definitions) but using this macro directly leads to the following panic when enabling GCS on a system/model with LPA2: Unable to handle kernel paging request at virtual address fffff1ffc32d8008 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000060f4d000 [fffff1ffc32d8008] pgd=100000006184b003, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP CPU: 0 UID: 0 PID: 513 Comm: gcs_write_fault Tainted: G M 7.0.0-rc1 #1 PREEMPT Tainted: [M]=MACHINE_CHECK Hardware name: QEMU QEMU Virtual Machine, BIOS 2025.02-8+deb13u1 11/08/2025 pstate: 03402005 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : zap_huge_pmd+0x168/0x468 lr : zap_huge_pmd+0x2c/0x468 sp : ffff800080beb660 x29: ffff800080beb660 x28: fff00000c2058180 x27: ffff800080beb898 x26: fff00000c2058180 x25: ffff800080beb820 x24: 00c800010b600f41 x23: ffffc1ffc30af1a8 x22: fff00000c2058180 x21: 0000ffff8dc00000 x20: fff00000c2bc6370 x19: ffff800080beb898 x18: ffff800080bebb60 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000007 x14: 000000000000000a x13: 0000aaaacbbbffff x12: 0000000000000000 x11: 0000ffff8ddfffff x10: 00000000000001fe x9 : 0000ffff8ddfffff x8 : 0000ffff8de00000 x7 : 0000ffff8da00000 x6 : fff00000c2bc6370 x5 : 0000ffff8da00000 x4 : 000000010b600000 x3 : ffffc1ffc0000000 x2 : fff00000c2058180 x1 : fffff1ffc32d8000 x0 : 000000c00010b600 Call trace: zap_huge_pmd+0x168/0x468 (P) unmap_page_range+0xd70/0x1560 unmap_single_vma+0x48/0x80 unmap_vmas+0x90/0x180 unmap_region+0x88/0xe4 vms_complete_munmap_vmas+0xf8/0x1e0 do_vmi_align_munmap+0x158/0x180 do_vmi_munmap+0xac/0x160 __vm_munmap+0xb0/0x138 vm_munmap+0x14/0x20 gcs_free+0x70/0x80 mm_release+0x1c/0xc8 exit_mm_release+0x28/0x38 do_exit+0x190/0x8ec do_group_exit+0x34/0x90 get_signal+0x794/0x858 arch_do_signal_or_restart+0x11c/0x3e0 exit_to_user_mode_loop+0x10c/0x17c el0_da+0x8c/0x9c el0t_64_sync_handler+0xd0/0xf0 el0t_64_sync+0x198/0x19c Code: aa1603e2 d34cfc00 cb813001 8b011861 (f9400420) Similarly to how the kernel handles protection_map[], use a gcs_page_prot variable to store the protection bits and clear PTE_SHARED if LPA2 is enabled. Also remove the unused PAGE_GCS{,_RO} macros. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 6497b66ba694 ("arm64/mm: Map pages for guarded control stack") Reported-by: Emanuele Rocca <emanuele.rocca@arm.com> Cc: stable@vger.kernel.org Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12x86/boot/sev: Move SEV decompressor variables into the .data sectionTom Lendacky
commit 4ca191cec17a997d0e3b2cd312f3a884288acc27 upstream. As part of the work to remove the dependency on calling into the decompressor code (startup_64()) for a UEFI boot, a call to rmpadjust() was removed from sev_enable() in favor of checking the value of the snp_vmpl variable. When booting through a non-UEFI path and calling startup_64(), the call to sev_enable() is performed before the BSS section is zeroed. With the removal of the rmpadjust() call and the corresponding check of the return code, the snp_vmpl variable is checked. Since the kernel is running at VMPL0, the snp_vmpl variable will not have been set and should be the default value of 0. However, since the call occurs before the BSS is zeroed, the snp_vmpl variable may not actually be zero, which will cause the guest boot to fail. Since the decompressor relocates itself, the BSS would need to be cleared both before and after the relocation, but this would, in effect, cause all of the changes to BSS variables before relocation to be lost after relocation. Instead, move the snp_vmpl variable into the .data section so that it is initialized and the value made safe during relocation. As a pre-caution against future changes, move other SEV-related decompressor variables into the .data section, too. Fixes: 68a501d7fd82 ("x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Tested-by: Kevin Hui <kevinhui@meta.com> Tested-by: Changyuan Lyu <changyuanl@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/5648b7de5b0a5d0dfef3785f9582b718678c6448.1770217260.git.thomas.lendacky@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12x86/sev: Allow IBPB-on-Entry feature for SNP guestsKim Phillips
commit 9073428bb204d921ae15326bb7d4558d9d269aab upstream. The SEV-SNP IBPB-on-Entry feature does not require a guest-side implementation. It was added in Zen5 h/w, after the first SNP Zen implementation, and thus was not accounted for when the initial set of SNP features were added to the kernel. In its abundant precaution, commit 8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support") included SEV_STATUS' IBPB-on-Entry bit as a reserved bit, thereby masking guests from using the feature. Allow guests to make use of IBPB-on-Entry when supported by the hypervisor, as the bit is now architecturally defined and safe to expose. Fixes: 8c29f0165405 ("x86/sev: Add SEV-SNP guest feature negotiation support") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20260203222405.4065706-2-kim.phillips@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12x86/boot: Handle relative CONFIG_EFI_SBAT_FILE file pathsJan Stancek
commit 3d1973a0c76a78a4728cff13648a188ed486cf44 upstream. CONFIG_EFI_SBAT_FILE can be a relative path. When compiling using a different output directory (O=) the build currently fails because it can't find the filename set in CONFIG_EFI_SBAT_FILE: arch/x86/boot/compressed/sbat.S: Assembler messages: arch/x86/boot/compressed/sbat.S:6: Error: file not found: kernel.sbat Add $(srctree) as include dir for sbat.o. [ bp: Massage commit message. ] Fixes: 61b57d35396a ("x86/efi: Implement support for embedding SBAT data for x86") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/f4eda155b0cef91d4d316b4e92f5771cb0aa7187.1772047658.git.jstancek@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12perf/x86/intel/uncore: Add per-scheduler IMC CAS count eventsZide Chen
commit 6a8a48644c4b804123e59dbfc5d6cd29a0194046 upstream. IMC on SPR and EMR does not support sub-channels. In contrast, CPUs that use gnr_uncores[] (e.g. Granite Rapids and Sierra Forest) implement two command schedulers (SCH0/SCH1) per memory channel, providing logically independent command and data paths. Do not reuse the spr_uncore_imc[] configuration for these CPUs. Instead, introduce a dedicated gnr_uncore_imc[] with per-scheduler events, so userspace can monitor SCH0 and SCH1 independently. On these CPUs, replace cas_count_{read,write} with cas_count_{read,write}_sch{0,1}. This may break existing userspace that relies on cas_count_{read,write}, prompting it to switch to the per-scheduler events, as the legacy event reports only partial traffic (SCH0). Fixes: 632c4bf6d007 ("perf/x86/intel/uncore: Support Granite Rapids") Fixes: cb4a6ccf3583 ("perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge") Reported-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260210005225.20311-1-zide.chen@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12x86/efi: defer freeing of boot services memoryMike Rapoport (Microsoft)
commit a4b0bf6a40f3c107c67a24fbc614510ef5719980 upstream. efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA using memblock_free_late(). There are two issue with that: memblock_free_late() should be used for memory allocated with memblock_alloc() while the memory reserved with memblock_reserve() should be freed with free_reserved_area(). More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y efi_free_boot_services() is called before deferred initialization of the memory map is complete. Benjamin Herrenschmidt reports that this causes a leak of ~140MB of RAM on EC2 t3a.nano instances which only have 512MB or RAM. If the freed memory resides in the areas that memory map for them is still uninitialized, they won't be actually freed because memblock_free_late() calls memblock_free_pages() and the latter skips uninitialized pages. Using free_reserved_area() at this point is also problematic because __free_page() accesses the buddy of the freed page and that again might end up in uninitialized part of the memory map. Delaying the entire efi_free_boot_services() could be problematic because in addition to freeing boot services memory it updates efi.memmap without any synchronization and that's undesirable late in boot when there is concurrency. More robust approach is to only defer freeing of the EFI boot services memory. Split efi_free_boot_services() in two. First efi_unmap_boot_services() collects ranges that should be freed into an array then efi_free_boot_services() later frees them after deferred init is complete. Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode") Cc: <stable@vger.kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-12LoongArch: Remove some extern variables in source filesTiezhu Yang
[ Upstream commit 0e6f596d6ac635e80bb265d587b2287ef8fa1cd6 ] There are declarations of the variable "eentry", "pcpu_handlers[]" and "exception_handlers[]" in asm/setup.h, the source files already include this header file directly or indirectly, so no need to declare them in the source files, just remove the code. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12LoongArch: Handle percpu handler address for ORC unwinderTiezhu Yang
[ Upstream commit 055c7e75190e0be43037bd663a3f6aced194416e ] After commit 4cd641a79e69 ("LoongArch: Remove unnecessary checks for ORC unwinder"), the system can not boot normally under some configs (such as enable KASAN), there are many error messages "cannot find unwind pc". The kernel boots normally with the defconfig, so no problem found out at the first time. Here is one way to reproduce: cd linux make mrproper defconfig -j"$(nproc)" scripts/config -e KASAN make olddefconfig all -j"$(nproc)" sudo make modules_install sudo make install sudo reboot The address that can not unwind is not a valid kernel address which is between "pcpu_handlers[cpu]" and "pcpu_handlers[cpu] + vec_sz" due to the code of eentry was copied to the new area of pcpu_handlers[cpu] in setup_tlb_handler(), handle this special case to get the valid address to unwind normally. Cc: stable@vger.kernel.org Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12LoongArch: Remove unnecessary checks for ORC unwinderTiezhu Yang
[ Upstream commit 4cd641a79e69270a062777f64a0dd330abb9044a ] According to the following function definitions, __kernel_text_address() already checks __module_text_address(), so it should remove the check of __module_text_address() in bt_address() at least. int __kernel_text_address(unsigned long addr) { if (kernel_text_address(addr)) return 1; ... return 0; } int kernel_text_address(unsigned long addr) { bool no_rcu; int ret = 1; ... if (is_module_text_address(addr)) goto out; ... return ret; } bool is_module_text_address(unsigned long addr) { guard(rcu)(); return __module_text_address(addr) != NULL; } Furthermore, there are two checks of __kernel_text_address(), one is in bt_address() and the other is after calling bt_address(), it looks like redundant. Handle the exception address first and then use __kernel_text_address() to validate the calculated address for exception or the normal address in bt_address(), then it can remove the check of __kernel_text_address() after calling bt_address(). Just remove unnecessary checks, no functional changes intended. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Stable-dep-of: 055c7e75190e ("LoongArch: Handle percpu handler address for ORC unwinder") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12KVM: x86: Add x2APIC "features" to control EOI broadcast suppressionKhushit Shah
[ Upstream commit 6517dfbcc918f970a928d9dc17586904bac06893 ] Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support for Suppress EOI Broadcasts when using a split IRQCHIP (I/O APIC emulated by userspace), which KVM completely mishandles. When x2APIC support was first added, KVM incorrectly advertised and "enabled" Suppress EOI Broadcast, without fully supporting the I/O APIC side of the equation, i.e. without adding directed EOI to KVM's in-kernel I/O APIC. That flaw was carried over to split IRQCHIP support, i.e. KVM advertised support for Suppress EOI Broadcasts irrespective of whether or not the userspace I/O APIC implementation supported directed EOIs. Even worse, KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without support for directed EOI came to rely on the "spurious" broadcasts. KVM "fixed" the in-kernel I/O APIC implementation by completely disabling support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but didn't do anything to remedy userspace I/O APIC implementations. KVM's bogus handling of Suppress EOI Broadcast is problematic when the guest relies on interrupts being masked in the I/O APIC until well after the initial local APIC EOI. E.g. Windows with Credential Guard enabled handles interrupts in the following order: 1. Interrupt for L2 arrives. 2. L1 APIC EOIs the interrupt. 3. L1 resumes L2 and injects the interrupt. 4. L2 EOIs after servicing. 5. L1 performs the I/O APIC EOI. Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt storm, e.g. if the IRQ line is still asserted and userspace reacts to the EOI by re-injecting the IRQ, because the guest doesn't de-assert the line until step #4, and doesn't expect the interrupt to be re-enabled until step #5. Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way of knowing if the userspace I/O APIC supports directed EOIs, i.e. suppressing EOI broadcasts would result in interrupts being stuck masked in the userspace I/O APIC due to step #5 being ignored by userspace. And fully disabling support for Suppress EOI Broadcast is also undesirable, as picking up the fix would require a guest reboot, *and* more importantly would change the virtual CPU model exposed to the guest without any buy-in from userspace. Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to explicitly enable or disable support for Suppress EOI Broadcasts. This gives userspace control over the virtual CPU model exposed to the guest, as KVM should never have enabled support for Suppress EOI Broadcast without userspace opt-in. Not setting either flag will result in legacy quirky behavior for backward compatibility. Disallow fully enabling SUPPRESS_EOI_BROADCAST when using an in-kernel I/O APIC, as KVM's history/support is just as tragic. E.g. it's not clear that commit c806a6ad35bf ("KVM: x86: call irq notifiers with directed EOI") was entirely correct, i.e. it may have simply papered over the lack of Directed EOI emulation in the I/O APIC. Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI is to support Directed EOI (KVM's name) irrespective of guest CPU vendor. Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs") Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com Cc: stable@vger.kernel.org Suggested-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Khushit Shah <khushit.shah@nutanix.com> Link: https://patch.msgid.link/20260123125657.3384063-1-khushit.shah@nutanix.com [sean: clean up minor formatting goofs and fix a comment typo] Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12ARM: dts: imx53-usbarmory: Replace license text comment with SPDX identifierBence Csókás
[ Upstream commit faa6baa36497958dd8fd5561daa37249779446d7 ] Replace verbatim license text with a `SPDX-License-Identifier`. The comment header mis-attributes this license to be "X11", but the license text does not include the last line "Except as contained in this notice, the name of the X Consortium shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization from the X Consortium.". Therefore, this license is actually equivalent to the SPDX "MIT" license (confirmed by text diffing). Cc: Andrej Rosano <andrej@inversepath.com> Signed-off-by: Bence Csókás <csokas.bence@prolan.hu> Acked-by: Andrej Rosano <andrej.rosano@reversec.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Stable-dep-of: 43d67ec26b32 ("PCI: dwc: ep: Fix resizable BAR support for multi-PF configurations") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12arm64: dts: rockchip: Fix rk3588 PCIe range mappingsShawn Lin
[ Upstream commit 46c56b737161060dfa468f25ae699749047902a2 ] The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so that there is no same address allocated from normal system memory. Otherwise it's broken if the same address assigned to the EP for DMA purpose.Fix it to sync with the vendor BSP. Fixes: 0acf4fa7f187 ("arm64: dts: rockchip: add PCIe3 support for rk3588") Fixes: 8d81b77f4c49 ("arm64: dts: rockchip: add rk3588 PCIe2 support") Cc: stable@vger.kernel.org Cc: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://patch.msgid.link/1767600929-195341-2-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12arm64: dts: rockchip: Fix rk356x PCIe range mappingsShawn Lin
[ Upstream commit f63ea193a404481f080ca2958f73e9f364682db9 ] The pcie bus address should be mapped 1:1 to the cpu side MMIO address, so that there is no same address allocated from normal system memory. Otherwise it's broken if the same address assigned to the EP for DMA purpose.Fix it to sync with the vendor BSP. Fixes: 568a67e742df ("arm64: dts: rockchip: Fix rk356x PCIe register and range mappings") Fixes: 66b51ea7d70f ("arm64: dts: rockchip: Add rk3568 PCIe2x1 controller") Cc: stable@vger.kernel.org Cc: Andrew Powers-Holmes <aholmes@omnom.net> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Link: https://patch.msgid.link/1767600929-195341-1-git-send-email-shawn.lin@rock-chips.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12x86/uprobes: Fix XOL allocation failure for 32-bit tasksOleg Nesterov
[ Upstream commit d55c571e4333fac71826e8db3b9753fadfbead6a ] This script #!/usr/bin/bash echo 0 > /proc/sys/kernel/randomize_va_space echo 'void main(void) {}' > TEST.c # -fcf-protection to ensure that the 1st endbr32 insn can't be emulated gcc -m32 -fcf-protection=branch TEST.c -o test bpftrace -e 'uprobe:./test:main {}' -c ./test "hangs", the probed ./test task enters an endless loop. The problem is that with randomize_va_space == 0 get_unmapped_area(TASK_SIZE - PAGE_SIZE) called by xol_add_vma() can not just return the "addr == TASK_SIZE - PAGE_SIZE" hint, this addr is used by the stack vma. arch_get_unmapped_area_topdown() doesn't take TIF_ADDR32 into account and in_32bit_syscall() is false, this leads to info.high_limit > TASK_SIZE. vm_unmapped_area() happily returns the high address > TASK_SIZE and then get_unmapped_area() returns -ENOMEM after the "if (addr > TASK_SIZE - len)" check. handle_swbp() doesn't report this failure (probably it should) and silently restarts the probed insn. Endless loop. I think that the right fix should change the x86 get_unmapped_area() paths to rely on TIF_ADDR32 rather than in_32bit_syscall(). Note also that if CONFIG_X86_X32_ABI=y, in_x32_syscall() falsely returns true in this case because ->orig_ax = -1. But we need a simple fix for -stable, so this patch just sets TS_COMPAT if the probed task is 32-bit to make in_ia32_syscall() true. Fixes: 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()") Reported-by: Paulo Andrade <pandrade@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/aV5uldEvV7pb4RA8@redhat.com/ Cc: stable@vger.kernel.org Link: https://patch.msgid.link/aWO7Fdxn39piQnxu@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12unwind_user/x86: Teach FP unwind about start of functionPeter Zijlstra
[ Upstream commit ae25884ad749e7f6e0c3565513bdc8aa2554a425 ] When userspace is interrupted at the start of a function, before we get a chance to complete the frame, unwind will miss one caller. X86 has a uprobe specific fixup for this, add bits to the generic unwinder to support this. Suggested-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251024145156.GM4068168@noisy.programming.kicks-ass.net Stable-dep-of: d55c571e4333 ("x86/uprobes: Fix XOL allocation failure for 32-bit tasks") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12unwind_user/x86: Enable frame pointer unwinding on x86Josh Poimboeuf
[ Upstream commit 49cf34c0815f93fb2ea3ab5cfbac1124bd9b45d0 ] Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the unwind_user interfaces can be used. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250827193828.347397433@kernel.org Stable-dep-of: d55c571e4333 ("x86/uprobes: Fix XOL allocation failure for 32-bit tasks") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block()Sean Christopherson
[ Upstream commit ead63640d4e72e6f6d464f4e31f7fecb79af8869 ] Ignore -EBUSY when checking nested events after exiting a blocking state while L2 is active, as exiting to userspace will generate a spurious userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's demise. Continuing with the wakeup isn't perfect either, as *something* has gone sideways if a vCPU is awakened in L2 with an injected event (or worse, a nested run pending), but continuing on gives the VM a decent chance of surviving without any major side effects. As explained in the Fixes commits, it _should_ be impossible for a vCPU to be put into a blocking state with an already-injected event (exception, IRQ, or NMI). Unfortunately, userspace can stuff MP_STATE and/or injected events, and thus put the vCPU into what should be an impossible state. Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be violating x86 architecture, e.g. by WARNing if KVM attempts to inject an exception or interrupt while the vCPU isn't running. Cc: Alessandro Ratti <alessandro@0x65c.net> Cc: stable@vger.kernel.org Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject") Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000 Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com Link: https://patch.msgid.link/20260109030657.994759-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12x86/acpi/boot: Correct acpi_is_processor_usable() check againYazen Ghannam
[ Upstream commit adbf61cc47cb72b102682e690ad323e1eda652c2 ] ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is used in conjunction with the "Enabled" MADT LAPIC flag to determine if a CPU can be enabled/hotplugged by the OS after boot. Before the new bit was defined, the "Enabled" bit was explicitly described like this (ACPI v6.0 wording provided): "If zero, this processor is unusable, and the operating system support will not attempt to use it" This means that CPU hotplug (based on MADT) is not possible. Many BIOS implementations follow this guidance. They may include LAPIC entries in MADT for unavailable CPUs, but since these entries are marked with "Enabled=0" it is expected that the OS will completely ignore these entries. However, QEMU will do the same (include entries with "Enabled=0") for the purpose of allowing CPU hotplug within the guest. Comment from QEMU function pc_madt_cpu_entry(): /* ACPI spec says that LAPIC entry for non present * CPU may be omitted from MADT or it must be marked * as disabled. However omitting non present CPU from * MADT breaks hotplug on linux. So possible CPUs * should be put in MADT but kept disabled. */ Recent Linux topology changes broke the QEMU use case. A following fix for the QEMU use case broke bare metal topology enumeration. Rework the Linux MADT LAPIC flags check to allow the QEMU use case only for guests and to maintain the ACPI spec behavior for bare metal. Remove an unnecessary check added to fix a bare metal case introduced by the QEMU "fix". [ bp: Change logic as Michal suggested. ] [ mingo: Removed misapplied -stable tag. ] Fixes: fed8d8773b8e ("x86/acpi/boot: Correct acpi_is_processor_usable() check") Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package") Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com Reported-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Michal Pecio <michal.pecio@gmail.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearingFuad Tabba
[ Upstream commit ef06fd16d48704eac868441d98d4ef083d8f3d07 ] struct bpf_plt contains a u64 target field. Currently, the BPF JIT allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT buffer. Because the base address of the JIT buffer can be 4-byte aligned (e.g., ending in 0x4 or 0xc), the relative padding logic in build_plt() fails to ensure that target lands on an 8-byte boundary. This leads to two issues: 1. UBSAN reports misaligned-access warnings when dereferencing the structure. 2. More critically, target is updated concurrently via WRITE_ONCE() in bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64, 64-bit loads/stores are only guaranteed to be single-copy atomic if they are 64-bit aligned. A misaligned target risks a torn read, causing the JIT to jump to a corrupted address. Fix this by increasing the allocation alignment requirement to 8 bytes (sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of the JIT buffer to an 8-byte boundary, allowing the relative padding math in build_plt() to correctly align the target field. Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12arm64: io: Extract user memory type in ioremap_prot()Will Deacon
[ Upstream commit 8f098037139b294050053123ab2bc0f819d08932 ] The only caller of ioremap_prot() outside of the generic ioremap() implementation is generic_access_phys(), which passes a 'pgprot_t' value determined from the user mapping of the target 'pfn' being accessed by the kernel. On arm64, the 'pgprot_t' contains all of the non-address bits from the pte, including the permission controls, and so we end up returning a new user mapping from ioremap_prot() which faults when accessed from the kernel on systems with PAN: | Unable to handle kernel read from unreadable memory at virtual address ffff80008ea89000 | ... | Call trace: | __memcpy_fromio+0x80/0xf8 | generic_access_phys+0x20c/0x2b8 | __access_remote_vm+0x46c/0x5b8 | access_remote_vm+0x18/0x30 | environ_read+0x238/0x3e8 | vfs_read+0xe4/0x2b0 | ksys_read+0xcc/0x178 | __arm64_sys_read+0x4c/0x68 Extract only the memory type from the user 'pgprot_t' in ioremap_prot() and assert that we're being passed a user mapping, to protect us against any changes in future that may require additional handling. To avoid falsely flagging users of ioremap(), provide our own ioremap() macro which simply wraps __ioremap_prot(). Cc: Zeng Heng <zengheng4@huawei.com> Cc: Jinjiang Tu <tujinjiang@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 893dea9ccd08 ("arm64: Add HAVE_IOREMAP_PROT support") Reported-by: Jinjiang Tu <tujinjiang@huawei.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12arm64: io: Rename ioremap_prot() to __ioremap_prot()Will Deacon
[ Upstream commit f6bf47ab32e0863df50f5501d207dcdddb7fc507 ] Rename our ioremap_prot() implementation to __ioremap_prot() and convert all arch-internal callers over to the new function. ioremap_prot() remains as a #define to __ioremap_prot() for generic_access_phys() and will be subsequently extended to handle user permissions in 'prot'. Cc: Zeng Heng <zengheng4@huawei.com> Cc: Jinjiang Tu <tujinjiang@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Stable-dep-of: 8f098037139b ("arm64: io: Extract user memory type in ioremap_prot()") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12s390/vtime: Fix virtual timer forwardingHeiko Carstens
[ Upstream commit dbc0fb35679ed5d0adecf7d02137ac2c77244b3b ] Since delayed accounting of system time [1] the virtual timer is forwarded by do_account_vtime() but also vtime_account_kernel(), vtime_account_softirq(), and vtime_account_hardirq(). This leads to double accounting of system, guest, softirq, and hardirq time. Remove accounting from the vtime_account*() family to restore old behavior. There is only one user of the vtimer interface, which might explain why nobody noticed this so far. Fixes: b7394a5f4ce9 ("sched/cputime, s390: Implement delayed accounting of system time") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12s390/idle: Fix cpu idle exit cpu time accountingHeiko Carstens
[ Upstream commit 0d785e2c324c90662baa4fe07a0d02233ff92824 ] With the conversion to generic entry [1] cpu idle exit cpu time accounting was converted from assembly to C. This introduced an reversed order of cpu time accounting. On cpu idle exit the current accounting happens with the following call chain: -> do_io_irq()/do_ext_irq() -> irq_enter_rcu() -> account_hardirq_enter() -> vtime_account_irq() -> vtime_account_kernel() vtime_account_kernel() accounts the passed cpu time since last_update_timer as system time, and updates last_update_timer to the current cpu timer value. However the subsequent call of -> account_idle_time_irq() will incorrectly subtract passed cpu time from timer_idle_enter to the updated last_update_timer value from system_timer. Then last_update_timer is updated to a sys_enter_timer, which means that last_update_timer goes back in time. Subsequently account_hardirq_exit() will account too much cpu time as hardirq time. The sum of all accounted cpu times is still correct, however some cpu time which was previously accounted as system time is now accounted as hardirq time, plus there is the oddity that last_update_timer goes back in time. Restore previous behavior by extracting cpu time accounting code from account_idle_time_irq() into a new update_timer_idle() function and call it before irq_enter_rcu(). Fixes: 56e62a737028 ("s390: convert to generic entry") [1] Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-12x86/cfi: Fix CFI rewrite for odd alignmentsPeter Zijlstra
[ Upstream commit 24c8147abb39618d74fcc36e325765e8fe7bdd7a ] Rustam reported his clang builds did not boot properly; turns out his .config has: CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B=y set. Fix up the FineIBT code to deal with this unusual alignment. Fixes: 931ab63664f0 ("x86/ibt: Implement FineIBT") Reported-by: Rustam Kovhaev <rkovhaev@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Rustam Kovhaev <rkovhaev@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>