summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2026-05-03Add distro-specific configuration.0x221E
2026-04-22KVM: x86: Use scratch field in MMIO fragment to hold small write valuesSean Christopherson
commit 0b16e69d17d8c35c5c9d5918bf596c75a44655d3 upstream. When exiting to userspace to service an emulated MMIO write, copy the to-be-written value to a scratch field in the MMIO fragment if the size of the data payload is 8 bytes or less, i.e. can fit in a single chunk, instead of pointing the fragment directly at the source value. This fixes a class of use-after-free bugs that occur when the emulator initiates a write using an on-stack, local variable as the source, the write splits a page boundary, *and* both pages are MMIO pages. Because KVM's ABI only allows for physically contiguous MMIO requests, accesses that split MMIO pages are separated into two fragments, and are sent to userspace one at a time. When KVM attempts to complete userspace MMIO in response to KVM_RUN after the first fragment, KVM will detect the second fragment and generate a second userspace exit, and reference the on-stack variable. The issue is most visible if the second KVM_RUN is performed by a separate task, in which case the stack of the initiating task can show up as truly freed data. ================================================================== BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420 Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984 CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 check_memory_region+0xfd/0x1f0 memcpy+0x20/0x60 complete_emulated_mmio+0x305/0x420 kvm_arch_vcpu_ioctl_run+0x63f/0x6d0 kvm_vcpu_ioctl+0x413/0xb20 __se_sys_ioctl+0x111/0x160 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 RIP: 0033:0x42477d Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720 The buggy address belongs to the page: page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== The bug can also be reproduced with a targeted KVM-Unit-Test by hacking KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by overwrite the data value with garbage. Limit the use of the scratch fields to 8-byte or smaller accesses, and to just writes, as larger accesses and reads are not affected thanks to implementation details in the emulator, but add a sanity check to ensure those details don't change in the future. Specifically, KVM never uses on-stack variables for accesses larger that 8 bytes, e.g. uses an operand in the emulator context, and *all* reads are buffered through the mem_read cache. Note! Using the scratch field for reads is not only unnecessary, it's also extremely difficult to handle correctly. As above, KVM buffers all reads through the mem_read cache, and heavily relies on that behavior when re-emulating the instruction after a userspace MMIO read exit. If a read splits a page, the first page is NOT an MMIO page, and the second page IS an MMIO page, then the MMIO fragment needs to point at _just_ the second chunk of the destination, i.e. its position in the mem_read cache. Taking the "obvious" approach of copying the fragment value into the destination when re-emulating the instruction would clobber the first chunk of the destination, i.e. would clobber the data that was read from guest memory. Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO") Suggested-by: Yashu Zhang <zhangjiaji1@huawei.com> Reported-by: Yashu Zhang <zhangjiaji1@huawei.com> Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com Cc: stable@vger.kernel.org Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22x86-64/arm64/powerpc: clean up and rename __copy_from_user_flushcacheLinus Torvalds
commit 809b997a5ce945ab470f70c187048fe4f5df20bf upstream. This finishes the work on these odd functions that were only implemented by a handful of architectures. The 'flushcache' function was only used from the iterator code, and let's make it do the same thing that the nontemporal version does: remove the two underscores and add the user address checking. Yes, yes, the user address checking is also done at iovec import time, but we have long since walked away from the old double-underscore thing where we try to avoid address checking overhead at access time, and these functions shouldn't be so special and old-fashioned. The arm64 version already did the address check, in fact, so there it's just a matter of renaming it. For powerpc and x86-64 we now do the proper user access boilerplate. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22x86: rename and clean up __copy_from_user_inatomic_nocache()Linus Torvalds
commit 5de7bcaadf160c1716b20a263cf8f5b06f658959 upstream. Similarly to the previous commit, this renames the somewhat confusingly named function. But in this case, it was at least less confusing: the __copy_from_user_inatomic_nocache is indeed copying from user memory, and it is indeed ok to be used in an atomic context, so it will not warn about it. But the previous commit also removed the NTB mis-use of the __copy_from_user_inatomic_nocache() function, and as a result every call-site is now _actually_ doing a real user copy. That means that we can now do the proper user pointer verification too. End result: add proper address checking, remove the double underscores, and change the "nocache" to "nontemporal" to more accurately describe what this x86-only function actually does. It might be worth noting that only the target is non-temporal: the actual user accesses are normal memory accesses. Also worth noting is that non-x86 targets (and on older 32-bit x86 CPU's before XMM2 in the Pentium III) we end up just falling back on a regular user copy, so nothing can actually depend on the non-temporal semantics, but that has always been true. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22x86-64: rename misleadingly named '__copy_user_nocache()' functionLinus Torvalds
commit d187a86de793f84766ea40b9ade7ac60aabbb4fe upstream. This function was a masterclass in bad naming, for various historical reasons. It claimed to be a non-cached user copy. It is literally _neither_ of those things. It's a specialty memory copy routine that uses non-temporal stores for the destination (but not the source), and that does exception handling for both source and destination accesses. Also note that while it works for unaligned targets, any unaligned parts (whether at beginning or end) will not use non-temporal stores, since only words and quadwords can be non-temporal on x86. The exception handling means that it _can_ be used for user space accesses, but not on its own - it needs all the normal "start user space access" logic around it. But typically the user space access would be the source, not the non-temporal destination. That was the original intention of this, where the destination was some fragile persistent memory target that needed non-temporal stores in order to catch machine check exceptions synchronously and deal with them gracefully. Thus that non-descriptive name: one use case was to copy from user space into a non-cached kernel buffer. However, the existing users are a mix of that intended use-case, and a couple of random drivers that just did this as a performance tweak. Some of those random drivers then actively misused the user copying version (with STAC/CLAC and all) to do kernel copies without ever even caring about the exception handling, _just_ for the non-temporal destination. Rename it as a first small step to actually make it halfway sane, and change the prototype to be more normal: it doesn't take a user pointer unless the caller has done the proper conversion, and the argument size is the full size_t (it still won't actually copy more than 4GB in one go, but there's also no reason to silently truncate the size argument in the caller). Finally, use this now sanely named function in the NTB code, which mis-used a user copy version (with STAC/CLAC and all) of this interface despite it not actually being a user copy at all. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: SEV: Drop WARN on large size for KVM_MEMORY_ENCRYPT_REG_REGIONSean Christopherson
commit 8acffeef5ef720c35e513e322ab08e32683f32f2 upstream. Drop the WARN in sev_pin_memory() on npages overflowing an int, as the WARN is comically trivially to trigger from userspace, e.g. by doing: struct kvm_enc_region range = { .addr = 0, .size = -1ul, }; __vm_ioctl(vm, KVM_MEMORY_ENCRYPT_REG_REGION, &range); Note, the checks in sev_mem_enc_register_region() that presumably exist to verify the incoming address+size are completely worthless, as both "addr" and "size" are u64s and SEV is 64-bit only, i.e. they _can't_ be greater than ULONG_MAX. That wart will be cleaned up in the near future. if (range->addr > ULONG_MAX || range->size > ULONG_MAX) return -EINVAL; Opportunistically add a comment to explain why the code calculates the number of pages the "hard" way, e.g. instead of just shifting @ulen. Fixes: 78824fabc72e ("KVM: SVM: fix svn_pin_memory()'s use of get_user_pages_fast()") Cc: stable@vger.kernel.org Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Tested-by: Liam Merwick <liam.merwick@oracle.com> Link: https://patch.msgid.link/20260313003302.3136111-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: SEV: Lock all vCPUs when synchronzing VMSAs for SNP launch finishSean Christopherson
commit cb923ee6a80f4e604e6242a4702b59251e61a380 upstream. Lock all vCPUs when synchronizing and encrypting VMSAs for SNP guests, as allowing userspace to manipulate and/or run a vCPU while its state is being synchronized would at best corrupt vCPU state, and at worst crash the host kernel. Opportunistically assert that vcpu->mutex is held when synchronizing its VMSA (the SEV-ES path already locks vCPUs). Fixes: ad27ce155566 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260310234829.2608037-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: SEV: Disallow LAUNCH_FINISH if vCPUs are actively being createdSean Christopherson
commit 624bf3440d7214b62c22d698a0a294323f331d5d upstream. Reject LAUNCH_FINISH for SEV-ES and SNP VMs if KVM is actively creating one or more vCPUs, as KVM needs to process and encrypt each vCPU's VMSA. Letting userspace create vCPUs while LAUNCH_FINISH is in-progress is "fine", at least in the current code base, as kvm_for_each_vcpu() operates on online_vcpus, LAUNCH_FINISH (all SEV+ sub-ioctls) holds kvm->mutex, and fully onlining a vCPU in kvm_vm_ioctl_create_vcpu() is done under kvm->mutex. I.e. there's no difference between an in-progress vCPU and a vCPU that is created entirely after LAUNCH_FINISH. However, given that concurrent LAUNCH_FINISH and vCPU creation can't possibly work (for any reasonable definition of "work"), since userspace can't guarantee whether a particular vCPU will be encrypted or not, disallow the combination as a hardening measure, to reduce the probability of introducing bugs in the future, and to avoid having to reason about the safety of future changes related to LAUNCH_FINISH. Cc: Jethro Beekman <jethro@fortanix.com> Closes: https://lore.kernel.org/all/b31f7c6e-2807-4662-bcdd-eea2c1e132fa@fortanix.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260310234829.2608037-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: SEV: Protect *all* of sev_mem_enc_register_region() with kvm->lockSean Christopherson
commit b6408b6cec5df76a165575777800ef2aba12b109 upstream. Take and hold kvm->lock for before checking sev_guest() in sev_mem_enc_register_region(), as sev_guest() isn't stable unless kvm->lock is held (or KVM can guarantee KVM_SEV_INIT{2} has completed and can't rollack state). If KVM_SEV_INIT{2} fails, KVM can end up trying to add to a not-yet-initialized sev->regions_list, e.g. triggering a #GP Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 110 UID: 0 PID: 72717 Comm: syz.15.11462 Tainted: G U W O 6.16.0-smp-DEV #1 NONE Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 12.52.0-0 10/28/2024 RIP: 0010:sev_mem_enc_register_region+0x3f0/0x4f0 ../include/linux/list.h:83 Code: <41> 80 3c 04 00 74 08 4c 89 ff e8 f1 c7 a2 00 49 39 ed 0f 84 c6 00 RSP: 0018:ffff88838647fbb8 EFLAGS: 00010256 RAX: dffffc0000000000 RBX: 1ffff92015cf1e0b RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000001000 RDI: ffff888367870000 RBP: ffffc900ae78f050 R08: ffffea000d9e0007 R09: 1ffffd4001b3c000 R10: dffffc0000000000 R11: fffff94001b3c001 R12: 0000000000000000 R13: ffff8982ab0bde00 R14: ffffc900ae78f058 R15: 0000000000000000 FS: 00007f34e9dc66c0(0000) GS:ffff89ee64d33000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe180adef98 CR3: 000000047210e000 CR4: 0000000000350ef0 Call Trace: <TASK> kvm_arch_vm_ioctl+0xa72/0x1240 ../arch/x86/kvm/x86.c:7371 kvm_vm_ioctl+0x649/0x990 ../virt/kvm/kvm_main.c:5363 __se_sys_ioctl+0x101/0x170 ../fs/ioctl.c:51 do_syscall_x64 ../arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x6f/0x1f0 ../arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f34e9f7e9a9 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f34e9dc6038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f34ea1a6080 RCX: 00007f34e9f7e9a9 RDX: 0000200000000280 RSI: 000000008010aebb RDI: 0000000000000007 RBP: 00007f34ea000d69 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f34ea1a6080 R15: 00007ffce77197a8 </TASK> with a syzlang reproducer that looks like: syz_kvm_add_vcpu$x86(0x0, &(0x7f0000000040)={0x0, &(0x7f0000000180)=ANY=[], 0x70}) (async) syz_kvm_add_vcpu$x86(0x0, &(0x7f0000000080)={0x0, &(0x7f0000000180)=ANY=[@ANYBLOB="..."], 0x4f}) (async) r0 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000200), 0x0, 0x0) r1 = ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0) r2 = openat$kvm(0xffffffffffffff9c, &(0x7f0000000240), 0x0, 0x0) r3 = ioctl$KVM_CREATE_VM(r2, 0xae01, 0x0) ioctl$KVM_SET_CLOCK(r3, 0xc008aeba, &(0x7f0000000040)={0x1, 0x8, 0x0, 0x5625e9b0}) (async) ioctl$KVM_SET_PIT2(r3, 0x8010aebb, &(0x7f0000000280)={[...], 0x5}) (async) ioctl$KVM_SET_PIT2(r1, 0x4070aea0, 0x0) (async) r4 = ioctl$KVM_CREATE_VM(0xffffffffffffffff, 0xae01, 0x0) openat$kvm(0xffffffffffffff9c, 0x0, 0x0, 0x0) (async) ioctl$KVM_SET_USER_MEMORY_REGION(r4, 0x4020ae46, &(0x7f0000000400)={0x0, 0x0, 0x0, 0x2000, &(0x7f0000001000/0x2000)=nil}) (async) r5 = ioctl$KVM_CREATE_VCPU(r4, 0xae41, 0x2) close(r0) (async) openat$kvm(0xffffffffffffff9c, &(0x7f0000000000), 0x8000, 0x0) (async) ioctl$KVM_SET_GUEST_DEBUG(r5, 0x4048ae9b, &(0x7f0000000300)={0x4376ea830d46549b, 0x0, [0x46, 0x0, 0x0, 0x0, 0x0, 0x1000]}) (async) ioctl$KVM_RUN(r5, 0xae80, 0x0) Opportunistically use guard() to avoid having to define a new error label and goto usage. Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active") Cc: stable@vger.kernel.org Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://patch.msgid.link/20260310234829.2608037-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: SEV: Reject attempts to sync VMSA of an already-launched/encrypted vCPUSean Christopherson
commit 9b9f7962e3e879d12da2bf47e02a24ec51690e3d upstream. Reject synchronizing vCPU state to its associated VMSA if the vCPU has already been launched, i.e. if the VMSA has already been encrypted. On a host with SNP enabled, accessing guest-private memory generates an RMP #PF and panics the host. BUG: unable to handle page fault for address: ff1276cbfdf36000 #PF: supervisor write access in kernel mode #PF: error_code(0x80000003) - RMP violation PGD 5a31801067 P4D 5a31802067 PUD 40ccfb5063 PMD 40e5954063 PTE 80000040fdf36163 SEV-SNP: PFN 0x40fdf36, RMP entry: [0x6010fffffffff001 - 0x000000000000001f] Oops: Oops: 0003 [#1] SMP NOPTI CPU: 33 UID: 0 PID: 996180 Comm: qemu-system-x86 Tainted: G OE Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: Dell Inc. PowerEdge R7625/0H1TJT, BIOS 1.5.8 07/21/2023 RIP: 0010:sev_es_sync_vmsa+0x54/0x4c0 [kvm_amd] Call Trace: <TASK> snp_launch_update_vmsa+0x19d/0x290 [kvm_amd] snp_launch_finish+0xb6/0x380 [kvm_amd] sev_mem_enc_ioctl+0x14e/0x720 [kvm_amd] kvm_arch_vm_ioctl+0x837/0xcf0 [kvm] kvm_vm_ioctl+0x3fd/0xcc0 [kvm] __x64_sys_ioctl+0xa3/0x100 x64_sys_call+0xfe0/0x2350 do_syscall_64+0x81/0x10f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7ffff673287d </TASK> Note, the KVM flaw has been present since commit ad73109ae7ec ("KVM: SVM: Provide support to launch and run an SEV-ES guest"), but has only been actively dangerous for the host since SNP support was added. With SEV-ES, KVM would "just" clobber guest state, which is totally fine from a host kernel perspective since userspace can clobber guest state any time before sev_launch_update_vmsa(). Fixes: ad27ce155566 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command") Reported-by: Jethro Beekman <jethro@fortanix.com> Closes: https://lore.kernel.org/all/d98692e2-d96b-4c36-8089-4bc1e5cc3d57@fortanix.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260310234829.2608037-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22arm64: mm: Handle invalid large leaf mappings correctlyRyan Roberts
commit 15bfba1ad77fad8e45a37aae54b3c813b33fe27c upstream. It has been possible for a long time to mark ptes in the linear map as invalid. This is done for secretmem, kfence, realm dma memory un/share, and others, by simply clearing the PTE_VALID bit. But until commit a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full") large leaf mappings were never made invalid in this way. It turns out various parts of the code base are not equipped to handle invalid large leaf mappings (in the way they are currently encoded) and I've observed a kernel panic while booting a realm guest on a BBML2_NOABORT system as a result: [ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers [ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000 [ 15.513762] Mem abort info: [ 15.527245] ESR = 0x0000000096000046 [ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits [ 15.572146] SET = 0, FnV = 0 [ 15.592141] EA = 0, S1PTW = 0 [ 15.612694] FSC = 0x06: level 2 translation fault [ 15.640644] Data abort info: [ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 [ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 [ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000 [ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704 [ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP [ 15.886394] Modules linked in: [ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT [ 15.935258] Hardware name: linux,dummy-virt (DT) [ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c [ 16.006163] lr : swiotlb_bounce+0xf4/0x158 [ 16.024145] sp : ffff80008000b8f0 [ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000 [ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000 [ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0 [ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc [ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974 [ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018 [ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000 [ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000 [ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000 [ 16.349071] Call trace: [ 16.360143] __pi_memcpy_generic+0x128/0x22c (P) [ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4 [ 16.400282] swiotlb_map+0x5c/0x228 [ 16.415984] dma_map_phys+0x244/0x2b8 [ 16.432199] dma_map_page_attrs+0x44/0x58 [ 16.449782] virtqueue_map_page_attrs+0x38/0x44 [ 16.469596] virtqueue_map_single_attrs+0xc0/0x130 [ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc [ 16.510355] try_fill_recv+0x2a4/0x584 [ 16.526989] virtnet_open+0xd4/0x238 [ 16.542775] __dev_open+0x110/0x24c [ 16.558280] __dev_change_flags+0x194/0x20c [ 16.576879] netif_change_flags+0x24/0x6c [ 16.594489] dev_change_flags+0x48/0x7c [ 16.611462] ip_auto_config+0x258/0x1114 [ 16.628727] do_one_initcall+0x80/0x1c8 [ 16.645590] kernel_init_freeable+0x208/0x2f0 [ 16.664917] kernel_init+0x24/0x1e0 [ 16.680295] ret_from_fork+0x10/0x20 [ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c) [ 16.723106] ---[ end trace 0000000000000000 ]--- [ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000 [ 16.818966] PHYS_OFFSET: 0xfff1000080000000 [ 16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f [ 16.862904] Memory Limit: none [ 16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- This panic occurs because the swiotlb memory was previously shared to the host (__set_memory_enc_dec()), which involves transitioning the (large) leaf mappings to invalid, sharing to the host, then marking the mappings valid again. But pageattr_p[mu]d_entry() would only update the entry if it is a section mapping, since otherwise it concluded it must be a table entry so shouldn't be modified. But p[mu]d_sect() only returns true if the entry is valid. So the result was that the large leaf entry was made invalid in the first pass then ignored in the second pass. It remains invalid until the above code tries to access it and blows up. The simple fix would be to update pageattr_pmd_entry() to use !pmd_table() instead of pmd_sect(). That would solve this problem. But the ptdump code also suffers from a similar issue. It checks pmd_leaf() and doesn't call into the arch-specific note_page() machinery if it returns false. As a result of this, ptdump wasn't even able to show the invalid large leaf mappings; it looked like they were valid which made this super fun to debug. the ptdump code is core-mm and pmd_table() is arm64-specific so we can't use the same trick to solve that. But we already support the concept of "present-invalid" for user space entries. And even better, pmd_leaf() will return true for a leaf mapping that is marked present-invalid. So let's just use that encoding for present-invalid kernel mappings too. Then we can use pmd_leaf() where we previously used pmd_sect() and everything is magically fixed. Additionally, from inspection kernel_page_present() was broken in a similar way, so I'm also updating that to use pmd_leaf(). The transitional page tables component was also similarly broken; it creates a copy of the kernel page tables, making RO leaf mappings RW in the process. It also makes invalid (but-not-none) pte mappings valid. But it was not doing this for large leaf mappings. This could have resulted in crashes at kexec- or hibernate-time. This code is fixed to flip "present-invalid" mappings back to "present-valid" at all levels. Finally, I have hardened split_pmd()/split_pud() so that if it is passed a "present-invalid" leaf, it will maintain that property in the split leaves, since I wasn't able to convince myself that it would only ever be called for "present-valid" leaves. Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full") Cc: stable@vger.kernel.org Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22x86/CPU: Fix FPDSS on Zen1Borislav Petkov (AMD)
commit e55d98e7756135f32150b9b8f75d580d0d4b2dd3 upstream. Zen1's hardware divider can leave, under certain circumstances, partial results from previous operations. Those results can be leaked by another, attacker thread. Fix that with a chicken bit. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-12Merge tag 'ras-urgent-2026-04-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 MCE fix from Ingo Molnar: "Fix incorrect hardware errors reported on Zen3 CPUs, such as bogus L3 cache deferred errors (Yazen Ghannam)" * tag 'ras-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Filter bogus hardware errors on Zen3 clients
2026-04-12Merge tag 'perf-urgent-2026-04-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Four Intel uncore PMU driver fixes by Zide Chen" * tag 'perf-urgent-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Remove extra double quote mark perf/x86/intel/uncore: Fix die ID init and look up bugs perf/x86/intel/uncore: Skip discovery table for offline dies perf/x86/intel/uncore: Fix iounmap() leak on global_init failure
2026-04-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "s390: - vsie: Fix races with partial gmap invalidations x86: - Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: vsie: Fix races with partial gmap invalidations KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs
2026-04-11Merge tag 'kvm-s390-master-7.0-4' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: One very last second fix Fix one more gmap-rewrite issue: races with partial gmap invalidations.
2026-04-11Merge tag 'kvm-x86-fixes-7.1' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 7.1 Declare flexible arrays in uAPI structures using __DECLARE_FLEX_ARRAY() so that KVM's uAPI headers can be included in C++ projects.
2026-04-10Merge tag 'riscv-for-linus-v7.0-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Paul Walmsley: "Before v7.0 is released, fix a few issues with the CFI patchset, merged earlier in v7.0-rc, that primarily affect interfaces to non-kernel code: - Improve the prctl() interface for per-task indirect branch landing pad control to expand abbreviations and to resemble the speculation control prctl() interface - Expand the "LP" and "SS" abbreviations in the ptrace uapi header file to "branch landing pad" and "shadow stack", to improve readability - Fix a typo in a CFI-related macro name in the ptrace uapi header file - Ensure that the indirect branch tracking state and shadow stack state are unlocked immediately after an exec() on the new task so that libc subsequently can control it - While working in this area, clean up the kernel-internal, cross-architecture prctl() function names by expanding the abbreviations mentioned above" * tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: prctl: cfi: change the branch landing pad prctl()s to be more descriptive riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headers prctl: rename branch landing pad implementation functions to be more explicit riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headers riscv: cfi: clear CFI lock status in start_thread() riscv: ptrace: cfi: fix "PRACE" typo in uapi header
2026-04-08x86: shadow stacks: proper error handling for mmap lockLinus Torvalds
김영민 reports that shstk_pop_sigframe() doesn't check for errors from mmap_read_lock_killable(), which is a silly oversight, and also shows that we haven't marked those functions with "__must_check", which would have immediately caught it. So let's fix both issues. Reported-by: 김영민 <osori@hspace.io> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-04-07KVM: s390: vsie: Fix races with partial gmap invalidationsClaudio Imbrenda
Introduce a new boolean flag, used for shadow gmaps, to keep track of whether the gmap has been invalidated, either partially or totally. Use the new flag to check whether shadow gmap invalidations happened during shadowing. In such cases, abort whatever was going on, return -EAGAIN and let the caller try again. Fixes: 19d6c5b80443 ("KVM: s390: vsie: Fix unshadowing while shadowing") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260407161721.247044-1-imbrenda@linux.ibm.com>
2026-04-07perf/x86/intel/uncore: Remove extra double quote markZide Chen
The third argument in INTEL_UNCORE_FR_EVENT_DESC() is subject to __stringify(), and the extra double quote marks can result in the expansion "3.814697266e-6" in the sysfs knobs, instead of 3.814697266e-6. This is incorrect, though it may still work for perf, e.g. perf stat -e uncore_iio_free_running_0/bw_in_port0/ Fixes: d8987048f665 ("perf/x86/intel/uncore: Support IIO free-running counters on DMR") Closes: https://lore.kernel.org/all/20251231224233.113839-1-zide.chen@intel.com/ Reported-by: Chun-Tse Shao <ctshao@google.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chun-Tse Shao <ctshao@google.com> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://patch.msgid.link/20260313174050.171704-5-zide.chen@intel.com
2026-04-07perf/x86/intel/uncore: Fix die ID init and look up bugsZide Chen
In snbep_pci2phy_map_init(), in the nr_node_ids > 8 path, uncore_device_to_die() may return -1 when all CPUs associated with the UBOX device are offline. Remove the WARN_ON_ONCE(die_id == -1) check for two reasons: - The current code breaks out of the loop. This is incorrect because pci_get_device() does not guarantee iteration in domain or bus order, so additional UBOX devices may be skipped during the scan. - Returning -EINVAL is incorrect, since marking offline buses with die_id == -1 is expected and should not be treated as an error. Separately, when NUMA is disabled on a NUMA-capable platform, pcibus_to_node() returns NUMA_NO_NODE, causing uncore_device_to_die() to return -1 for all PCI devices. As a result, spr_update_device_location(), used on Intel SPR and EMR, ignores the corresponding PMON units and does not add them to the RB tree. Fix this by using uncore_pcibus_to_dieid(), which retrieves topology from the UBOX GIDNIDMAP register and works regardless of whether NUMA is enabled in Linux. This requires snbep_pci2phy_map_init() to be added in spr_uncore_pci_init(). Keep uncore_device_to_die() only for the nr_node_ids > 8 case, where NUMA is expected to be enabled. Fixes: 9a7832ce3d92 ("perf/x86/intel/uncore: With > 8 nodes, get pci bus die id from NUMA info") Fixes: 65248a9a9ee1 ("perf/x86/uncore: Add a quirk for UPI on SPR") Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Steve Wahl <steve.wahl@hpe.com> Link: https://patch.msgid.link/20260313174050.171704-4-zide.chen@intel.com
2026-04-07perf/x86/intel/uncore: Skip discovery table for offline diesZide Chen
This warning can be triggered if NUMA is disabled and the system boots with fewer CPUs than the number of CPUs in die 0. WARNING: CPU: 9 PID: 7257 at uncore.c:1157 uncore_pci_pmu_register+0x136/0x160 [intel_uncore] Currently, the discovery table continues to be parsed even if all CPUs in the associated die are offline. This can lead to an array overflow at "pmu->boxes[die] = box" in uncore_pci_pmu_register(), which may trigger the warning above or cause other issues. Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables") Reported-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: Steve Wahl <steve.wahl@hpe.com> Link: https://patch.msgid.link/20260313174050.171704-3-zide.chen@intel.com
2026-04-07perf/x86/intel/uncore: Fix iounmap() leak on global_init failureZide Chen
Kernel test robot reported: Unverified Error/Warning (likely false positive, kindly check if interested): arch/x86/events/intel/uncore_discovery.c:293:2-8: ERROR: missing iounmap; ioremap on line 288 and execution via conditional on line 292 If domain->global_init() fails in __parse_discovery_table(), the ioremap'ed MMIO region is not released before returning, resulting in an MMIO mapping leak. Fixes: b575fc0e3357 ("perf/x86/intel/uncore: Add domain global init callback") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Zide Chen <zide.chen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://patch.msgid.link/20260313174050.171704-2-zide.chen@intel.com
2026-04-06Merge tag 'soc-fixes-7.0-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The largest part here are devicetree fixes for Qualcomm, and NXP i.MX, addressing a few regressions and incorrect settings in board and SoC pecific dts files. The largest single commits are a revert of a cleanup patch for i.MX that caused regressions for the NAND flash controller and a fixup for an incomplete cleanup of the PCIe controller on Qualcomm platforms that broke because the state was left incompatible with both the old and new behavior. On the Rockchips, Hisilicon, Renesas, Allwinner and AT91 platforms, only a single simple dts bugfix each was added since the last round of fixes. On the SoC specific device drivers, everything is relatively harmless: three reset controller driver fixes, a compatibility for fix ASpeed soc ID, and error handling fixes for Qualcomm and Microchip. One regression fix on Qualcomm addresses a problem with a previous fix for DisplayPort alt mode" * tag 'soc-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits) arm64: dts: qcom: hamoa: Fix incomplete Root Port property migration dt-bindings: display/msm: qcm2290-mdss: Fix missing ranges in example firmware: microchip: fail auto-update probe if no flash found arm64: dts: renesas: sparrow-hawk: Reserve first 128 MiB of DRAM arm64: dts: qcom: agatti: Fix IOMMU DT properties dt-bindings: media: venus: Fix iommus property dt-bindings: display: msm: qcm2290-mdss: Fix iommus property arm64: dts: allwinner: sun55i: Fix r-spi DMA reset: spacemit: k3: Decouple composite reset lines reset: gpio: fix double free in reset_add_gpio_aux_device() error path ARM: dts: microchip: sam9x7: fix gpio-lines count for pioB arm64: dts: hisilicon: hi3798cv200: Add missing dma-ranges arm64: dts: hisilicon: poplar: Correct PCIe reset GPIO polarity reset: rzg2l-usbphy-ctrl: Fix malformed MODULE_AUTHOR string soc: microchip: mpfs-mss-top-sysreg: Fix resource leak on driver unbind soc: microchip: mpfs-control-scb: Fix resource leak on driver unbind soc: qcom: pmic_glink_altmode: Fix TBT->SAFE->!TBT transition arm64: dts: qcom: monaco: Reserve full Gunyah metadata region arm64: dts: imx8mq-librem5: Bump BUCK1 suspend voltage up to 0.85V Revert "arm64: dts: imx8mq-librem5: Set the DVS voltages lower" ...
2026-04-05Merge tag 'riscv-for-linus-7.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: - Fix a CONFIG_SPARSEMEM crash on RV32 by avoiding early phys_to_page() - Prevent runtime const infrastructure from being used by modules, similar to what was done for x86 - Avoid problems when shutting down ACPI systems with IOMMUs by adding a device dependency between IOMMU and devices that use it - Fix a bug where the CPU pointer masking state isn't properly reset when tagged addresses aren't enabled for a task - Fix some incorrect register assignments, and add some missing ones, in kgdb support code - Fix compilation of non-kernel code that uses the ptrace uapi header by replacing BIT() with _BITUL() - Fix compilation of the validate_v_ptrace kselftest by working around kselftest macro expansion issues * tag 'riscv-for-linus-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: ACPI: RIMT: Add dependency between iommu and devices selftests: riscv: Add braces around EXPECT_EQ() riscv: use _BITUL macro rather than BIT() in ptrace uapi and kselftests riscv: Reset pmm when PR_TAGGED_ADDR_ENABLE is not set riscv: make runtime const not usable by modules riscv: patch: Avoid early phys_to_page() riscv: kgdb: fix several debug register assignment bugs
2026-04-05Merge tag 'x86-urgent-2026-04-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix kexec crash on KCOV-instrumented kernels (Aleksandr Nogikh) - Fix Geode platform driver on-stack property data use-after-return bug (Dmitry Torokhov) * tag 'x86-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/geode: Fix on-stack property data use-after-return bug x86/kexec: Disable KCOV instrumentation after load_segments()
2026-04-05Merge tag 'perf-urgent-2026-04-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: - Fix potential bad container_of() in intel_pmu_hw_config() (Ian Rogers) * tag 'perf-urgent-2026-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix potential bad container_of in intel_pmu_hw_config
2026-04-05Merge tag 'mips-fixes_7.0_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - Fix TLB uniquification for systems with TLB not initialised by firmware - Fix allocation in TLB uniquification - Fix SiByte cache initialisation - Check uart parameters from firmware on Loongson64 systems - Fix clock id mismatch for Ralink SoCs - Fix GCC version check for __mutli3 workaround * tag 'mips-fixes_7.0_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: mips: mm: Allocate tlb_vpn array atomically MIPS: mm: Rewrite TLB uniquification for the hidden bit feature MIPS: mm: Suppress TLB uniquification on EHINV hardware MIPS: Always record SEGBITS in cpu_data.vmbits MIPS: Fix the GCC version check for `__multi3' workaround MIPS: SiByte: Bring back cache initialisation mips: ralink: update CPU clock index MIPS: Loongson64: env: Check UARTs passed by LEFI cautiously
2026-04-05x86/mce/amd: Filter bogus hardware errors on Zen3 clientsYazen Ghannam
Users have been observing multiple L3 cache deferred errors after recent kernel rework of deferred error handling.¹ ⁴ The errors are bogus due to inconsistent status values. Also, user verified that bogus MCA_DESTAT values are present on the system even with an older kernel.² The errors seem to be garbage values present in the MCA_DESTAT of some L3 cache banks. These were implicitly ignored before the recent kernel rework because these do not generate a deferred error interrupt. A later revision of the rework patch was merged for v6.19. This naturally filtered out most of the bogus error logs. However, a few signatures still remain.³ Minimize the scope of the filter to the reported CPU family/model/stepping and only for errors which don't have the Enabled bit in the MCi status MSR. ¹ https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de ² https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de ³ https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.de ⁴ https://lore.kernel.org/r/CAKFB093B2k3sKsGJ_QNX1jVQsaXVFyy=wNwpzCGLOXa_vSDwXw@mail.gmail.com [ bp: Generalize the condition according to which errors are bogus. ] Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling") Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de Reported-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-By: Bert Karwatzki <spasswolf@web.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
2026-04-04prctl: cfi: change the branch landing pad prctl()s to be more descriptivePaul Walmsley
Per Linus' comments requesting the replacement of "INDIR_BR_LP" in the indirect branch tracking prctl()s with something more readable, and suggesting the use of the speculation control prctl()s as an exemplar, reimplement the prctl()s and related constants that control per-task forward-edge control flow integrity. This primarily involves two changes. First, the prctls are restructured to resemble the style of the speculative execution workaround control prctls PR_{GET,SET}_SPECULATION_CTRL, to make them easier to extend in the future. Second, the "indir_br_lp" abbrevation is expanded to "branch_landing_pads" to be less telegraphic. The kselftest and documentation is adjusted accordingly. Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/ Cc: Deepak Gupta <debug@rivosinc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headersPaul Walmsley
Similar to the recent change to expand "LP" to "branch landing pad", let's expand "SS" in the ptrace uapi macros to "shadow stack" as well. This aligns with the existing prctl() arguments, which use the expanded "shadow stack" names, rather than just the abbreviation. Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/ Cc: Deepak Gupta <debug@rivosinc.com> Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04prctl: rename branch landing pad implementation functions to be more explicitPaul Walmsley
Per Linus' comments about the unreadability of abbreviations such as "indir_br_lp", rename the three prctl() implementation functions to be more explicit. This involves renaming "indir_br_lp_status" in the function names to "branch_landing_pad_state". While here, add _prctl_ into the function names, following the speculation control prctl implementation functions. Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/ Cc: Deepak Gupta <debug@rivosinc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headersPaul Walmsley
Per Linus' comments about the unreadability of abbreviations such as "LP", rename the RISC-V ptrace landing pad CFI macro names to be more explicit. This primarily involves expanding "LP" in the names to some variant of "branch landing pad." Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/ Cc: Deepak Gupta <debug@rivosinc.com> Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: cfi: clear CFI lock status in start_thread()Zong Li
When libc locks the CFI status through the following prctl: - PR_LOCK_SHADOW_STACK_STATUS - PR_LOCK_INDIR_BR_LP_STATUS A newly execd address space will inherit the lock status if it does not clear the lock bits. Since the lock bits remain set, libc will later fail to enable the landing pad and shadow stack. Signed-off-by: Zong Li <zong.li@sifive.com> Link: https://patch.msgid.link/20260323065640.4045713-1-zong.li@sifive.com [pjw@kernel.org: ensure we unlock before changing state; cleaned up subject line] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: ptrace: cfi: fix "PRACE" typo in uapi headerPaul Walmsley
A CFI-related macro defined in arch/riscv/uapi/asm/ptrace.h misspells "PTRACE" as "PRACE"; fix this. Fixes: 2af7c9cf021c ("riscv/ptrace: expose riscv CFI status and state via ptrace and in core files") Cc: Deepak Gupta <debug@rivosinc.com> Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: use _BITUL macro rather than BIT() in ptrace uapi and kselftestsPaul Walmsley
Fix the build of non-kernel code that includes the RISC-V ptrace uapi header, and the RISC-V validate_v_ptrace.c kselftest, by using the _BITUL() macro rather than BIT(). BIT() is not available outside the kernel. Based on patches and comments from Charlie Jenkins, Michael Neuling, and Andreas Schwab. Fixes: 30eb191c895b ("selftests: riscv: verify ptrace rejects invalid vector csr inputs") Fixes: 2af7c9cf021c ("riscv/ptrace: expose riscv CFI status and state via ptrace and in core files") Cc: Andreas Schwab <schwab@suse.de> Cc: Michael Neuling <mikey@neuling.org> Cc: Charlie Jenkins <thecharlesjenkins@gmail.com> Link: https://patch.msgid.link/20260330024248.449292-1-mikey@neuling.org Link: https://lore.kernel.org/linux-riscv/20260309-fix_selftests-v2-1-9d5a553a531e@gmail.com/ Link: https://lore.kernel.org/linux-riscv/20260309-fix_selftests-v2-3-9d5a553a531e@gmail.com/ Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: Reset pmm when PR_TAGGED_ADDR_ENABLE is not setZishun Yi
In set_tagged_addr_ctrl(), when PR_TAGGED_ADDR_ENABLE is not set, pmlen is correctly set to 0, but it forgets to reset pmm. This results in the CPU pmm state not corresponding to the software pmlen state. Fix this by resetting pmm along with pmlen. Fixes: 2e1743085887 ("riscv: Add support for the tagged address ABI") Signed-off-by: Zishun Yi <vulab@iscas.ac.cn> Reviewed-by: Samuel Holland <samuel.holland@sifive.com> Link: https://patch.msgid.link/20260322160022.21908-1-vulab@iscas.ac.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: make runtime const not usable by modulesJisheng Zhang
Similar as commit 284922f4c563 ("x86: uaccess: don't use runtime-const rewriting in modules") does, make riscv's runtime const not usable by modules too, to "make sure this doesn't get forgotten the next time somebody wants to do runtime constant optimizations". The reason is well explained in the above commit: "The runtime-const infrastructure was never designed to handle the modular case, because the constant fixup is only done at boot time for core kernel code." Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Link: https://patch.msgid.link/20260221023731.3476-1-jszhang@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: patch: Avoid early phys_to_page()Vivian Wang
Similarly to commit 8d09e2d569f6 ("arm64: patching: avoid early page_to_phys()"), avoid using phys_to_page() for the kernel address case in patch_map(). Since this is called from apply_boot_alternatives() in setup_arch(), and commit 4267739cabb8 ("arch, mm: consolidate initialization of SPARSE memory model") has moved sparse_init() to after setup_arch(), phys_to_page() is not available there yet, and it panics on boot with SPARSEMEM on RV32, which does not use SPARSEMEM_VMEMMAP. Reported-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Closes: https://lore.kernel.org/r/20260223144108-dcace0b9-02e8-4b67-a7ce-f263bed36f26@linutronix.de/ Fixes: 4267739cabb8 ("arch, mm: consolidate initialization of SPARSE memory model") Suggested-by: Mike Rapoport <rppt@kernel.org> Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260310-riscv-sparsemem-alternatives-fix-v1-1-659d5dd257e2@iscas.ac.cn [pjw@kernel.org: fix the subject line to align with the patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04riscv: kgdb: fix several debug register assignment bugsPaul Walmsley
Fix several bugs in the RISC-V kgdb implementation: - The element of dbg_reg_def[] that is supposed to pertain to the S1 register embeds instead the struct pt_regs offset of the A1 register. Fix this to use the S1 register offset in struct pt_regs. - The sleeping_thread_to_gdb_regs() function copies the value of the S10 register into the gdb_regs[] array element meant for the S9 register, and copies the value of the S11 register into the array element meant for the S10 register. It also neglects to copy the value of the S11 register. Fix all of these issues. Fixes: fe89bd2be8667 ("riscv: Add KGDB support") Cc: Vincent Chen <vincent.chen@sifive.com> Link: https://patch.msgid.link/fde376f8-bcfd-bfe4-e467-07d8f7608d05@kernel.org Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-04-04Merge tag 'at91-fixes-7.0' of ↵Krzysztof Kozlowski
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/at91/linux into arm/fixes Microchip AT91 fixes for v7.0 This update includes: - fix gpio-lines for SAM9X7 PIOB GPIO controller * tag 'at91-fixes-7.0' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/at91/linux: ARM: dts: microchip: sam9x7: fix gpio-lines count for pioB Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-04-03Merge tag 'powerpc-7.0-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Madhavan Srinivasan: - fix iommu incorrectly bypassing DMA APIs Thanks to Dan Horak, Gaurav Batra, and Ritesh Harjani (IBM). * tag 'powerpc-7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv/iommu: iommu incorrectly bypass DMA APIs
2026-04-03Merge tag 's390-7.0-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Fix a memory leak in the zcrypt driver where the AP message buffer for clear key RSA requests was allocated twice, once by the caller and again locally, causing the first allocation to never be freed - Fix the cpum_sf perf sampling rate overflow adjustment to clamp the recalculated rate to the hardware maximum, preventing exceptions on heavily loaded systems running with HZ=1000 * tag 's390-7.0-7' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix memory leak with CCA cards used as accelerator s390/cpum_sf: Cap sampling rate to prevent lsctl exception
2026-04-03Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: - Implement a basic static call trampoline to fix CFI failures with the generic implementation * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Use static call trampolines when kCFI is enabled
2026-04-02perf/x86: Fix potential bad container_of in intel_pmu_hw_configIan Rogers
Auto counter reload may have a group of events with software events present within it. The software event PMU isn't the x86_hybrid_pmu and a container_of operation in intel_pmu_set_acr_caused_constr (via the hybrid helper) could cause out of bound memory reads. Avoid this by guarding the call to intel_pmu_set_acr_caused_constr with an is_x86_event check. Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Falcon <thomas.falcon@intel.com> Link: https://patch.msgid.link/20260312194305.1834035-1-irogers@google.com
2026-04-01Merge tag 'qcom-arm64-fixes-for-7.0-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes More Qualcomm Arm64 DeviceTree fixes for v7.0 The shuffling of reset and wake GPIO properties across various Hamoa devices left things in an incomplete state, fix this. Add the missing "ranges" property to the QCM2290 MDSS DeviceTree binding example, to fix the validation warning that was introduced by the previous fix. * tag 'qcom-arm64-fixes-for-7.0-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: hamoa: Fix incomplete Root Port property migration dt-bindings: display/msm: qcm2290-mdss: Fix missing ranges in example arm64: dts: qcom: agatti: Fix IOMMU DT properties dt-bindings: media: venus: Fix iommus property dt-bindings: display: msm: qcm2290-mdss: Fix iommus property arm64: dts: qcom: monaco: Reserve full Gunyah metadata region arm64: dts: qcom: monaco: Fix UART10 pinconf arm64: dts: qcom: qcm6490-idp: Fix WCD9370 reset GPIO polarity arm64: dts: qcom: hamoa/x1: fix idle exit latency Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-04-01Merge tag 'sunxi-fixes-for-7.0' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes Allwinner fixes for 7.0 Just one fix to make the r-spi SPI controller use the mcu-dma DMA controller for DMA instead of the main DMA controller. * tag 'sunxi-fixes-for-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: arm64: dts: allwinner: sun55i: Fix r-spi DMA Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-04-01Merge tag 'renesas-fixes-for-v7.0-tag2' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/fixes Renesas fixes for v7.0 (take two) - Fix TFA BL31 memory corruption on Sparrow Hawk. * tag 'renesas-fixes-for-v7.0-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel: arm64: dts: renesas: sparrow-hawk: Reserve first 128 MiB of DRAM Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-04-01mips: mm: Allocate tlb_vpn array atomicallyStefan Wiehler
Found by DEBUG_ATOMIC_SLEEP: BUG: sleeping function called from invalid context at /include/linux/sched/mm.h:306 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 no locks held by swapper/1/0. irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff801477fc>] copy_process+0x75c/0x1b68 softirqs last enabled at (0): [<ffffffff801477fc>] copy_process+0x75c/0x1b68 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.119-d79e757675ec-fct #1 Stack : 800000000290bad8 0000000000000000 0000000000000008 800000000290bae8 800000000290bae8 800000000290bc78 0000000000000000 0000000000000000 ffffffff80c80000 0000000000000001 ffffffff80d8dee8 ffffffff810d09c0 784bb2a7ec10647d 0000000000000010 ffffffff80a6fd60 8000000001d8a9c0 0000000000000000 0000000000000000 ffffffff80d90000 0000000000000000 ffffffff80c9e0e8 0000000007ffffff 0000000000000cc0 0000000000000400 ffffffffffffffff 0000000000000001 0000000000000002 ffffffffc0149ed8 fffffffffffffffe 8000000002908000 800000000290bae0 ffffffff80a81b74 ffffffff80129fb0 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffffffff80129fd0 0000000000000000 ... Call Trace: [<ffffffff80129fd0>] show_stack+0x60/0x158 [<ffffffff80a7f894>] dump_stack_lvl+0x88/0xbc [<ffffffff8018d3c8>] __might_resched+0x268/0x288 [<ffffffff803648b0>] __kmem_cache_alloc_node+0x2e0/0x330 [<ffffffff80302788>] __kmalloc+0x58/0xd0 [<ffffffff80a81b74>] r4k_tlb_uniquify+0x7c/0x428 [<ffffffff80143e8c>] tlb_init+0x7c/0x110 [<ffffffff8012bdb4>] per_cpu_trap_init+0x16c/0x1d0 [<ffffffff80133258>] start_secondary+0x28/0x128 Fixes: 231ac951faba ("MIPS: mm: kmalloc tlb_vpn array to avoid stack overflow") Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>