summaryrefslogtreecommitdiff
path: root/tools/testing/selftests/kvm
AgeCommit message (Collapse)Author
2025-11-20KVM: selftests: Change VM_MODE_PXXV48_4K to VM_MODE_PXXVYY_4KJim Mattson
Use 57-bit addresses with 5-level paging on hardware that supports LA57. Continue to use 48-bit addresses with 4-level paging on hardware that doesn't support LA57. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Link: https://patch.msgid.link/20251028225827.2269128-4-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Use a loop to walk guest page tablesJim Mattson
Walk the guest page tables via a loop when searching for a PTE, instead of using unique variables for each level of the page tables. This simplifies the code and makes it easier to support 5-level paging in the future. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251028225827.2269128-3-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Use a loop to create guest page tablesJim Mattson
Walk the guest page tables via a loop when creating new mappings, instead of using unique variables for each level of the page tables. This simplifies the code and makes it easier to support 5-level paging in the future. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251028225827.2269128-2-jmattson@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Remove the unused argument to prepare_eptp()Yosry Ahmed
eptp_memslot is unused, remove it. No functional change intended. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-10-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Stop hardcoding PAGE_SIZE in x86 selftestsYosry Ahmed
Use PAGE_SIZE instead of 4096. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-9-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Extend vmx_tsc_adjust_test to cover SVMYosry Ahmed
Add SVM L1 code to run the nested guest, and allow the test to run with SVM as well as VMX. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-8-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Extend nested_invalid_cr3_test to cover SVMYosry Ahmed
Add SVM L1 code to run the nested guest, and allow the test to run with SVM as well as VMX. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-7-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Move nested invalid CR3 check to its own testYosry Ahmed
vmx_tsc_adjust_test currently verifies that a nested VMLAUNCH fails with an invalid CR3. This is irrelevant to TSC scaling, move it to a standalone test. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-6-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Extend vmx_nested_tsc_scaling_test to cover SVMYosry Ahmed
Add SVM L1 code to run the nested guest, and allow the test to run with SVM as well as VMX. Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-5-yosry.ahmed@linux.dev Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-20KVM: selftests: Extend vmx_close_while_nested_test to cover SVMYosry Ahmed
Add SVM L1 code to run the nested guest, and allow the test to run with SVM as well as VMX. Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251021074736.1324328-4-yosry.ahmed@linux.dev [sean: rename to "nested_close_kvm_test" to provide nested_* sorting] Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-19KVM: selftests: SYNC after guest ITS setup in vgic_lpi_stressMaximilian Dittgen
vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC setup to map interrupt events to ITT entries and collection IDs to redistributors, respectively. We have no guarantee that the ITS will finish handling these mapping commands before the selftest calls KVM_SIGNAL_MSI to inject LPIs to the guest. If LPIs are injected before ITS mapping completes, the ITS cannot properly pass the interrupt on to the redistributor. Fix by adding a SYNC command to the selftests ITS library, then calling SYNC after ITS mapping to ensure mapping completes before signal_lpi() writes to GITS_TRANSLATER. Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> Link: https://msgid.link/20251119135744.68552-2-mdittgen@amazon.de Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-19KVM: selftests: Assert GICR_TYPER.Processor_Number matches selftest CPU numberMaximilian Dittgen
The selftests GIC library and tests assume that the GICR_TYPER.Processor_number associated with a given CPU is the same as the CPU's selftest index. Since this assumption is not guaranteed by specification, add an assert in gicv3_cpu_init() that validates this is true. Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> Link: https://msgid.link/20251119135744.68552-1-mdittgen@amazon.de Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-12KVM: selftests: Test for KVM_EXIT_ARM_SEAJiaqi Yan
Test how KVM handles guest SEA when APEI is unable to claim it, and KVM_CAP_ARM_SEA_TO_USER is enabled. The behavior is triggered by consuming recoverable memory error (UER) injected via EINJ. The test asserts two major things: 1. KVM returns to userspace with KVM_EXIT_ARM_SEA exit reason, and has provided expected fault information, e.g. esr, flags, gva, gpa. 2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting SEA to guest and KVM injects expected SEA into the VCPU. Tested on a data center server running Siryn AmpereOne processor that has RAS support. Several things to notice before attempting to run this selftest: - The test relies on EINJ support in both firmware and kernel to inject UER. Otherwise the test will be skipped. - The under-test platform's APEI should be unable to claim the SEA. Otherwise the test will be skipped. - Some platform doesn't support notrigger in EINJ, which may cause APEI and GHES to offline the memory before guest can consume injected UER, and making test unable to trigger SEA. Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> Link: https://msgid.link/20251013185903.1372553-3-jiaqiyan@google.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-09Merge tag 'kvmarm-fixes-6.18-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm654 fixes for 6.18, take #2 * Core fixes - Fix trapping regression when no in-kernel irqchip is present (20251021094358.1963807-1-sascha.bischoff@arm.com) - Check host-provided, untrusted ranges and offsets in pKVM (20251016164541.3771235-1-vdonnefort@google.com) (20251017075710.2605118-1-sebastianene@google.com) - Fix regression restoring the ID_PFR1_EL1 register (20251030122707.2033690-1-maz@kernel.org - Fix vgic ITS locking issues when LPIs are not directly injected (20251107184847.1784820-1-oupton@kernel.org) * Test fixes - Correct target CPU programming in vgic_lpi_stress selftest (20251020145946.48288-1-mdittgen@amazon.de) - Fix exposure of SCTLR2_EL2 and ZCR_EL2 in get-reg-list selftest (20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org) (20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org) * Misc - Update Oliver's email address (20251107012830.1708225-1-oupton@kernel.org)
2025-11-03KVM: selftests: Rename "guest_paddr" variables to "gpa"Sean Christopherson
Rename "guest_paddr" variables in vm_userspace_mem_region_add() and vm_mem_add() to KVM's de facto standard "gpa", both for consistency and to shorten line lengths. Opportunistically fix the indentation of the vm_userspace_mem_region_add() declaration. Link: https://patch.msgid.link/20251007223625.369939-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-30KVM: arm64: selftests: Filter ZCR_EL2 in get-reg-listMark Brown
get-reg-list includes ZCR_EL2 in the list of EL2 registers that it looks for when NV is enabled but does not have any feature gate for this register, meaning that testing any combination of features that includes EL2 but does not include SVE will result in a test failure due to a missing register being reported: | The following lines are missing registers: | | ARM64_SYS_REG(3, 4, 1, 2, 0), Add ZCR_EL2 to feat_id_regs so that the test knows not to expect to see it without SVE being enabled. Fixes: 3a90b6f27964 ("KVM: arm64: selftests: get-reg-list: Add base EL2 registers") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20251024-kvm-arm64-get-reg-list-zcr-el2-v1-1-0cd0ff75e22f@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30KVM: arm64: selftests: Add SCTLR2_EL2 to get-reg-listMark Brown
We recently added support for SCTLR2_EL2 to the kernel but did not add it to get-reg-list, resulting in it reporting the missing register when it is available. Add it. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20251023-b4-kvm-arm64-get-reg-list-sctlr-el2-v1-1-088f88ff992a@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-30KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stressMaximilian Dittgen
Since GITS_TYPER.PTA == 0, the ITS MAPC command demands a CPU ID, rather than a physical redistributor address, for its RDbase command argument. As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates over CPU IDs in the range [0, nr_cpus), passing them as the RDbase vcpu_id argument to its_send_mapc_cmd(). However, its_encode_target() in the its_send_mapc_cmd() selftest handler expects RDbase arguments to be formatted with a 16 bit offset, as shown by the 16-bit target_addr right shift its implementation: its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16) At the moment, all CPU IDs passed into its_send_mapc_cmd() have no offset, therefore becoming 0x0 after the bit shift. Thus, when vgic_its_cmd_handle_mapc() receives the ITS command in vgic-its.c, it always interprets the RDbase target CPU as CPU 0. All interrupts sent to collections will be processed by vCPU 0, which defeats the purpose of this multi-vCPU test. Fix by creating procnum_to_rdbase() helper function, which left-shifts the vCPU parameter received by its_send_mapc_cmd 16 bits before passing it to its_encode_target for encoding. Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> Link: https://patch.msgid.link/20251020145946.48288-1-mdittgen@amazon.de Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-20KVM: selftests: Use "gpa" and "gva" for local variable names in pre-fault testSean Christopherson
Rename guest_test_{phys,virt}_mem to g{p,v}a in the pre-fault memory test to shorten line lengths and to use standard terminology. Opportunsitically use "base_gva" in the guest code instead of "base_gpa" to match the host side code, which now passes in "gva" (and because referencing the virtual address avoids having to know that the data is identity mapped). No functional change intended. Cc: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/r/20251007224515.374516-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Forcefully override ARCH from x86_64 to x86Sean Christopherson
Forcefully override ARCH from x86_64 to x86 to handle the scenario where the user specifies ARCH=x86_64 on the command line. Fixes: 9af04539d474 ("KVM: selftests: Override ARCH for x86_64 instead of using ARCH_DIR") Cc: stable@vger.kernel.org Reported-by: David Matlack <dmatlack@google.com> Closes: https://lore.kernel.org/all/20250724213130.3374922-1-dmatlack@google.com Link: https://lore.kernel.org/r/20251007223057.368082-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Don't fall over in mmu_stress_test when only one CPU is presentBrendan Jackman
Running mmu_stress_test on a system with only one CPU is not a recipe for success. However, there's no clear-cut reason why it absolutely shouldn't work, so the test shouldn't completely reject such a platform. At present, the *3/4 calculation will return zero on these platforms and the test fails. So, instead just skip that calculation. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/r/20251007-b4-kvm-mmu-stresstest-1proc-v1-1-8c95aa0e30b6@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Add guest_memfd tests for mmap and NUMA policy supportShivank Garg
Add tests for NUMA memory policy binding and NUMA aware allocation in guest_memfd. This extends the existing selftests by adding proper validation for: - KVM GMEM set_policy and get_policy() vm_ops functionality using mbind() and get_mempolicy() - NUMA policy application before and after memory allocation Run the NUMA mbind() test with and without INIT_SHARED, as KVM should allow doing mbind(), madvise(), etc. on guest-private memory, e.g. so that userspace can set NUMA policy for CoCo VMs. Run the NUMA allocation test only for INIT_SHARED, i.e. if the host can't fault-in memory (via direct access, madvise(), etc.) as move_pages() returns -ENOENT if the page hasn't been faulted in (walks the host page tables to find the associated folio) [sean: don't skip entire test when running on non-NUMA system, test mbind() with private memory, provide more info in assert messages] Signed-off-by: Shivank Garg <shivankg@amd.com> Tested-by: Ashish Kalra <ashish.kalra@amd.com> Link: https://lore.kernel.org/r/20251016172853.52451-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Add helpers to probe for NUMA support, and multi-node systemsShivank Garg
Add NUMA helpers to probe for support/availability and to check if the test is running on a multi-node system. The APIs will be used to verify guest_memfd NUMA support. Signed-off-by: Shivank Garg <shivankg@amd.com> [sean: land helpers in numaif.h, add comments, tweak names] Link: https://lore.kernel.org/r/20251016172853.52451-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Use proper uAPI headers to pick up mempolicy.h definitionsSean Christopherson
Drop the KVM's re-definitions of MPOL_xxx flags in numaif.h as they are defined by the already-included, kernel-provided mempolicy.h. The only reason the duplicate definitions don't cause compiler warnings is because they are identical, but only on x86-64! The syscall numbers in particular are subtly x86_64-specific, i.e. will cause problems if/when numaif.h is used outsize of x86. Opportunistically clean up the file comment as the license information is covered by the SPDX header, the path is superfluous, and as above the comment about the contents is flat out wrong. Fixes: 346b59f220a2 ("KVM: selftests: Add missing header file needed by xAPIC IPI tests") Reviewed-by: Shivank Garg <shivankg@amd.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lore.kernel.org/r/20251016172853.52451-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Add additional equivalents to libnuma APIs in KVM's numaif.hSean Christopherson
Add APIs for all syscalls defined in the kernel's mm/mempolicy.c to match those that would be provided by linking to libnuma. Opportunistically use the recently inroduced KVM_SYSCALL_DEFINE() builders to take care of the boilerplate, and to fix a flaw where the two existing wrappers would generate multiple symbols if numaif.h were to be included multiple times. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lore.kernel.org/r/20251016172853.52451-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Report stacktraces SIGBUS, SIGSEGV, SIGILL, and SIGFPE by ↵Sean Christopherson
default Register handlers for signals for all selftests that are likely happen due to test (or kernel) bugs, and explicitly fail tests on unexpected signals so that users get a stack trace, i.e. don't have to go spelunking to do basic triage. Register the handlers as early as possible, to catch as many unexpected signals as possible, and also so that the common code doesn't clobber a handler that's installed by test (or arch) code. Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lore.kernel.org/r/20251016172853.52451-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-20KVM: selftests: Define wrappers for common syscalls to assert successSean Christopherson
Add kvm_<sycall> wrappers for munmap(), close(), fallocate(), and ftruncate() to cut down on boilerplate code when a sycall is expected to succeed, and to make it easier for developers to remember to assert success. Implement and use a macro framework similar to the kernel's SYSCALL_DEFINE infrastructure to further cut down on boilerplate code, and to drastically reduce the probability of typos as the kernel's syscall definitions can be copy+paste almost verbatim. Provide macros to build the raw <sycall>() wrappers as well, e.g. to replace hand-coded wrappers (NUMA) or pure open-coded calls. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Shivank Garg <shivankg@amd.com> Tested-by: Shivank Garg <shivankg@amd.com> Link: https://lore.kernel.org/r/20251016172853.52451-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-18Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 6.18: - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace deletes/moves memslot during prefault") - Don't try to get PMU capabbilities from perf when running a CPU with hybrid CPUs/PMUs, as perf will rightly WARN. - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more generic KVM_CAP_GUEST_MEMFD_FLAGS - Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set said flag to initialize memory as SHARED, irrespective of MMAP. The behavior merged in 6.18 is that enabling mmap() implicitly initializes memory as SHARED, which would result in an ABI collision for x86 CoCo VMs as their memory is currently always initialized PRIVATE. - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private memory, to enable testing such setups, i.e. to hopefully flush out any other lurking ABI issues before 6.18 is officially released. - Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP, and host userspace accesses to mmap()'d private memory.
2025-10-13KVM: arm64: selftests: Fix misleading comment about virtual timer encodingMarc Zyngier
The userspace-visible encoding for CNTV_CVAL_EL0 and CNTVCNT_EL0 have been swapped for as long as usersapce has had access to the registers. This is documented in arch/arm64/include/uapi/asm/kvm.h. Despite that, the get_reg_list test has unhelpful comments indicating the wrong register for the encoding. Replace this with definitions exposed in the include file, and a comment explaining again the brokenness. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_listMarc Zyngier
Add yet another configuration, this time dealing E2H=0. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Make dependencies on VHE-specific registers explicitMarc Zyngier
The hyp virtual timer registers only exist when VHE is present, Similarly, VNCR_EL2 only exists when NV2 is present. Make these dependencies explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stressOliver Upton
vgic_lpi_stress rather hilariously leaves IRQs disabled for the duration of the test. While the ITS translation of MSIs happens regardless of this, for completeness the guest should actually handle the LPIs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Allocate vcpus with correct sizeZenghui Yu
vcpus array contains pointers to struct kvm_vcpu {}. It is way overkill to allocate the array with (nr_cpus * sizeof(struct kvm_vcpu)). Fix the allocation by using the correct size. Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Sync ID_AA64PFR1, MPIDR, CLIDR in guestZenghui Yu
We forgot to sync several registers (ID_AA64PFR1, MPIDR, CLIDR) in guest to make sure that the guest had seen the written value. Add them to the list. Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Reviewed-By: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: selftests: Fix irqfd_test for non-x86 architecturesOliver Upton
The KVM_IRQFD ioctl fails if no irqchip is present in-kernel, which isn't too surprising as there's not much KVM can do for an IRQ if it cannot resolve a destination. As written the irqfd_test assumes that a 'default' VM created in selftests has an in-kernel irqchip created implicitly. That may be the case on x86 but it isn't necessarily true on other architectures. Add an arch predicate indicating if 'default' VMs get an irqchip and make the irqfd_test depend on it. Work around arm64 VGIC initialization requirements by using vm_create_with_one_vcpu(), ignoring the created vCPU as it isn't used for the test. Reported-by: Sebastian Ott <sebott@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Sean Christopherson <seanjc@google.com> Fixes: 7e9b231c402a ("KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Track width of timer counter as "int", not "uint64_t"Sean Christopherson
Store the width of arm64's timer counter as an "int", not a "uint64_t". ilog2() returns an "int", and more importantly using what is an "unsigned long" under the hood makes clang unhappy due to a type mismatch when clamping the width to a sane value. arm64/arch_timer_edge_cases.c:1032:10: error: comparison of distinct pointer types ('typeof (width) *' (aka 'unsigned long *') and 'typeof (56) *' (aka 'int *')) [-Werror,-Wcompare-distinct-pointer-types] 1032 | width = clamp(width, 56, 64); | ^~~~~~~~~~~~~~~~~~~~ tools/include/linux/kernel.h:47:45: note: expanded from macro 'clamp' 47 | #define clamp(val, lo, hi) min((typeof(val))max(val, lo), hi) | ^~~~~~~~~~~~ tools/include/linux/kernel.h:33:17: note: expanded from macro 'max' 33 | (void) (&_max1 == &_max2); \ | ~~~~~~ ^ ~~~~~~ tools/include/linux/kernel.h:39:9: note: expanded from macro 'min' 39 | typeof(x) _min1 = (x); \ | ^ Fixes: fad4cf944839 ("KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases") Cc: Sebastian Ott <sebott@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-13KVM: arm64: selftests: Test effective value of HCR_EL2.AMOOliver Upton
A defect against the architecture now allows an implementation to treat AMO as 1 when HCR_EL2.{E2H, TGE} = {1, 0}. KVM now takes advantage of this interpretation to address a quality of emulation issue w.r.t. SError injection. Add a corresponding test case and expect a pending SError to be taken. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-10-10KVM: selftests: Verify that reads to inaccessible guest_memfd VMAs SIGBUSSean Christopherson
Expand the guest_memfd negative testcases for overflow and MAP_PRIVATE to verify that reads to inaccessible memory also get a SIGBUS. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Lisa Wang <wyihan@google.com> Tested-by: Lisa Wang <wyihan@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Verify that faulting in private guest_memfd memory failsSean Christopherson
Add a guest_memfd testcase to verify that faulting in private memory gets a SIGBUS. For now, test only the case where memory is private by default since KVM doesn't yet support in-place conversion. Deliberately run the CoW test with and without INIT_SHARED set as KVM should disallow MAP_PRIVATE regardless of whether the memory itself is private from a CoCo perspective. Cc: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Add wrapper macro to handle and assert on expected SIGBUSSean Christopherson
Extract the guest_memfd test's SIGBUS handling functionality into a common TEST_EXPECT_SIGBUS() macro in anticipation of adding more SIGBUS testcases. Eating a SIGBUS isn't terrible difficult, but it requires a non-trivial amount of boilerplate code, and using a macro allows selftests to print out the exact action that failed to generate a SIGBUS without the developer needing to remember to add a useful error message. Explicitly mark the SIGBUS handler as "used", as gcc-14 at least likes to discard the function before linking. Opportunistically use TEST_FAIL(...) instead of TEST_ASSERT(false, ...), and fix the write path of the guest_memfd test to use the local "val" instead of hardcoding the literal value a second time. Suggested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Lisa Wang <wyihan@google.com> Tested-by: Lisa Wang <wyihan@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Isolate the guest_memfd Copy-on-Write negative testcaseSean Christopherson
Move the guest_memfd Copy-on-Write (CoW) testcase to its own function to better separate positive testcases from negative testcases. No functional change intended. Suggested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Add wrappers for mmap() and munmap() to assert successSean Christopherson
Add and use wrappers for mmap() and munmap() that assert success to reduce a significant amount of boilerplate code, to ensure all tests assert on failure, and to provide consistent error messages on failure. No functional change intended. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Add test coverage for guest_memfd without GUEST_MEMFD_FLAG_MMAPAckerley Tng
If a VM type supports KVM_CAP_GUEST_MEMFD_MMAP, the guest_memfd test will run all test cases with GUEST_MEMFD_FLAG_MMAP set. This leaves the code path for creating a non-mmap()-able guest_memfd on a VM that supports mappable guest memfds untested. Refactor the test to run the main test suite with a given set of flags. Then, for VM types that support the mappable capability, invoke the test suite twice: once with no flags, and once with GUEST_MEMFD_FLAG_MMAP set. This ensures both creation paths are properly exercised on capable VMs. Run test_guest_memfd_flags() only once per VM type since it depends only on the set of valid/supported flags, i.e. iterating over an arbitrary set of flags is both unnecessary and wrong. Signed-off-by: Ackerley Tng <ackerleytng@google.com> [sean: use double-underscores for the inner helper] Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Create a new guest_memfd for each testcaseSean Christopherson
Refactor the guest_memfd selftest to improve test isolation by creating a a new guest_memfd for each testcase. Currently, the test reuses a single guest_memfd instance for all testcases, and thus creates dependencies between tests, e.g. not truncating folios from the guest_memfd instance at the end of a test could lead to unexpected results (see the PUNCH_HOLE purging that needs to done by in-flight the NUMA testcases[1]). Invoke each test via a macro wrapper to create and close a guest_memfd to cut down on the boilerplate copy+paste needed to create a test. Link: https://lore.kernel.org/all/20250827175247.83322-10-shivankg@amd.com Reported-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: selftests: Stash the host page size in a global in the guest_memfd testSean Christopherson
Use a global variable to track the host page size in the guest_memfd test so that the information doesn't need to be constantly passed around. The state is purely a reflection of the underlying system, i.e. can't be set by the test and is constant for a given invocation of the test, and thus explicitly passing the host page size to individual testcases adds no value, e.g. doesn't allow testing different combinations. Making page_size a global will simplify an upcoming change to create a new guest_memfd instance per testcase. No functional change intended. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20251003232606.4070510-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: guest_memfd: Add INIT_SHARED flag, reject user page faults if not setSean Christopherson
Add a guest_memfd flag to allow userspace to state that the underlying memory should be configured to be initialized as shared, and reject user page faults if the guest_memfd instance's memory isn't shared. Because KVM doesn't yet support in-place private<=>shared conversions, all guest_memfd memory effectively follows the initial state. Alternatively, KVM could deduce the initial state based on MMAP, which for all intents and purposes is what KVM currently does. However, implicitly deriving the default state based on MMAP will result in a messy ABI when support for in-place conversions is added. For x86 CoCo VMs, which don't yet support MMAP, memory is currently private by default (otherwise the memory would be unusable). If MMAP implies memory is shared by default, then the default state for CoCo VMs will vary based on MMAP, and from userspace's perspective, will change when in-place conversion support is added. I.e. to maintain guest<=>host ABI, userspace would need to immediately convert all memory from shared=>private, which is both ugly and inefficient. The inefficiency could be avoided by adding a flag to state that memory is _private_ by default, irrespective of MMAP, but that would lead to an equally messy and hard to document ABI. Bite the bullet and immediately add a flag to control the default state so that the effective behavior is explicit and straightforward. Fixes: 3d3a04fad25a ("KVM: Allow and advertise support for host mmap() on guest_memfd files") Cc: David Hildenbrand <david@redhat.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-10KVM: Rework KVM_CAP_GUEST_MEMFD_MMAP into KVM_CAP_GUEST_MEMFD_FLAGSSean Christopherson
Rework the not-yet-released KVM_CAP_GUEST_MEMFD_MMAP into a more generic KVM_CAP_GUEST_MEMFD_FLAGS capability so that adding new flags doesn't require a new capability, and so that developers aren't tempted to bundle multiple flags into a single capability. Note, kvm_vm_ioctl_check_extension_generic() can only return a 32-bit value, but that limitation can be easily circumvented by adding e.g. KVM_CAP_GUEST_MEMFD_FLAGS2 in the unlikely event guest_memfd supports more than 32 flags. Reviewed-by: Ackerley Tng <ackerleytng@google.com> Tested-by: Ackerley Tng <ackerleytng@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20251003232606.4070510-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-10-07KVM: selftests: Test prefault memory during concurrent memslot removalYan Zhao
Expand the prefault memory selftest to add a regression test for a KVM bug where KVM's retry logic would result in (breakable) deadlock due to the memslot deletion waiting on prefaulting to release SRCU, and prefaulting waiting on the memslot to fully disappear (KVM uses a two-step process to delete memslots, and KVM x86 retries page faults if a to-be-deleted, a.k.a. INVALID, memslot is encountered). To exercise concurrent memslot remove, spawn a second thread to initiate memslot removal at roughly the same time as prefaulting. Test memslot removal for all testcases, i.e. don't limit concurrent removal to only the success case. There are essentially three prefault scenarios (so far) that are of interest: 1. Success 2. ENOENT due to no memslot 3. EAGAIN due to INVALID memslot For all intents and purposes, #1 and #2 are mutually exclusive, or rather, easier to test via separate testcases since writing to non-existent memory is trivial. But for #3, making it mutually exclusive with #1 _or_ #2 is actually more complex than testing memslot removal for all scenarios. The only requirement to let memslot removal coexist with other scenarios is a way to guarantee a stable result, e.g. that the "no memslot" test observes ENOENT, not EAGAIN, for the final checks. So, rather than make memslot removal mutually exclusive with the ENOENT scenario, simply restore the memslot and retry prefaulting. For the "no memslot" case, KVM_PRE_FAULT_MEMORY should be idempotent, i.e. should always fail with ENOENT regardless of how many times userspace attempts prefaulting. Pass in both the base GPA and the offset (instead of the "full" GPA) so that the worker can recreate the memslot. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20250924174255.2141847-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-09-30Merge tag 'kvm-x86-cet-6.18' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 CET virtualization support for 6.18 Add support for virtualizing Control-flow Enforcement Technology (CET) on Intel (Shadow Stacks and Indirect Branch Tracking) and AMD (Shadow Stacks). CET is comprised of two distinct features, Shadow Stacks (SHSTK) and Indirect Branch Tracking (IBT), that can be utilized by software to help provide Control-flow integrity (CFI). SHSTK defends against backward-edge attacks (a.k.a. Return-oriented programming (ROP)), while IBT defends against forward-edge attacks (a.k.a. similarly CALL/JMP-oriented programming (COP/JOP)). Attackers commonly use ROP and COP/JOP methodologies to redirect the control- flow to unauthorized targets in order to execute small snippets of code, a.k.a. gadgets, of the attackers choice. By chaining together several gadgets, an attacker can perform arbitrary operations and circumvent the system's defenses. SHSTK defends against backward-edge attacks, which execute gadgets by modifying the stack to branch to the attacker's target via RET, by providing a second stack that is used exclusively to track control transfer operations. The shadow stack is separate from the data/normal stack, and can be enabled independently in user and kernel mode. When SHSTK is is enabled, CALL instructions push the return address on both the data and shadow stack. RET then pops the return address from both stacks and compares the addresses. If the return addresses from the two stacks do not match, the CPU generates a Control Protection (#CP) exception. IBT defends against backward-edge attacks, which branch to gadgets by executing indirect CALL and JMP instructions with attacker controlled register or memory state, by requiring the target of indirect branches to start with a special marker instruction, ENDBRANCH. If an indirect branch is executed and the next instruction is not an ENDBRANCH, the CPU generates a #CP. Note, ENDBRANCH behaves as a NOP if IBT is disabled or unsupported. From a virtualization perspective, CET presents several problems. While SHSTK and IBT have two layers of enabling, a global control in the form of a CR4 bit, and a per-feature control in user and kernel (supervisor) MSRs (U_CET and S_CET respectively), the {S,U}_CET MSRs can be context switched via XSAVES/XRSTORS. Practically speaking, intercepting and emulating XSAVES/XRSTORS is not a viable option due to complexity, and outright disallowing use of XSTATE to context switch SHSTK/IBT state would render the features unusable to most guests. To limit the overall complexity without sacrificing performance or usability, simply ignore the potential virtualization hole, but ensure that all paths in KVM treat SHSTK/IBT as usable by the guest if the feature is supported in hardware, and the guest has access to at least one of SHSTK or IBT. I.e. allow userspace to advertise one of SHSTK or IBT if both are supported in hardware, even though doing so would allow a misbehaving guest to use the unadvertised feature. Fully emulating SHSTK and IBT would also require significant complexity, e.g. to track and update branch state for IBT, and shadow stack state for SHSTK. Given that emulating large swaths of the guest code stream isn't necessary on modern CPUs, punt on emulating instructions that meaningful impact or consume SHSTK or IBT. However, instead of doing nothing, explicitly reject emulation of such instructions so that KVM's emulator can't be abused to circumvent CET. Disable support for SHSTK and IBT if KVM is configured such that emulation of arbitrary guest instructions may be required, specifically if Unrestricted Guest (Intel only) is disabled, or if KVM will emulate a guest.MAXPHYADDR that is smaller than host.MAXPHYADDR. Lastly disable SHSTK support if shadow paging is enabled, as the protections for the shadow stack are novel (shadow stacks require Writable=0,Dirty=1, so that they can't be directly modified by software), i.e. would require non-trivial support in the Shadow MMU. Note, AMD CPUs currently only support SHSTK. Explicitly disable IBT support so that KVM doesn't over-advertise if AMD CPUs add IBT, and virtualizing IBT in SVM requires KVM modifications.
2025-09-30Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 changes for 6.18 - Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw where a misbehaving usersepace (a.k.a. syzkaller) could swizzle L1's intercepts and trigger a variety of WARNs in KVM. - Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs. - Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs. - Clean up KVM's vector hashing code for delivering lowest priority IRQs. - Clean up the fastpath handler code to only handle IPIs and WRMSRs that are actually "fast", as opposed to handling those that KVM _hopes_ are fast, and in the process of doing so add fastpath support for TSC_DEADLINE writes on AMD CPUs. - Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs. - Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios). - Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support. - Rejecting a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs, as KVM can't faithfully emulate an I/O APIC for such guests. - KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation for mediated vPMU support, as KVM will need to recalculate MSR intercepts in response to PMU refreshes for guests with mediated vPMUs. - Misc cleanups and minor fixes.