<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/x86/kvm/vmx/nested.c, branch linux-rolling-stable</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-rolling-stable'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2026-03-19T15:15:02Z</updated>
<entry>
<title>KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM</title>
<updated>2026-03-19T15:15:02Z</updated>
<author>
<name>Jim Mattson</name>
<email>jmattson@google.com</email>
</author>
<published>2026-02-05T23:15:26Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cb979700c40f592473704be14a8441b3a4451255'/>
<id>urn:sha1:cb979700c40f592473704be14a8441b3a4451255</id>
<content type='text'>
commit e2ffe85b6d2bb7780174b87aa4468a39be17eb81 upstream.

Add KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM to allow L1 to set
FREEZE_IN_SMM in vmcs12's GUEST_IA32_DEBUGCTL field, as permitted
prior to commit 6b1dd26544d0 ("KVM: VMX: Preserve host's
DEBUGCTLMSR_FREEZE_IN_SMM while running the guest").  Enable the quirk
by default for backwards compatibility (like all quirks); userspace
can disable it via KVM_CAP_DISABLE_QUIRKS2 for consistency with the
constraints on WRMSR(IA32_DEBUGCTL).

Note that the quirk only bypasses the consistency check.  The vmcs02 bit is
still owned by the host, and PMCs are not frozen during virtualized SMM.
In particular, if a host administrator decides that PMCs should not be
frozen during physical SMM, then L1 has no say in the matter.
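
As a rough illustration of the behavior described above, here is a toy
Python model (not KVM code; the function names and boolean parameters are
hypothetical, and every other DEBUGCTL bit is ignored):

```python
# Toy model of the nested VM-Enter check on vmcs12's GUEST_IA32_DEBUGCTL.
# Hypothetical names; only the FREEZE_IN_SMM bit is modeled.

def debugctl_check_passes(freeze_in_smm_set, quirk_enabled):
    """Return True if the consistency check passes for FREEZE_IN_SMM."""
    if not freeze_in_smm_set:
        return True
    # With the quirk enabled (the default), L1 may set the bit as before.
    # With the quirk disabled, the bit is rejected, matching WRMSR.
    return quirk_enabled


def vmcs02_freeze_in_smm(host_freeze_in_smm, vmcs12_freeze_in_smm):
    """The vmcs02 bit is owned by the host; vmcs12's value is ignored."""
    return host_freeze_in_smm
```

The second helper captures the note above: passing the check does not give
L1 any control over whether PMCs are actually frozen.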

Fixes: 095686e6fcb4 ("KVM: nVMX: Check vmcs12-&gt;guest_ia32_debugctl on nested VM-Enter")
Cc: stable@vger.kernel.org
Signed-off-by: Jim Mattson &lt;jmattson@google.com&gt;
Link: https://patch.msgid.link/20260205231537.1278753-1-jmattson@google.com
[sean: tag for stable@, clean-up and fix goofs in the comment and docs]
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
[Rename quirk. - Paolo]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'kvm-x86-fixes-6.19-rc1' of https://github.com/kvm-x86/linux into HEAD</title>
<updated>2025-12-18T17:38:45Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2025-12-18T17:38:45Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=0499add8efd72456514c6218c062911ccc922a99'/>
<id>urn:sha1:0499add8efd72456514c6218c062911ccc922a99</id>
<content type='text'>
KVM fixes for 6.19-rc1

 - Add a missing "break" to fix param parsing in the rseq selftest.

 - Apply runtime updates to the _current_ CPUID when userspace is setting
   CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid
   dropping the pending update.

 - Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not
   supported by KVM and leads to a use-after-free due to KVM failing to unbind
   the memslot from the previously-associated guest_memfd instance.

 - Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting
   flags-only changes on KVM_MEM_GUEST_MEMFD memslots, e.g. for dirty logging.

 - Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested
   SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined
   as -1ull (a 64-bit value).

 - Update SVI when activating APICv to fix a bug where a post-activation EOI
   for an in-service IRQ would effectively be lost due to SVI being stale.

 - Immediately refresh APICv controls (if necessary) on a nested VM-Exit
   instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is
   effectively ignored because KVM thinks the vCPU already has the correct
   APICv settings.
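
For the SVM_EXIT_ERR item above, a small arithmetic sketch (Python, with
hypothetical helper names; not KVM code) of why both halves of the 64-bit
exit code must be all ones to equal -1ull:

```python
# Toy arithmetic only: build a 64-bit exit code from two 32-bit halves.
MASK32 = 0xFFFFFFFF
VMEXIT_INVALID = 2**64 - 1  # -1ull as an unsigned 64-bit value


def synthesize_exit_code(lo32, hi32):
    """Combine exit_code[31:0] and exit_code[63:32] into one value."""
    return hi32 * 2**32 + lo32
```

Leaving the upper half as zero yields 0x00000000ffffffff, which does not
compare equal to VMEXIT_INVALID.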
</content>
</entry>
<entry>
<title>KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit</title>
<updated>2025-12-08T14:56:29Z</updated>
<author>
<name>Dongli Zhang</name>
<email>dongli.zhang@oracle.com</email>
</author>
<published>2025-12-05T23:19:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=29763138830916f46daaa50e83e7f4f907a3236b'/>
<id>urn:sha1:29763138830916f46daaa50e83e7f4f907a3236b</id>
<content type='text'>
If an APICv status update was pended while L2 was active, immediately
refresh vmcs01's controls instead of pending KVM_REQ_APICV_UPDATE as
kvm_vcpu_update_apicv() only calls into vendor code if a change is
necessary.

E.g. if APICv is inhibited, and then activated while L2 is running:

  kvm_vcpu_update_apicv()
  |
  -&gt; __kvm_vcpu_update_apicv()
     |
     -&gt; apic-&gt;apicv_active = true
      |
      -&gt; vmx_refresh_apicv_exec_ctrl()
         |
         -&gt; vmx-&gt;nested.update_vmcs01_apicv_status = true
          |
          -&gt; return

Then L2 exits to L1:

  __nested_vmx_vmexit()
  |
  -&gt; kvm_make_request(KVM_REQ_APICV_UPDATE)

  vcpu_enter_guest(): KVM_REQ_APICV_UPDATE
  -&gt; kvm_vcpu_update_apicv()
     |
     -&gt; __kvm_vcpu_update_apicv()
        |
        -&gt; return // because if (apic-&gt;apicv_active == activate)
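
The two flows above can be sketched with a toy Python model.  The names
mirror the changelog, but the class, its fields, and the immediate_refresh
switch are hypothetical and heavily simplified; this is not KVM code:

```python
# Toy model of the deferred APICv update; not KVM's implementation.

class ToyVcpu:
    def __init__(self):
        self.apicv_active = False   # KVM's view of APICv
        self.vmcs01_apicv = False   # what vmcs01's controls reflect
        self.in_l2 = False
        self.update_vmcs01_apicv_status = False

    def refresh_apicv_exec_ctrl(self):
        if self.in_l2:
            # vmcs01 is not loaded; remember to refresh it later.
            self.update_vmcs01_apicv_status = True
            return
        self.vmcs01_apicv = self.apicv_active

    def update_apicv(self, activate):
        if self.apicv_active == activate:
            return  # no change needed, vendor code is never called
        self.apicv_active = activate
        self.refresh_apicv_exec_ctrl()

    def nested_vmexit(self, immediate_refresh):
        self.in_l2 = False
        if not self.update_vmcs01_apicv_status:
            return
        self.update_vmcs01_apicv_status = False
        if immediate_refresh:
            self.refresh_apicv_exec_ctrl()        # behavior after the fix
        else:
            self.update_apicv(self.apicv_active)  # old request: a no-op


def run(immediate_refresh):
    vcpu = ToyVcpu()
    vcpu.in_l2 = True
    vcpu.update_apicv(True)  # APICv activated while L2 is running
    vcpu.nested_vmexit(immediate_refresh)
    return vcpu.vmcs01_apicv
```

In this model, run(False) leaves vmcs01 stale because the deferred request
hits the early return, whereas run(True) refreshes vmcs01 on the exit.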

Reported-by: Chao Gao &lt;chao.gao@intel.com&gt;
Closes: https://lore.kernel.org/all/aQ2jmnN8wUYVEawF@intel.com
Fixes: 7c69661e225c ("KVM: nVMX: Defer APICv updates while L2 is active until L1 is active")
Cc: stable@vger.kernel.org
Signed-off-by: Dongli Zhang &lt;dongli.zhang@oracle.com&gt;
[sean: write changelog]
Link: https://patch.msgid.link/20251205231913.441872-3-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'kvm-x86-vmx-6.19' of https://github.com/kvm-x86/linux into HEAD</title>
<updated>2025-11-26T08:44:52Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2025-11-26T08:44:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=d1e7b4613e2fce06f7a7e3cf4285fbaa547656ba'/>
<id>urn:sha1:d1e7b4613e2fce06f7a7e3cf4285fbaa547656ba</id>
<content type='text'>
KVM VMX changes for 6.19:

 - Use the root role from kvm_mmu_page to construct EPTPs instead of the
   current vCPU state, partly as worthwhile cleanup, but mostly to pave the
   way for tracking per-root TLB flushes so that KVM can elide EPT flushes on
   pCPU migration if KVM has flushed the root at least once.

 - Add a few missing nested consistency checks.

 - Rip out support for doing "early" consistency checks via hardware as the
   functionality hasn't been used in years and is no longer useful in general,
   and replace it with an off-by-default module param to detect missed
   consistency checks (i.e. WARN if hardware finds a check that KVM does not).

 - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32]
   on VM-Enter.

 - Misc cleanups.
</content>
</entry>
<entry>
<title>Merge tag 'kvm-x86-misc-6.19' of https://github.com/kvm-x86/linux into HEAD</title>
<updated>2025-11-26T08:34:21Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2025-11-26T08:34:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e64dcfab57ac41b19e7433511ba0fa5be4f99e83'/>
<id>urn:sha1:e64dcfab57ac41b19e7433511ba0fa5be4f99e83</id>
<content type='text'>
KVM x86 misc changes for 6.19:

 - Fix an async #PF bug where KVM would clear the completion queue when the
   guest transitioned in and out of paging mode, e.g. when handling an SMI and
   then returning to paged mode via RSM.

 - Fix a bug where TDX would effectively corrupt user-return MSR values if the
   TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected.

 - Leave the user-return notifier used to restore MSRs registered when
   disabling virtualization, and instead pin kvm.ko.  Restoring host MSRs via
   IPI callback is either pointless (clean reboot) or dangerous (forced reboot)
   since KVM has no idea what code it's interrupting.

 - Use the checked version of {get,put}_user(), as Linus wants to kill off
   the unchecked versions, and the checked versions are measurably faster on
   modern CPUs due to the unchecked versions containing an LFENCE.

 - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC
   timers can result in a hard lockup in the host.

 - Revert the periodic kvmclock sync logic now that KVM doesn't use a
   clocksource that's subject to NTP corrections.

 - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter
   behind CONFIG_CPU_MITIGATIONS.

 - Context switch XCR0, XSS, and PKRU outside of the entry/exit fastpath as
   the only reason they were handled in the fastpath was to paper over a bug in
   the core #MC code that has long since been fixed.

 - Add emulator support for AVX MOV instructions to play nice with emulated
   devices whose PCI BARs guest drivers like to access with large multi-byte
   instructions.
</content>
</entry>
<entry>
<title>KVM: x86: Unify L1TF flushing under per-CPU variable</title>
<updated>2025-11-19T00:22:45Z</updated>
<author>
<name>Brendan Jackman</name>
<email>jackmanb@google.com</email>
</author>
<published>2025-11-13T23:37:46Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=38ee66cb1845dbf1e97c5e5d3db01ae4513f66a9'/>
<id>urn:sha1:38ee66cb1845dbf1e97c5e5d3db01ae4513f66a9</id>
<content type='text'>
Currently the need to flush L1D for L1TF is tracked by two bits: one
per-CPU and one per-vCPU.

The per-vCPU bit is always set when the vCPU shows up on a core, so
there is no interesting state that's truly per-vCPU. Indeed, this is a
requirement, since L1D is a part of the physical CPU.

So simplify this by combining the two bits.

The vCPU bit was being written from preemption-enabled regions.  To play
nice with those cases, wrap all calls from KVM and use a raw write so that
requesting a flush with preemption enabled doesn't trigger what would
effectively be DEBUG_PREEMPT false positives.  Preemption doesn't need to
be disabled, as kvm_arch_vcpu_load() will mark the new CPU as needing a
flush if the vCPU task is migrated, or if userspace runs the vCPU on a
different task.

Signed-off-by: Brendan Jackman &lt;jackmanb@google.com&gt;
[sean: put raw write in KVM instead of in a hardirq.h variant]
Link: https://patch.msgid.link/20251113233746.1703361-10-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL</title>
<updated>2025-10-20T16:37:04Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-10-16T18:21:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9d7dfb95da2cb5c1287df2f3468bcb70d8b31087'/>
<id>urn:sha1:9d7dfb95da2cb5c1287df2f3468bcb70d8b31087</id>
<content type='text'>
Add VMX exit handlers for SEAMCALL and TDCALL to inject a #UD if a non-TD
guest attempts to execute SEAMCALL or TDCALL.  Neither SEAMCALL nor TDCALL
is gated by any software enablement other than VMXON, and so will generate
a VM-Exit instead of e.g. a native #UD when executed from the guest kernel.

Note!  No unprivileged DoS of the L1 kernel is possible as TDCALL and
SEAMCALL #GP at CPL &gt; 0, and the CPL check is performed prior to the VMX
non-root (VM-Exit) check, i.e. userspace can't crash the VM. And for a
nested guest, KVM forwards unknown exits to L1, i.e. an L2 kernel can
crash itself, but not L1.

Note #2!  The Intel® Trust Domain CPU Architectural Extensions spec's
pseudocode shows the CPL &gt; 0 check for SEAMCALL coming _after_ the VM-Exit,
but that appears to be a documentation bug (likely because the CPL &gt; 0
check was incorrectly bundled with other lower-priority #GP checks).
Testing on SPR and EMR shows that the CPL &gt; 0 check is performed before
the VMX non-root check, i.e. SEAMCALL #GPs when executed in usermode.

Note #3!  The aforementioned Trust Domain spec uses confusing pseudocode
that says that SEAMCALL will #UD if executed "inSEAM", but "inSEAM"
specifically means in SEAM Root Mode, i.e. in the TDX-Module.  The long-
form description explicitly states that SEAMCALL generates an exit when
executed in "SEAM VMX non-root operation".  But that's a moot point as the
TDX-Module injects #UD if the guest attempts to execute SEAMCALL, as
documented in the "Unconditionally Blocked Instructions" section of the
TDX-Module base specification.

Cc: stable@vger.kernel.org
Cc: Kai Huang &lt;kai.huang@intel.com&gt;
Cc: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Cc: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Cc: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Kai Huang &lt;kai.huang@intel.com&gt;
Reviewed-by: Binbin Wu &lt;binbin.wu@linux.intel.com&gt;
Reviewed-by: Xiaoyao Li &lt;xiaoyao.li@intel.com&gt;
Link: https://lore.kernel.org/r/20251016182148.69085-2-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: nVMX: Add an off-by-default module param to WARN on missed consistency checks</title>
<updated>2025-10-17T22:11:27Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-09-19T00:59:55Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1100e4910ad207bc00aedc8dfdb228dd1b81f310'/>
<id>urn:sha1:1100e4910ad207bc00aedc8dfdb228dd1b81f310</id>
<content type='text'>
Add an off-by-default param, "warn_on_missed_cc", to have KVM WARN on a
missed VMX Consistency Check on nested VM-Enter, specifically so that KVM
developers and maintainers can more easily detect missing checks.  KVM's
goal/intent is that KVM detect *all* VM-Fail conditions in software, as
relying on hardware leads to false passes when KVM's nested support is a
subset of hardware support, e.g. see commit 095686e6fcb4 ("KVM: nVMX:
Check vmcs12-&gt;guest_ia32_debugctl on nested VM-Enter").

With one notable exception, KVM now detects all VM-Fail scenarios for
which there is known test coverage, i.e. KVM developers can enable the
param and expect a clean run, and thus can use the param to detect missed
checks, e.g. when enabling new features, when writing new tests, etc.

The one exception is an unfortunate consistency check on vTPR.  Because
the vTPR for L2 comes from the virtual APIC page provided by L1, L2's vTPR
is fully writable at all times, i.e. is inherently subject to TOCTOU
issues with respect to checks in software versus consumption in hardware.
Further complicating matters is KVM's deferred handling of vmcs12 pages
when loading nested state; KVM flat out cannot check vTPR during
KVM_SET_NESTED_STATE without breaking setups that do on-demand paging,
e.g. for live migration and/or live update.

To fudge around the vTPR issue, add a "late" controls check for vTPR and
also treat an invalid virtual APIC as VM-Fail, but gate the check on
warn_on_missed_cc being enabled to avoid unwanted false positives, i.e. to
avoid breaking KVM in production.

Cc: Jim Mattson &lt;jmattson@google.com&gt;
Link: https://lore.kernel.org/r/20250919005955.1366256-10-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: nVMX: Remove support for "early" consistency checks via hardware</title>
<updated>2025-10-17T22:11:27Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-09-19T00:59:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a175da6d430ef7f8e24153e44c59ab6903e20f97'/>
<id>urn:sha1:a175da6d430ef7f8e24153e44c59ab6903e20f97</id>
<content type='text'>
Remove nested_early_check and all associated code, as it's quite obviously
not being used or tested (it's been broken for 4+ years without a single
bug report).  More importantly, KVM's software-based consistency checks
have matured since the option to do hardware-based checks was added; KVM
appears to be missing only _one_ consistency check, on vTPR.  And even
*more* importantly, that consistency check can't be prevented by an early
hardware check due to L1 being able to modify the virtual APIC at any
time, i.e. there's an inherent TOCTOU flaw that could cause KVM to "miss"
a consistency check VM-Fail, regardless of whether the check is performed
by software or by hardware.

In other words, KVM _must_ be able to unwind from a late VM-Fail (which
was a big motivation for doing early checks).  I.e. now that KVM provides
(almost) all necessary consistency checks, what's really needed is a way
to detect missing checks in KVM, not a way to avoid having to unwind from
a late VM-Fail.  And that can be done much more simply, e.g. by a simple
module param to guard a WARN (which, sadly, must be off-by-default to
avoid splats due to the aforementioned TOCTOU issue).

For all intents and purposes, this reverts commit 52017608da33 ("KVM:
nVMX: add option to perform early consistency checks via H/W").

Link: https://lore.kernel.org/r/20250919005955.1366256-9-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
<entry>
<title>KVM: nVMX: Stuff vmcs02.TSC_MULTIPLIER early on for nested early checks</title>
<updated>2025-10-17T22:11:27Z</updated>
<author>
<name>Sean Christopherson</name>
<email>seanjc@google.com</email>
</author>
<published>2025-09-19T00:59:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f91699d5692ddd0ee92b9487014fc477179ab3a7'/>
<id>urn:sha1:f91699d5692ddd0ee92b9487014fc477179ab3a7</id>
<content type='text'>
If KVM is doing "early" nested VM-Enter consistency checks and TSC scaling
is supported, stuff vmcs02's TSC Multiplier early on to avoid getting a
false positive VM-Fail due to trying to do VM-Enter with TSC_MULTIPLIER=0.
To minimize complexity around L1 vs. L2 TSC, KVM sets the actual TSC
Multiplier rather late during VM-Entry, i.e. may have '0' at the time of
early consistency checks.

If vmcs12 has TSC Scaling enabled, use the multiplier from vmcs12 so that
nested early checks actually check vmcs12 state, otherwise throw in an
arbitrary value of '1' (anything non-zero is legal).
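
The selection rule above amounts to the following toy Python sketch (the
function and parameter names are hypothetical; this is not KVM code):

```python
# Toy model of choosing the TSC multiplier used for early checks.

def early_check_tsc_multiplier(vmcs12_tsc_scaling_enabled, vmcs12_multiplier):
    """Pick the value to stuff into vmcs02 before the early checks."""
    if vmcs12_tsc_scaling_enabled:
        # Use L1's value so that the early check validates vmcs12 state.
        return vmcs12_multiplier
    # TSC scaling is unused by L1; any non-zero value is legal.
    return 1
```

Note that an illegal multiplier of '0' in vmcs12 is passed through as-is
when scaling is enabled, so that the early check can flag it.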

Fixes: d041b5ea9335 ("KVM: nVMX: Enable nested TSC scaling")
Link: https://lore.kernel.org/r/20250919005955.1366256-8-seanjc@google.com
Signed-off-by: Sean Christopherson &lt;seanjc@google.com&gt;
</content>
</entry>
</feed>
