<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/Documentation/virtual/kvm/api.txt, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-04-30T19:22:15Z</updated>
<entry>
<title>KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size</title>
<updated>2019-04-30T19:22:15Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2019-04-17T13:28:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=76d58e0f07ec203bbdfcaabd9a9fc10a5a3ed5ea'/>
<id>urn:sha1:76d58e0f07ec203bbdfcaabd9a9fc10a5a3ed5ea</id>
<content type='text'>
If a memory slot's size is not a multiple of 64 pages (256K), then
the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
either requires the requested page range to go beyond memslot-&gt;npages,
or requires log-&gt;num_pages to be unaligned, and kvm_clear_dirty_log_protect
requires log-&gt;num_pages to be both in range and aligned.

To allow this case, allow log-&gt;num_pages not to be a multiple of 64 if
it ends exactly on the last page of the slot.

Reported-by: Peter Xu &lt;peterx@redhat.com&gt;
Fixes: 98938aa8edd6 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Documentation: kvm: fix dirty log ioctl arch lists</title>
<updated>2019-04-29T12:06:04Z</updated>
<author>
<name>Andrew Jones</name>
<email>drjones@redhat.com</email>
</author>
<published>2019-04-29T08:27:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=dbcdae185a704068c22984d6d05acc140ec03a8f'/>
<id>urn:sha1:dbcdae185a704068c22984d6d05acc140ec03a8f</id>
<content type='text'>
KVM_GET_DIRTY_LOG is implemented by all architectures, not just x86,
and KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is additionally implemented by
arm, arm64, and mips.

Signed-off-by: Andrew Jones &lt;drjones@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION</title>
<updated>2019-03-28T16:30:11Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2019-03-28T16:22:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e2788c4a41cb5fa68096f5a58bccacec1a700295'/>
<id>urn:sha1:e2788c4a41cb5fa68096f5a58bccacec1a700295</id>
<content type='text'>
The documentation does not mention how to delete a slot, add the
information.

Reported-by: Nathaniel McCallum &lt;npmccallum@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: doc: Document the life cycle of a VM and its resources</title>
<updated>2019-03-28T16:29:51Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-02-15T20:48:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=919f6cd8bb2fe7151f8aecebc3b3d1ca2567396e'/>
<id>urn:sha1:919f6cd8bb2fe7151f8aecebc3b3d1ca2567396e</id>
<content type='text'>
The series to add memcg accounting to KVM allocations[1] states:

  There are many KVM kernel memory allocations which are tied to the
  life of the VM process and should be charged to the VM process's
  cgroup.

While it is correct to account KVM kernel allocations to the cgroup of
the process that created the VM, it's technically incorrect to state
that the KVM kernel memory allocations are tied to the life of the VM
process.  This is because the VM itself, i.e. struct kvm, is not tied to
the life of the process which created it, rather it is tied to the life
of its associated file descriptor.  In other words, kvm_destroy_vm() is
not invoked until fput() decrements its associated file's refcount to
zero.  A simple example is to fork() in Qemu and have the child sleep
indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
descriptor *and* the rogue child is killed.

The allocations are guaranteed to be *accounted* to the process which
created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
ioctl() with -EIO if kvm-&gt;mm != current-&gt;mm.  I.e. the child can keep
the VM "alive" but can't do anything useful with its reference.

Note that because 'struct kvm' also holds a reference to the mm_struct
of its owner, the above behavior also applies to userspace allocations.

Given that mucking with a VM's file descriptor can lead to subtle and
undesirable behavior, e.g. memcg charges persisting after a VM is shut
down, explicitly document a VM's lifecycle and its impact on the VM's
resources.

Alternatively, KVM could aggressively free resources when the creating
process exits, e.g. via mmu_notifier-&gt;release().  However, mmu_notifier
isn't guaranteed to be available, and freeing resources when the creator
exits is likely to be error prone and fragile as KVM would need to
ensure that it only freed resources that are truly out of reach. In
practice, the existing behavior shouldn't be problematic as a properly
configured system will prevent a child process from being moved out of
the appropriate cgroup hierarchy, i.e. prevent hiding the process from
the OOM killer, and will prevent an unprivileged user from being able to
to hold a reference to struct kvm via another method, e.g. debugfs.

[1]https://patchwork.kernel.org/patch/10806707/

Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: Reject device ioctls from processes other than the VM's creator</title>
<updated>2019-03-28T16:27:06Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-02-15T20:48:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ddba91801aeb5c160b660caed1800eb3aef403f8'/>
<id>urn:sha1:ddba91801aeb5c160b660caed1800eb3aef403f8</id>
<content type='text'>
KVM's API requires thats ioctls must be issued from the same process
that created the VM.  In other words, userspace can play games with a
VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
creator can do anything useful.  Explicitly reject device ioctls that
are issued by a process other than the VM's creator, and update KVM's
API documentation to extend its requirements to device ioctls.

Fixes: 852b6d57dc7f ("kvm: add device control API")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: doc: Fix incorrect word ordering regarding supported use of APIs</title>
<updated>2019-03-28T16:27:04Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-02-15T20:48:38Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5e124900c6ebf5dfbe31b7b67073a64b2da14967'/>
<id>urn:sha1:5e124900c6ebf5dfbe31b7b67073a64b2da14967</id>
<content type='text'>
Per Paolo[1], instantiating multiple VMs in a single process is legal;
but this conflicts with KVM's API documentation, which states:

  The only supported use is one virtual machine per process, and one
  vcpu per thread.

However, an earlier section in the documentation states:

   Only run VM ioctls from the same process (address space) that was used
   to create the VM.

and:

   Only run vcpu ioctls from the same thread that was used to create the
   vcpu.

This suggests that the conflicting documentation is simply an incorrect
ordering of of words, i.e. what's really meant is that a virtual machine
can't be shared across multiple processes and a vCPU can't be shared
across multiple threads.

Tweak the blurb on issuing ioctls to use a more assertive tone, and
rewrite the "supported use" sentence to reference said blurb instead of
poorly restating it in different terms.

Opportunistically add missing punctuation.

[1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com

Fixes: 9c1b96e34717 ("KVM: Document basic API")
Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
[Improve notes on asynchronous ioctl]
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>KVM: doc: Document the life cycle of a VM and its resources</title>
<updated>2019-03-15T18:24:33Z</updated>
<author>
<name>Sean Christopherson</name>
<email>sean.j.christopherson@intel.com</email>
</author>
<published>2019-02-15T20:48:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=eca6be566d47029f945a5f8e1c94d374e31df2ca'/>
<id>urn:sha1:eca6be566d47029f945a5f8e1c94d374e31df2ca</id>
<content type='text'>
The series to add memcg accounting to KVM allocations[1] states:

  There are many KVM kernel memory allocations which are tied to the
  life of the VM process and should be charged to the VM process's
  cgroup.

While it is correct to account KVM kernel allocations to the cgroup of
the process that created the VM, it's technically incorrect to state
that the KVM kernel memory allocations are tied to the life of the VM
process.  This is because the VM itself, i.e. struct kvm, is not tied to
the life of the process which created it, rather it is tied to the life
of its associated file descriptor.  In other words, kvm_destroy_vm() is
not invoked until fput() decrements its associated file's refcount to
zero.  A simple example is to fork() in Qemu and have the child sleep
indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
descriptor *and* the rogue child is killed.

The allocations are guaranteed to be *accounted* to the process which
created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
ioctl() with -EIO if kvm-&gt;mm != current-&gt;mm.  I.e. the child can keep
the VM "alive" but can't do anything useful with its reference.

Note that because 'struct kvm' also holds a reference to the mm_struct
of its owner, the above behavior also applies to userspace allocations.

Given that mucking with a VM's file descriptor can lead to subtle and
undesirable behavior, e.g. memcg charges persisting after a VM is shut
down, explicitly document a VM's lifecycle and its impact on the VM's
resources.

Alternatively, KVM could aggressively free resources when the creating
process exits, e.g. via mmu_notifier-&gt;release().  However, mmu_notifier
isn't guaranteed to be available, and freeing resources when the creator
exits is likely to be error prone and fragile as KVM would need to
ensure that it only freed resources that are truly out of reach. In
practice, the existing behavior shouldn't be problematic as a properly
configured system will prevent a child process from being moved out of
the appropriate cgroup hierarchy, i.e. prevent hiding the process from
the OOM killer, and will prevent an unprivileged user from being able to
to hold a reference to struct kvm via another method, e.g. debugfs.

[1]https://patchwork.kernel.org/patch/10806707/

Signed-off-by: Sean Christopherson &lt;sean.j.christopherson@intel.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID</title>
<updated>2018-12-14T16:59:54Z</updated>
<author>
<name>Vitaly Kuznetsov</name>
<email>vkuznets@redhat.com</email>
</author>
<published>2018-12-10T17:21:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2bc39970e9327ceb06cb210f86ba35f81d00e350'/>
<id>urn:sha1:2bc39970e9327ceb06cb210f86ba35f81d00e350</id>
<content type='text'>
With every new Hyper-V Enlightenment we implement we're forced to add a
KVM_CAP_HYPERV_* capability. While this approach works it is fairly
inconvenient: the majority of the enlightenments we do have corresponding
CPUID feature bit(s) and userspace has to know this anyways to be able to
expose the feature to the guest.

Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
cap to rule them all!") returning all Hyper-V CPUID feature leaves.

Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
0x40000001) and we would probably confuse userspace in case we decide to
return these twice.

KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.

Suggested-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Signed-off-by: Vitaly Kuznetsov &lt;vkuznets@redhat.com&gt;
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: introduce manual dirty log reprotect</title>
<updated>2018-12-14T11:34:19Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2018-10-23T00:36:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=2a31b9db153530df4aa02dac8c32837bf5f47019'/>
<id>urn:sha1:2a31b9db153530df4aa02dac8c32837bf5f47019</id>
<content type='text'>
There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
it can take kvm-&gt;mmu_lock for an extended period of time.  Second, its user
can actually see many false positives in some cases.  The latter is due
to a benign race like this:

  1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
     them.
  2. The guest modifies the pages, causing them to be marked ditry.
  3. Userspace actually copies the pages.
  4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
     they were not written to since (3).

This is especially a problem for large guests, where the time between
(1) and (3) can be substantial.  This patch introduces a new
capability which, when enabled, makes KVM_GET_DIRTY_LOG not
write-protect the pages it returns.  Instead, userspace has to
explicitly clear the dirty log bits just before using the content
of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
64-page granularity rather than requiring to sync a full memslot;
this way, the mmu_lock is taken for small amounts of time, and
only a small amount of time will pass between write protection
of pages and the sending of their content.

Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic</title>
<updated>2018-12-14T11:34:18Z</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2017-02-16T09:40:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e5d83c74a5800c2a1fa3ba982c1c4b2b39ae6db2'/>
<id>urn:sha1:e5d83c74a5800c2a1fa3ba982c1c4b2b39ae6db2</id>
<content type='text'>
The first such capability to be handled in virt/kvm/ will be manual
dirty page reprotection.

Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
</feed>
