KVM Dirty log ring interface #344
The capability is used for the KVM dirty ring interface for tracking dirtied pages. Signed-off-by: David Kleymann <[email protected]>
Adds the KVM_RESET_DIRTY_RINGS ioctl and the function reset_dirty_rings in impl VmFd to wrap it. Signed-off-by: David Kleymann <[email protected]>
Adds vCPU functions to mmap the dirty ring and iterate over dirty pages. Also adds return value to reset_dirty_rings and flush_dirty_gfns. More info: https://www.kernel.org/doc/html/latest/virt/kvm/api.html#kvm-cap-dirty-log-ring-kvm-cap-dirty-log-ring-acq-rel Signed-off-by: David Kleymann <[email protected]>
Signed-off-by: David Kleymann <[email protected]>
… of map_dirty_log_ring Signed-off-by: David Kleymann <[email protected]>
Signed-off-by: David Kleymann <[email protected]>
Signed-off-by: David Kleymann <[email protected]>
Signed-off-by: David Kleymann <[email protected]>
Comments on the safety on the operations used to mmap the shared memory buffer of kvm_dirty_gfn entries. Signed-off-by: David Kleymann <[email protected]>
Signed-off-by: David Kleymann <[email protected]>
please squash your commits so that we do not have later commits be fixups for previous commits.
@@ -2,6 +2,8 @@

## Upcoming Release

- Plumb through KVM_CAP_DIRTY_LOG_RING as DirtyLogRing cap.
This should go into an `### Added` section.
unsafe {
    let gfn = self.gfns.add(i as usize).as_mut();
    if (*gfn).flags & KVM_DIRTY_GFN_F_DIRTY == 0 {
        // next_dirty stays the same, it will become the next dirty element
        return None;
    } else {
        self.next_dirty += 1;
        (*gfn).flags ^= KVM_DIRTY_GFN_F_RESET;
        return Some(((*gfn).slot, (*gfn).offset));
    }
}
}
Here we should probably do just a single `read_volatile` of the struct `kvm_dirty_gfn`, to avoid data races, and write the updated `kvm_dirty_gfn` flags via `write_volatile` (or, on weakly ordered architectures such as arm64, an atomic read with acquire ordering and a write with release ordering, which means we'll probably also want to support checking `KVM_CAP_DIRTY_LOG_RING_ACQ_REL`).
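The acquire/release variant suggested above could look roughly like the following. This is only a sketch under stated assumptions: `DirtyGfn` is an illustrative stand-in for `kvm_dirty_gfn` (same field layout, but with an atomic `flags` field), `harvest` is a hypothetical helper, and the entry lives in ordinary memory rather than in the mmap'ed ring. The flag values are the ones from the Linux KVM UAPI header.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Flag values as defined in the Linux KVM UAPI header.
pub const KVM_DIRTY_GFN_F_DIRTY: u32 = 1 << 0;
pub const KVM_DIRTY_GFN_F_RESET: u32 = 1 << 1;

/// Illustrative stand-in for `kvm_dirty_gfn` (assumption: not the real
/// binding). `flags` is atomic because, in the real shared mapping, the
/// kernel updates it concurrently.
#[repr(C)]
pub struct DirtyGfn {
    pub flags: AtomicU32,
    pub slot: u32,
    pub offset: u64,
}

/// Hypothetical helper: harvests one entry. Returns `Some((slot, offset))`
/// and marks the entry for reset if it is dirty, or `None` otherwise.
pub fn harvest(entry: &DirtyGfn) -> Option<(u32, u64)> {
    // Acquire pairs with the producer's release store of the DIRTY flag,
    // so `slot` and `offset` are read only after they are visible.
    let flags = entry.flags.load(Ordering::Acquire);
    if flags & KVM_DIRTY_GFN_F_DIRTY == 0 {
        return None;
    }
    let page = (entry.slot, entry.offset);
    // Release orders the reads above before the entry is handed back.
    entry.flags.store(KVM_DIRTY_GFN_F_RESET, Ordering::Release);
    Some(page)
}
```

In the real ring, `slot` and `offset` would be read through the raw pointer into the shared mapping, so this only illustrates the ordering of the flag accesses.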
/// }
/// }
/// ```
pub fn dirty_log_ring_iter(&mut self) -> Option<&mut KvmDirtyLogRing> {
We can hide the actual iterator type here by just returning `impl Iterator<Item = blablabla>`. Then maybe the entire struct doesn't need to be exported?
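A minimal sketch of that idea, with hypothetical stand-in types (`DirtyRing` and `Vcpu` are not the PR's actual structs, and the `(u32, u64)` item type is an assumption taken from the `(slot, offset)` pairs in the iterator above): the ring type stays private, and callers only see an opaque iterator.

```rust
/// Hypothetical ring type; private to the module in this sketch.
struct DirtyRing {
    entries: Vec<(u32, u64)>,
    next: usize,
}

impl Iterator for DirtyRing {
    type Item = (u32, u64);

    fn next(&mut self) -> Option<Self::Item> {
        let item = self.entries.get(self.next).copied()?;
        self.next += 1;
        Some(item)
    }
}

/// Hypothetical vCPU handle.
pub struct Vcpu {
    ring: Option<DirtyRing>,
}

impl Vcpu {
    /// Callers only learn that they get an iterator over (slot, offset)
    /// pairs; the concrete ring type is not part of the public API.
    pub fn dirty_log_ring_iter(&mut self) -> Option<impl Iterator<Item = (u32, u64)> + '_> {
        self.ring.as_mut()
    }
}
```

This works because `&mut I` implements `Iterator` whenever `I` does, so the mutable borrow of the private ring can be returned behind `impl Iterator`.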
@@ -1930,14 +1930,14 @@ impl VmFd {
 /// }
 /// ```
 ///
-pub fn reset_dirty_rings(&self) -> Result<()> {
+pub fn reset_dirty_rings(&self) -> Result<c_int> {
this should go into the previous commit
/// Resets all vCPU's dirty log rings. This notifies the kernel that pages have been harvested
/// from the dirty ring and the corresponding pages can be reprotected.
this should also just go into the commit that introduced the function
@@ -2106,8 +2106,7 @@ impl VcpuFd {
 }
 }

-/// Maps the coalesced MMIO ring page. This allows reading entries from
-/// the ring via [`coalesced_mmio_read()`](VcpuFd::coalesced_mmio_read).
+/// Maps the KVM dirty log ring.
this should go into the commit that introduced the function
and all the changes to this file should go into the commit that introduced struct KvmDirtyLogRing
@@ -3,6 +3,7 @@
 ## Upcoming Release

 - Plumb through KVM_CAP_DIRTY_LOG_RING as DirtyLogRing cap.
+- Added support for dirty log ring interface introducing `VcpuFd::reset_dirty_rings`, `KvmDirtyLogRing`
Just add the changelog entries in the commit that introduces the respective functions.
};

let offset = page_size * KVM_DIRTY_LOG_PAGE_OFFSET as usize;

if bytes % std::mem::size_of::<kvm_dirty_gfn>() != 0 {
    // Size of dirty ring in bytes must be multiples of slot size
    return Err(errno::Error::new(libc::EINVAL));
}
let slots = bytes / std::mem::size_of::<kvm_dirty_gfn>();
if slots & (slots - 1) != 0 {
slots.is_power_of_two()
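As a rough illustration of why the standard-library method is preferable, here is a sketch of the size validation written with it. `ring_slots` and its `entry_size` parameter are hypothetical names for this example, not part of the PR.

```rust
/// Hypothetical helper: returns the number of ring slots for a ring of
/// `bytes` bytes, or `None` if the size is not a whole, power-of-two
/// number of entries.
pub fn ring_slots(bytes: usize, entry_size: usize) -> Option<usize> {
    if entry_size == 0 || bytes % entry_size != 0 {
        return None;
    }
    let slots = bytes / entry_size;
    // `is_power_of_two()` is false for 0, whereas `slots & (slots - 1)`
    // would underflow for 0 (a panic in debug builds).
    slots.is_power_of_two().then_some(slots)
}
```

Besides being easier to read, this rejects a zero-sized ring cleanly instead of relying on wrapping arithmetic.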
kvm-ioctls/src/ioctls/vcpu.rs
Outdated
/// # extern crate kvm_ioctls;
/// # use kvm_ioctls::{Cap, Kvm};
/// let kvm = Kvm::new().unwrap();
/// let vm = kvm.create_vm().unwrap();
/// let mut vcpu = vm.create_vcpu(0).unwrap();
/// if kvm.check_extension(Cap::DirtyLogRing) {
///     vcpu.coalesced_mmio_ring().unwrap();
/// }
/// ```
pub fn map_dirty_log_ring(&mut self, bytes: usize) -> Result<()> {
    if self.dirty_log_ring.is_none() {
        let ring = KvmDirtyLogRing::mmap_from_fd(&self.vcpu, bytes)?;
        self.dirty_log_ring = Some(ring);
    }
    Ok(())
}
The dirty log ring interface is enabled via `KVM_ENABLE_CAP` (which we expose as `enable_cap()` in this crate), which is where the size of the ring buffer is specified as well. So I would propose to make the API here a wrapper on top of `enable_cap()`, say `enable_dirty_log_ring(bytes: usize)`, which does the `VmFd::enable_cap()` call and then stores the actual size of the ring buffer directly in the `VmFd` struct. Then on `VcpuFd` creation, we check if the dirty ring capability was ever enabled on the `VmFd`, and if so, just mmap the ring with the size stored in the `VmFd` at vcpu creation time (maybe disallow calling `enable_dirty_log_ring()` if vcpus were already created previously, although KVM also already checks this). That way the issue of "what is the correct size to mmap" goes away.
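The shape of that proposal could be sketched as follows. Everything here is a hypothetical stand-in: `SketchVm`, `SketchVcpu`, and `SketchError` only model the bookkeeping, the methods take `&mut self` for simplicity, and the comments mark where the real `enable_cap()` call and the mmap would go.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    VcpusAlreadyCreated,
}

/// Hypothetical stand-in for `VmFd`: remembers the ring size once enabled.
#[derive(Default)]
pub struct SketchVm {
    dirty_ring_bytes: Option<usize>,
    vcpus_created: usize,
}

/// Hypothetical stand-in for `VcpuFd`.
pub struct SketchVcpu {
    /// Size the ring would have been mapped with, if the cap was enabled.
    pub ring_bytes: Option<usize>,
}

impl SketchVm {
    pub fn enable_dirty_log_ring(&mut self, bytes: usize) -> Result<(), SketchError> {
        if self.vcpus_created > 0 {
            return Err(SketchError::VcpusAlreadyCreated);
        }
        // The real wrapper would issue the enable_cap() call here and
        // record the size only if that call succeeds.
        self.dirty_ring_bytes = Some(bytes);
        Ok(())
    }

    pub fn create_vcpu(&mut self) -> SketchVcpu {
        self.vcpus_created += 1;
        // The real code would mmap the ring here using the stored size.
        SketchVcpu { ring_bytes: self.dirty_ring_bytes }
    }
}
```

With this shape the caller never passes a size at mapping time, so the mapped size cannot disagree with the size given when the capability was enabled.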
Also, do we need to do anything about `KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP`?
As for this, Now, whether `Send` and `Sync` will actually be useful is a different matter, because I'm assuming you want them to be able to harvest the dirty ring while the vcpu is still running, but with the current API in this PR, you can only get a `&mut` reference to the ring buffer structure, which you cannot send to another thread anyway. So in this case, you'd need some API that lets you take ownership of the `KvmDirtyLogRing` structure (maybe store the one that gets created at vcpu creation time in an option, and then have a function that just wraps `.take()` on that option, keeping in mind that we must never allow safe code to get two owned versions of it for the same vcpu).
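The `.take()` idea might be sketched like this. `RingHandle`, `RingOwner`, and `harvest_on_thread` are hypothetical names; the handle here only carries a vCPU id, so it is `Send` automatically, whereas a real handle wrapping a raw pointer into the mapping would need an explicit, justified `unsafe impl Send`.

```rust
/// Hypothetical owned handle to a vCPU's dirty ring.
#[derive(Debug)]
pub struct RingHandle {
    pub vcpu_id: u32,
}

/// Hypothetical vCPU that holds the ring created at vCPU creation time.
pub struct RingOwner {
    ring: Option<RingHandle>,
}

impl RingOwner {
    pub fn new(vcpu_id: u32) -> Self {
        Self { ring: Some(RingHandle { vcpu_id }) }
    }

    /// Hands out the ring at most once; later calls return `None`, so
    /// safe code can never hold two owned handles for the same vCPU.
    pub fn take_dirty_log_ring(&mut self) -> Option<RingHandle> {
        self.ring.take()
    }
}

/// Moves the owned handle to another thread, which a `&mut` borrow of
/// the vCPU would not allow.
pub fn harvest_on_thread(ring: RingHandle) -> u32 {
    std::thread::spawn(move || ring.vcpu_id)
        .join()
        .expect("harvest thread panicked")
}
```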
Also sorry for the late response, we were all preparing for / traveling to KVM Forum last week!
Summary of the PR
Add support for dirty page tracking via the Dirty ring interface. Adds KvmDirtyLogRing structure for keeping track of the indices and the base pointer to the shared memory buffer. Implements iterating over dirty pages, thereby harvesting them. Implements reset_dirty_rings on VmFd to trigger recycling of dirty ring buffer elements by the kernel after processing. Adds the dirty_log_ring field to VcpuFd.
This is a draft that needs some review and improvements. I'm asking for suggestions on the following remaining weaknesses:
More info on the interface:
https://www.kernel.org/doc/html/latest/virt/kvm/api.html#kvm-cap-dirty-log-ring-kvm-cap-dirty-log-ring-acq-rel
Requirements
Before submitting your PR, please make sure you addressed the following
requirements:
`git commit -s`), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
test.
Release" section of CHANGELOG.md (if no such section exists, please create one).
`unsafe` code is properly documented.