
Conversation

@priyasiddharth
Contributor

Summary of the PR

Fix the virtio-queue verification code to follow the new vm-memory code.

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

Updates the requirements on [vm-memory](https://github.com/rust-vmm/vm-memory) to permit the latest version.

Updates `vm-memory` to 0.17.1
- [Release notes](https://github.com/rust-vmm/vm-memory/releases)
- [Changelog](https://github.com/rust-vmm/vm-memory/blob/v0.16.1/CHANGELOG.md)
- [Commits](rust-vmm/vm-memory@v0.16.0...v0.17.1)

---
updated-dependencies:
- dependency-name: vm-memory
  dependency-version: 0.17.1
  dependency-type: direct:production
  dependency-group: vm-virtio
...

Signed-off-by: dependabot[bot] <[email protected]>
[SG: Set the right version in the commit message and updated vm-memory
 also in fuzz/]
[SG: Adapted `fuzz/common/src/vsock.rs` to the new vm-memory code]
Signed-off-by: Stefano Garzarella <[email protected]>
@priyasiddharth
Contributor Author

Will fix the build failures

@priyasiddharth
Contributor Author

The GuestMemory architecture has changed beyond the addition/deletion of methods. Currently, load from our StubRegion is not working.
The symptom is:

Failed Checks: This is a placeholder message; Kani doesn't support message formatted at runtime
 File: "/Users/runner/.rustup/toolchains/nightly-2025-04-24-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/res$

Likely an unwrap is being called on the value being loaded (on the happy path there is no failure, so Kani does not have to compute an error message), and I suspect rust-vmm/vm-memory#339 is the source of the issue. I ran out of time to look into this further. Can someone who is familiar with vm-memory take a look?

@stefano-garzarella
Member

The GuestMemory architecture has changed beyond the addition/deletion of methods. Currently, load from our StubRegion is not working. The symptom is:

Failed Checks: This is a placeholder message; Kani doesn't support message formatted at runtime
 File: "/Users/runner/.rustup/toolchains/nightly-2025-04-24-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/res$

Likely an unwrap is being called on the value being loaded (on the happy path there is no failure, so Kani does not have to compute an error message), and I suspect rust-vmm/vm-memory#339 is the source of the issue. I ran out of time to look into this further. Can someone who is familiar with vm-memory take a look?

@XanClic @roypat @bonzini any thought on this?

@bonzini
Member

bonzini commented Oct 27, 2025

I would need someone to explain more verbosely what's going on. :) The linked PR changes get_slice() to an iterator; is that related?

@XanClic

XanClic commented Oct 27, 2025

Yes, at least a stack trace to know where the failure is would be good.

I suspect rust-vmm/vm-memory#339 is the source of the issue.

As far as I can tell, the only panic it adds to the load() path can happen when trying to load/store an object of size 0. Specifically, the panic checks that .get_slices() does not return an empty iterator, and an empty iterator can only happen when the object has size 0.

rust-vmm/vm-memory#349 added another panic, checking that GuestMemoryRegion::get_slice() returns a slice of exactly the requested length (after we ensured that the start..(start + len) range will fit into the region). So it could be that StubRegion doesn’t do that.
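
Stated as code, the contract is roughly the following (a paraphrased illustration, not the vm-memory source):

fn check_get_slice_contract(requested: usize, returned_len: usize) {
    // After the start..(start + len) range check has passed, vm-memory now
    // panics if the region hands back fewer bytes than were requested,
    // instead of silently truncating the access.
    assert_eq!(returned_len, requested, "get_slice() returned a short slice");
}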

@XanClic

XanClic commented Oct 27, 2025

This is the full error:

Check 804: std::result::unwrap_failed.assertion.1
         - Status: FAILURE
         - Description: "This is a placeholder message; Kani doesn't support message formatted at runtime"
         - Location: ../../../../runner/.rustup/toolchains/nightly-2025-08-06-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1852:5 in function std::result::unwrap_failed

So it looks like an .unwrap() (or adjacent) error, yes, not a plain assertion. I verified that rust-vmm/vm-memory@66b9427 causes this breakage.

Replacing various unwrap()s with plain unformatted panic!()s, I got lucky with the get_used_idx(&queue, &memory).unwrap() in verify_add_used() in virtio-queue/src/queue/verification.rs.

So the problem is not a new unwrap() in vm-memory, but that load() returns an error. The intention of the commit was indeed to introduce errors instead of doing the wrong thing (specifically for short/overlapping GuestMemoryRegion mappings, i.e. the atomic operation happening across two different GuestMemoryRegions), but I’m still investigating the exact error source.

@XanClic

XanClic commented Oct 27, 2025

Ah, I think I got it. As far as I can tell, the error comes from the default impl GuestMemoryRegion::get_slice(), because StubRegion doesn’t implement this. Bytes::load() used to call .to_region_addr(), but, well, now it needs .get_slice().

@XanClic

XanClic commented Oct 27, 2025

Adding this to impl GuestMemoryRegion for StubRegion seems to work:

fn get_slice(
    &self,
    offset: MemoryRegionAddress,
    count: usize,
) -> GuestMemoryResult<VolatileSlice<()>> {
    // SAFETY: builds a volatile slice of `count` bytes starting `offset`
    // bytes into the stub buffer; note that there is no bounds check on
    // `offset + count` (see the EDIT below).
    Ok(unsafe {
        VolatileSlice::with_bitmap(
            self.buffer.add(offset.raw_value() as usize),
            count,
            (),
            None,
        )
    })
}

I don’t know if it’s right, I just copied it from https://github.com/rust-vmm/vm-memory/blob/main/src/mmap/unix.rs#L394 – but at least it does make the test pass.

(EDIT: I really should add that it’s at least missing a bounds check.)
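
For completeness, a bounds-checked variant could look roughly like this. It is an untested sketch, not the actual fix: it assumes StubRegion's GuestMemoryRegion::len() reports the stub buffer's size and reuses GuestMemoryError::InvalidBackendAddress as one plausible error to return.

fn get_slice(
    &self,
    offset: MemoryRegionAddress,
    count: usize,
) -> GuestMemoryResult<VolatileSlice<()>> {
    // Reject accesses whose end would overflow or fall outside the region.
    let end = offset
        .raw_value()
        .checked_add(count as u64)
        .ok_or(GuestMemoryError::InvalidBackendAddress)?;
    if end > self.len() {
        return Err(GuestMemoryError::InvalidBackendAddress);
    }
    // SAFETY: offset + count was checked above to lie within the stub buffer.
    Ok(unsafe {
        VolatileSlice::with_bitmap(
            self.buffer.add(offset.raw_value() as usize),
            count,
            (),
            None,
        )
    })
}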

@stefano-garzarella
Member

@XanClic thank you very much! After applying your fix, I now get another failure that is unrelated to that issue.

Check 5471: vm_memory::volatile_memory::copy_slice_impl::copy_slice_volatile::{closure#0}.unwind.0
        - Status: FAILURE
        - Description: "unwinding assertion loop 0"
        - Location: ../../.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vm-memory-0.17.1/src/volatile_memory.rs:1346:13 in function vm_memory::volatile_memory::copy_slice_impl::copy_slice_volatile::{closure#0}

I guess it's related to the #[kani::unwind(0)] on top of verify_add_used(), but I don't know the details. @priyasiddharth @MatiasVara can you check it?
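
For context, the harness is annotated roughly like this (a hypothetical sketch of its shape, not the actual contents of virtio-queue/src/queue/verification.rs):

#[kani::proof]
#[kani::unwind(0)]
fn verify_add_used() {
    // Build a nondeterministic queue and guest memory, then exercise
    // add_used(); the unwind attribute bounds how far Kani unrolls loops,
    // and the new copy_slice_volatile loop in vm-memory 0.17 trips that
    // unwinding assertion.
}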

Otherwise we should disable Kani in the CI, since we would like to do a new release with the new vm-memory to update vhost/vhost-device.

@MatiasVara
Contributor

@XanClic thank you very much! After applying your fix, I now get another failure that is unrelated to that issue.

Check 5471: vm_memory::volatile_memory::copy_slice_impl::copy_slice_volatile::{closure#0}.unwind.0
        - Status: FAILURE
        - Description: "unwinding assertion loop 0"
        - Location: ../../.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/vm-memory-0.17.1/src/volatile_memory.rs:1346:13 in function vm_memory::volatile_memory::copy_slice_impl::copy_slice_volatile::{closure#0}

I guess it's related to the #[kani::unwind(0)] on top of verify_add_used(), but I don't know the details. @priyasiddharth @MatiasVara can you check it?

Otherwise we should disable Kani in the CI, since we would like to do a new release with the new vm-memory to update vhost/vhost-device.

I will take a look; apologies for the delay.

@priyasiddharth
Contributor Author

I looked into this a bit.

  1. The VolatileSlice::with_bitmap call looks fine.
  2. The problem seems to be that the load and store methods in [1] now iterate through VolatileSlices. To get through this, we need to unroll [2] once. However, Kani does not support per-loop unrolling, and setting all loops to unroll once sends the execution into a bad case of endless unrolling (I killed it after a couple of minutes).

To validate this hypothesis, one can manually unroll the iterator once in [1] and only loop when there is more than one VolatileSlice (see the sketch below). We might need a verification-friendly version of GuestMemory if this hypothesis is true.
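
A self-contained sketch of the unroll-once idea on plain byte slices (the names and types are illustrative only, not vm-memory's API): the first chunk is handled outside the loop, so the loop body is only reached when an access genuinely spans more than one slice.

fn read_unrolled(chunks: &[&[u8]], out: &mut Vec<u8>) {
    let mut iter = chunks.iter();
    // Unrolled first iteration: the common single-slice case never loops.
    if let Some(first) = iter.next() {
        out.extend_from_slice(first);
    }
    // Only reached for cross-slice accesses; this is the part a Kani
    // harness would still need an unwind bound for.
    for rest in iter {
        out.extend_from_slice(rest);
    }
}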

References

@MatiasVara
Contributor

I looked into this a bit.

  1. The VolatileSlice::with_bitmap call looks fine.
  2. The problem seems to be that the load and store methods in [1] now iterate through VolatileSlices. To get through this, we need to unroll [2] once. However, Kani does not support per-loop unrolling, and setting all loops to unroll once sends the execution into a bad case of endless unrolling (I killed it after a couple of minutes).

To validate this hypothesis, one can manually unroll the iterator once in [1] and only loop when there is more than one VolatileSlice. We might need a verification-friendly version of GuestMemory if this hypothesis is true.

References

The only quick solution I can see is to 1) add Hanna's changes and 2) disable verify_add_used() for the moment. That will give us more time to understand how to fix it.

verify_add_used is disabled until rust-vmm#373 is fixed

Signed-off-by: Siddharth Priya <[email protected]>
@MatiasVara
Contributor

Thanks @priyasiddharth! I think this PR can be split into two commits: first, the changes needed to follow the new vm-memory; second, disabling the verify_add_used proof. WDYT?

@priyasiddharth
Contributor Author

@stefano-garzarella (and code-owners) please add any other comments as well. I want to make all the changes in a single shot if possible.
