Skip to content

Conversation

last-genius
Copy link
Contributor

@last-genius last-genius commented Jul 3, 2025

Don't assume the presence of PV drivers magically solves everything, actually honour feature-suspend and make data/cant_suspend_reason critical.


Manually tested to not break the following cases (thanks to @dinhngtu):

  • Test on migratable Windows (shouldn't block suspend) - OK

  • Test on Windows with a non-migratable device (should block suspend) - OK (VM lacks feature)

  • Test on migratable Linux with unplug - OK

  • Test on Linux with a non-migratable device - OK (VM lacks feature)

  • Test nopv Windows/Linux without unmigratable devices with manual feature-suspend xenstore entry forced

    • Windows BIOS: nopv OK

    • Linux BIOS: nopv OK

    • Linux UEFI: nopv OK

…tures

A 10+ year old note justified assuming that feature-suspend and others are
always available by referring to a buggy Windows guest agent. It has been
correctly writing 1 to control/feature-suspend for at least a decade now
(tested on old Windows PV drivers), so there is no reason to maintain this
workaround.

This allows PV drivers to potentially disable such features.

Signed-off-by: Andrii Sultanov <[email protected]>
When data/cant_suspend_reason is present in xenstore (renamed
data-cant-suspend-reason in "other" guest metrics), it would exclude
operations involving suspend from the allowed operations list, but still allow
actually suspending (strict=true during allowed operations checks but could be
false otherwise. This was additionally overriden by assuming that PV driver
presence guarantees feature-suspend).

Suspending in such a situation would always result in a crash, usually with
the following error:

```
$ xe vm-suspend uuid=d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2
The server failed to handle your request, due to an internal error.
The given message may give details useful for debugging the problem.
message: xenopsd internal error:
Device_common.QMP_Error(8,
"{\"error\":
            {\"class\":\"GenericError\",
             \"desc\":\"State blocked by non-migratable device '0000:00:07.0/nvme'\",
             \"data\":{}
             },
  \"id\":\"qmp-000006-8\"}")
```

So disallow suspending altogether when cant_suspend_reason is present.
Log the QEMU error directly in the XAPI error as well:

```
$ xe vm-suspend uuid=d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2
You attempted an operation on a VM which lacks the feature.
vm: d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2 (Windows 10 (64-bit))
reason: {"error":{"class":"GenericError","desc":"State blocked by non-migratable device '0000:00:07.0/nvme'","data":{}},"id":"qmp-000012-9"}
```

Signed-off-by: Andrii Sultanov <[email protected]>
@last-genius last-genius force-pushed the asv/win-feature-suspend branch from c0aa1f8 to c895469 Compare July 3, 2025 13:17
@andyhhp
Copy link
Collaborator

andyhhp commented Jul 3, 2025

What about a VM with no Xen awareness at all? e.g. memtest.

These VMs do migrate safely, but have nothing in xenstore.

P.S. Guess what's a great VM to test that logdirty is working properly in the live phase of migrate.

@last-genius
Copy link
Contributor Author

What about a VM with no Xen awareness at all? e.g. memtest.

These VMs do migrate safely, but have nothing in xenstore.

P.S. Guess what's a great VM to test that logdirty is working properly in the live phase of migrate.

Existing behaviour should be preserved for this one, but I will test

@last-genius
Copy link
Contributor Author

What about a VM with no Xen awareness at all? e.g. memtest.
These VMs do migrate safely, but have nothing in xenstore.
P.S. Guess what's a great VM to test that logdirty is working properly in the live phase of migrate.

Existing behaviour should be preserved for this one, but I will test

It works just like before:

# xe vm-migrate uuid=X host=Y
You attempted an operation on a VM which lacks the feature.
vm: X (memtest86)

# xe vm-migrate uuid=X host=Y force=true <-- this succeeds

@last-genius last-genius added this pull request to the merge queue Jul 4, 2025
Merged via the queue into xapi-project:master with commit 5733419 Jul 4, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants