-
Notifications
You must be signed in to change notification settings - Fork 292
xapi_vm_lifecycle
: Improve feature handling, avoid crashes
#6569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xapi_vm_lifecycle
: Improve feature handling, avoid crashes
#6569
Conversation
…tures A 10+ year old note justified assuming that feature-suspend and others are always available by referring to a buggy Windows guest agent. It has been correctly writing 1 to control/feature-suspend for at least a decade now (tested on old Windows PV drivers), so there is no reason to maintain this workaround. This allows PV drivers to potentially disable such features. Signed-off-by: Andrii Sultanov <[email protected]>
When data/cant_suspend_reason is present in xenstore (renamed data-cant-suspend-reason in "other" guest metrics), it would exclude operations involving suspend from the allowed operations list, but still allow actually suspending (strict=true during allowed operations checks but could be false otherwise. This was additionally overriden by assuming that PV driver presence guarantees feature-suspend). Suspending in such a situation would always result in a crash, usually with the following error: ``` $ xe vm-suspend uuid=d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2 The server failed to handle your request, due to an internal error. The given message may give details useful for debugging the problem. message: xenopsd internal error: Device_common.QMP_Error(8, "{\"error\": {\"class\":\"GenericError\", \"desc\":\"State blocked by non-migratable device '0000:00:07.0/nvme'\", \"data\":{} }, \"id\":\"qmp-000006-8\"}") ``` So disallow suspending altogether when cant_suspend_reason is present. Log the QEMU error directly in the XAPI error as well: ``` $ xe vm-suspend uuid=d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2 You attempted an operation on a VM which lacks the feature. vm: d546c8c2-bcb1-0ed6-7931-d0fc9393ccb2 (Windows 10 (64-bit)) reason: {"error":{"class":"GenericError","desc":"State blocked by non-migratable device '0000:00:07.0/nvme'","data":{}},"id":"qmp-000012-9"} ``` Signed-off-by: Andrii Sultanov <[email protected]>
c0aa1f8
to
c895469
Compare
What about a VM with no Xen awareness at all? e.g. memtest. These VMs do migrate safely, but have nothing in xenstore. P.S. Guess what's a great VM to test that logdirty is working properly in the live phase of migrate. |
Existing behaviour should be preserved for this one, but I will test |
It works just like before:
|
Don't assume the presence of PV drivers magically solves everything, actually honour
feature-suspend
and makedata/cant_suspend_reason
critical.Manually tested to not break the following cases (thanks to @dinhngtu):
Test on migratable Windows (shouldn't block suspend) - OK
Test on Windows with a non-migratable device (should block suspend) - OK (VM lacks feature)
Test on migratable Linux with unplug - OK
Test on Linux with a non-migratable device - OK (VM lacks feature)
Test nopv Windows/Linux without unmigratable devices with manual
feature-suspend
xenstore entry forcedWindows BIOS: nopv OK
Linux BIOS: nopv OK
Linux UEFI: nopv OK