Skip to content

Conversation

lindig
Copy link
Contributor

@lindig lindig commented Jul 11, 2025

Change in semantics: the API call now waits for the VM to shut down (by observing the xenstore tree being removed). This is guarded by a user-provided timeout.

Christian Lindig added 3 commits July 11, 2025 14:14
The timeout (in seconds) we wait for the current domain to shut down
before we return. This is supplied by the user; for the XE CLI we can
provide a default.

Signed-off-by: Christian Lindig <[email protected]>
An easy way to wait for the shutdown of a domain is to watch its
xenstore tree to disappear. Wait for this after triggering sysprep via
the gueat agent.

Signed-off-by: Christian Lindig <[email protected]>
Explain the timeout parameter.

Signed-off-by: Christian Lindig <[email protected]>
@lindig lindig requested a review from edwintorok July 11, 2025 13:57
) ;
"running"
with Watch.Timeout _ -> xs.Xs.read (control // "action")
debug "%s: sysprep is runnung; waiting for shutdown" __FUNCTION__ ;
Copy link
Contributor

@edwintorok edwintorok Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could also wait for the key "/control/sysprep/action" to disappear first: I think then we know sysprep stopped running, so we can eject the CD.
The VM will then reboot soon after that. That key will also disappear on a reboot.

We could have different timeouts for the two: sysprep itself can take a long time to run (so we could have a timeout of a few hours perhaps, it depends how much work unattend.xml has given it).
OTOH once sysprep finished we expect the reboot to happen quickly, so we can have a shorter timeout on the order of minutes.

This would be useful for debugging how much progress the guest makes (we could set the progress field of the task at various points, though for now debug messages would probably be sufficient).

Copy link
Contributor

@edwintorok edwintorok Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity one timeout is probably sufficient: we let sysprep run for as long as it wishes (we don't have a way to cancel it anyway), i.e. the first Watch.wait_for..key_to_disappear wouldn't have a timeout.

The 2nd timeout would be as it is here; a timeout for the reboot.

Then the 'wait for sysprep to finish running' can be nicely added as a separate commit on top of this one, something like:

Watch.(wait_for ~xs  (key_to_disappear (control // "action")))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to run anything without a timeout because we want to fail this API call to signal that something is wrong and it is more difficult to cancel an API call from the outside and make sure we are not leaking any resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice if we could wait for either task cancelation or the xenstore key to disappear, but I don't think we currently have a mechanism in XAPI that could easily support that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have support to wait for all/any xenstore events. I don't see the value of waiting for "running" to disappear if we ultimately want to wait for the domain to disappear which implies that "running" disappears as well.

Monitor the VM's reaction more carefully: wait for sysprep to terminate.

Signed-off-by: Christian Lindig <[email protected]>
@lindig lindig requested a review from robhoes July 11, 2025 15:15
__FUNCTION__ ;
Watch.(wait_for ~xs ~timeout (key_to_disappear (control // "action"))) ;
debug "%s sysprep is finished" __FUNCTION__ ;
Watch.(wait_for ~xs ~timeout (key_to_disappear domain)) ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the xapi sees the last change of disappear directly without seeing the changes above? In other words, can the xenstore ensure the individual consecutive changes will never be combined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rule out sysprep terminating and the domain disappearing together. But that is an argument based on how the Windows domain works. not how xapi and xenstore work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the domain disappears then all keys underneath it disappear too, so the 'action' will always disappear on shutdown/reboot.

@lindig lindig added this pull request to the merge queue Jul 14, 2025
Merged via the queue into xapi-project:master with commit fd6d4f8 Jul 14, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants