@last-genius

With bab83d9, host evacuation was parallelized by grouping VMs
into batches and starting a new batch only once the previous one had finished.
This means that a single slow VM can potentially slow down the whole evacuation,
since the next batch cannot start until every migration in the current one has completed.
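To make the effect concrete, here is a toy cost model in OCaml. It assumes every migration has a fixed, known duration and that at most n migrations run at once, and it ignores contention between concurrent migrations; `batched_makespan` and `constant_flow_makespan` are illustrative names, not xapi functions. The batched policy pays for the slowest VM in every batch, while the constant-flow policy starts the next VM on whichever slot frees up first.

```ocaml
(* Toy cost model: each element of [durations] is how long one VM
   migration takes, and at most [n] migrations run at the same time. *)

(* Batched: VMs are split into consecutive groups of [n], and a group
   starts only when every migration in the previous group has finished,
   so each group costs as much as its slowest member. *)
let batched_makespan n durations =
  assert (n > 0);
  let rec go total batch_max count = function
    | [] -> if count > 0 then total + batch_max else total
    | d :: rest ->
        let batch_max = max batch_max d in
        if count + 1 = n then go (total + batch_max) 0 0 rest
        else go total batch_max (count + 1) rest
  in
  go 0 0 0 durations

(* Constant flow: the next VM starts on whichever of the [n] slots
   becomes free first. [slots.(i)] is the time at which slot [i] is free. *)
let constant_flow_makespan n durations =
  assert (n > 0);
  let slots = Array.make n 0 in
  List.iter
    (fun d ->
      let earliest = ref 0 in
      Array.iteri (fun i t -> if t < slots.(!earliest) then earliest := i) slots;
      slots.(!earliest) <- slots.(!earliest) + d)
    durations;
  Array.fold_left max 0 slots

let () =
  let durations = [ 10; 1; 10; 1 ] in
  (* prints "batched: 20, constant flow: 11" *)
  Printf.printf "batched: %d, constant flow: %d\n"
    (batched_makespan 2 durations)
    (constant_flow_makespan 2 durations)
```

With two 10-second VMs and two 1-second VMs, each batch waits for its 10-second VM, whereas the constant-flow schedule starts the second 10-second VM as soon as the first short one finishes. Real timings will differ, since concurrent migrations compete for bandwidth.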

Add a new Tasks.wait_for_all_with_callback function that will
invoke a callback every time one of the tasks is
deemed non-pending. This will allow its users to:

  1. track the progress of tasks within the submitted batch
  2. schedule new tasks to replace the completed ones
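A minimal, self-contained sketch of the idea follows. It is not xapi's implementation: the real function works on XenAPI tasks, whereas here a task is a hypothetical record that stops being pending after a fixed number of polls, and `is_pending`, `completion_order` and the labelled arguments are assumptions. As with the `wait_for_all_inner` change described in the commit message, tasks returned by the callback are added to the set still being waited on.

```ocaml
(* Hypothetical stand-in for a task: it stops being pending after
   [ticks_left] polls. A real implementation would watch task status. *)
type task = { name : string; mutable ticks_left : int }

(* Simulated poll: each call advances the task by one tick and reports
   whether it is still pending. *)
let is_pending t =
  t.ticks_left <- t.ticks_left - 1;
  t.ticks_left > 0

(* Wait until no task is pending. [callback] runs once for every task
   that stops being pending, and any tasks it returns are added to the
   set being waited on. *)
let wait_for_all_with_callback ~is_pending ~callback tasks =
  let rec loop = function
    | [] -> ()
    | pending ->
        let still_pending, finished = List.partition is_pending pending in
        let added = List.concat_map callback finished in
        (* A real implementation would block on task events here rather
           than re-polling immediately. *)
        loop (still_pending @ added)
  in
  loop tasks

(* Usage: report progress as each task finishes; returns the order. *)
let completion_order tasks =
  let finished = ref [] in
  let callback t =
    finished := t.name :: !finished;
    Printf.printf "%s finished (%d/%d)\n" t.name
      (List.length !finished) (List.length tasks);
    [] (* no replacement tasks in this example *)
  in
  wait_for_all_with_callback ~is_pending ~callback tasks;
  List.rev !finished

let () =
  (* prints "vm2 finished (1/2)" then "vm1 finished (2/2)" *)
  ignore
    (completion_order
       [ { name = "vm1"; ticks_left = 3 }; { name = "vm2"; ticks_left = 1 } ])
```

Here the callback only tracks progress and returns `[]`; returning new tasks from it is how a caller would replace completed ones.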

Use the new Tasks.wait_for_all_with_callback in xapi_host to
schedule a new migration as soon as any of the previous ones finishes,
thus maintaining a constant flow of n migrations.
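The refill policy itself can be sketched as below. `schedule_with_limit` and its `start` argument are hypothetical: `start` stands in for whatever kicks off one VM migration and returns a handle to its task. The result is meant to be handed to a wait-with-callback loop as its initial tasks and its callback, but the exact signature of `Tasks.wait_for_all_with_callback` is not shown in this PR, so that pairing is an assumption.

```ocaml
(* Start up to [limit] migrations immediately, then start exactly one
   replacement each time a migration completes, until no VMs remain. *)
let schedule_with_limit ~limit ~start vms =
  let queue = Queue.of_seq (List.to_seq vms) in
  (* Initial fill: take at most [limit] VMs from the queue. *)
  let rec take k acc =
    if k = 0 then List.rev acc
    else
      match Queue.take_opt queue with
      | Some vm -> take (k - 1) (start vm :: acc)
      | None -> List.rev acc
  in
  let initial = take limit [] in
  (* Callback: one completion frees one slot, so start one more VM. *)
  let on_complete _finished =
    match Queue.take_opt queue with
    | Some vm -> [ start vm ]
    | None -> []
  in
  (initial, on_complete)

let () =
  let initial, on_complete =
    schedule_with_limit ~limit:2 ~start:(fun vm -> "migrate-" ^ vm)
      [ "a"; "b"; "c" ]
  in
  List.iter print_endline initial;
  (* prints "migrate-a" then "migrate-b" *)
  List.iter print_endline (on_complete "migrate-a")
  (* prints "migrate-c" *)
```

Returning exactly one replacement per completion is what keeps n migrations in flight: the count only starts to drop once the queue of remaining VMs is empty.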

Additionally, expose the evacuate-batch-size parameter in the CLI. This
was missed when the parameter was originally added, so the CLI always
set it to 0 (pick the default).
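A hypothetical sketch of the CLI side: the argument is optional, and 0 is kept as the "pick the default" sentinel, so leaving it out behaves exactly as before. The parameter key, the helper names, and the server-side default passed in as `~default` are all assumptions for illustration, not the actual xe or xapi code.

```ocaml
(* Assumed key name, for illustration only; not necessarily the
   spelling the xe CLI uses. *)
let batch_size_key = "evacuate-batch-size"

(* CLI side: an absent argument maps to 0, meaning "use the default". *)
let batch_size_of_cli_params (params : (string * string) list) =
  match List.assoc_opt batch_size_key params with
  | None -> 0
  | Some s -> (
      match int_of_string_opt s with
      | Some n when n >= 0 -> n
      | _ ->
          invalid_arg
            (batch_size_key ^ ": expected a non-negative integer, got " ^ s))

(* Server side: 0 is a sentinel for the default batch size. *)
let effective_batch_size ~default requested =
  if requested = 0 then default else requested
```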

===

Manually tested multiple times: confirmed not to break anything and to actually maintain a constant flow of migrations. This should greatly speed up host evacuations when the host has a mix of bigger and smaller VMs (in terms of memory/disk), or VMs that are slow to migrate for some other reason.

Add a new function that will invoke a callback every time one of the tasks is
deemed non-pending. This will allow its users to:

1) track the progress of tasks within the submitted batch
2) schedule new tasks to replace the completed ones

Modify wait_for_all_inner so that it adds the tasks returned from the callback
to its internal set on every new task completion.

Signed-off-by: Andrii Sultanov <[email protected]>

With bab83d9, host evacuation was parallelized
by grouping VMs into batches and starting a new batch only once the previous
one had finished. This means that a single slow VM can potentially slow down
the whole evacuation.

Instead, use Tasks.wait_for_all_with_callback to schedule a new migration as
soon as any of the previous ones finishes, thus maintaining a constant
flow of n migrations.

Signed-off-by: Andrii Sultanov <[email protected]>
@lindig

lindig commented Jun 10, 2025

This is an exciting direction. I implemented the batched migration, and the reason it was batched was simplicity; a more pool-like behaviour can clearly improve performance.

@edwintorok

We also have a TaskChains module in xapi_ha_vm_failover, although that one doesn't seem to limit the number of running tasks.
Eventually it'd be good if all these mechanisms were unified, but improving Host.evacuate is a good first step.

@last-genius last-genius added this pull request to the merge queue Jun 10, 2025
Merged via the queue into xapi-project:master with commit 817b434 Jun 10, 2025
16 checks passed