Skip to content

Conversation

@BengangY
Copy link
Contributor

@BengangY BengangY commented May 20, 2025

When the customers open "Migrate VM Wizard" on XenCenter, XenCenter will call
VM.assert_can_migrate to check each host in each pool connected to XenCenter
if the VM can be migrated to it. The API VM.assert_can_migrate then calls
VM.export_metadata. VM.export_metadata will lock VM. During this time, other
VM.export_metadata requests will fail as they can't get VM lock.

The solution is to add retry when failing to lock VM.

Copy link
Contributor

@Vincent-lau Vincent-lau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said export_metadata does not support concurrency, so what would happen if it was called concurrently? Reporting a failure? If it was a genuine failure do we still want to retry?

When the customers open "Migrate VM Wizard" on XenCenter, XenCenter will call
`VM.assert_can_migrate` to check each host in each pool connected to XenCenter
if the VM can be migrated to it. The API `VM.assert_can_migrate` then calls
`VM.export_metadata`. `VM.export_metadata` will lock VM. During this time, other
`VM.export_metadata` requests will fail as they can't get VM lock.

The solution is to add retry when failing to lock VM.

Signed-off-by: Bengang Yuan <[email protected]>
@BengangY BengangY force-pushed the private/bengangy/CA-411319 branch from fd90246 to fadf706 Compare May 22, 2025 07:06
@BengangY BengangY marked this pull request as ready for review May 22, 2025 07:06
@BengangY
Copy link
Contributor Author

As you said export_metadata does not support concurrency, so what would happen if it was called concurrently? Reporting a failure? If it was a genuine failure do we still want to retry?

I said concurrently from XC view. It's misleading. I updated the description.

Copy link
Contributor

@edwintorok edwintorok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GabrielBuica also noticed that we're missing locking around quite a few allowed operation updates, which can lead to more race conditions.

Eventually we'll probably need a more reliable solution, where you can't misuse the allowed ops APIs, for now this fixes a particular bug that has been observed.

Copy link
Contributor

@Vincent-lau Vincent-lau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still sounds a bit strange to me... Why would export_metadata fail if it cannot acqurie the VM lock? Shouldn't it just block waiting for the VM lock?

Also I am a bit confused which line of code is acquiring the lock here? Is it add_to_current_operations?

@BengangY
Copy link
Contributor Author

BengangY commented May 22, 2025

This still sounds a bit strange to me... Why would export_metadata fail if it cannot acqurie the VM lock? Shouldn't it just block waiting for the VM lock?

lock_vm would fail if another operation is in progress (in assert_operation_valid).

Also I am a bit confused which line of code is acquiring the lock here? Is it add_to_current_operations?

Yes.

@robhoes robhoes added this pull request to the merge queue May 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 22, 2025
@robhoes
Copy link
Member

robhoes commented May 22, 2025

There is something wrong with the static analysis test?

@psafont psafont added this pull request to the merge queue May 22, 2025
@psafont
Copy link
Member

psafont commented May 22, 2025

Error: ref 'refs/heads/gh-readonly-queue/master/pr-6476-4f2f18582cee0ed39f87af8bc5e70057b33545c1' not found in this repository - https://docs.github.com/rest

@lindig
Copy link
Contributor

lindig commented May 22, 2025

I recently saw a failing CI step as well. Re-ran that CI test and it passed.

Merged via the queue into xapi-project:master with commit d27a118 May 22, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants