CA-411319: Concurrent VM.assert_can_migrate failure
#6476
Vincent-lau
left a comment
As you said, `export_metadata` does not support concurrency, so what would happen if it were called concurrently? Would it report a failure? And if it were a genuine failure, would we still want to retry?
When a customer opens the "Migrate VM Wizard" in XenCenter, XenCenter calls `VM.assert_can_migrate` for each host in each pool connected to XenCenter to check whether the VM can be migrated to it. `VM.assert_can_migrate` in turn calls `VM.export_metadata`, which locks the VM. While the lock is held, other `VM.export_metadata` requests fail because they cannot get the VM lock. The solution is to retry when locking the VM fails. Signed-off-by: Bengang Yuan <[email protected]>
fd90246 to fadf706
I said
edwintorok
left a comment
@GabrielBuica also noticed that we're missing locking around quite a few allowed operation updates, which can lead to more race conditions.
Eventually we'll probably need a more reliable solution, one where you can't misuse the allowed ops APIs. For now, this fixes a particular bug that has been observed.
Vincent-lau
left a comment
This still sounds a bit strange to me... Why would `export_metadata` fail if it cannot acquire the VM lock? Shouldn't it just block waiting for the VM lock?
Also, I am a bit confused about which line of code acquires the lock here. Is it `add_to_current_operations`?
Yes.
Is there something wrong with the static analysis test?
I recently saw a failing CI step as well. Re-ran that CI test and it passed.
When a customer opens the "Migrate VM Wizard" in XenCenter, XenCenter calls `VM.assert_can_migrate` for each host in each pool connected to XenCenter to check whether the VM can be migrated to it. `VM.assert_can_migrate` in turn calls `VM.export_metadata`, which locks the VM. While the lock is held, other `VM.export_metadata` requests fail because they cannot get the VM lock.
The solution is to retry when locking the VM fails.
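
The retry described above can be sketched in a few lines. The following is a minimal Python model, not xapi code: `VmOps`, `OperationInProgress`, `with_retry`, and this `export_metadata` are hypothetical names invented for illustration, and the attempt count and delay are arbitrary. It assumes the VM "lock" is a fail-fast check on a per-VM current-operations table, so a second caller gets an error immediately rather than blocking.

```python
import threading
import time


class OperationInProgress(Exception):
    """Raised when another operation already holds the VM (hypothetical error)."""


class VmOps:
    """Toy stand-in for a per-VM current-operations table (not xapi code)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._current = set()

    def add(self, vm, op):
        # Fail fast instead of blocking: a conflicting entry raises immediately.
        with self._guard:
            if vm in self._current:
                raise OperationInProgress(f"{vm} is busy, cannot start {op}")
            self._current.add(vm)

    def remove(self, vm):
        with self._guard:
            self._current.discard(vm)


def with_retry(fn, attempts=5, delay=0.05):
    """Call fn, retrying only on OperationInProgress, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OperationInProgress:
            if attempt == attempts:
                raise  # still busy after the last attempt: report the failure
            time.sleep(delay)


def export_metadata(ops, vm):
    """Hypothetical export: take the VM, do some work, release it."""
    ops.add(vm, "metadata_export")
    try:
        time.sleep(0.01)  # simulated work while the VM is held
        return f"metadata({vm})"
    finally:
        ops.remove(vm)
```

In this model only the lock-conflict error is retried, so a genuine failure inside the export still surfaces on the first attempt. Whether the actual fix draws the line in the same place is something the diff would show, not this sketch.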