CA-383867: Add local disk cache library for xapi guard #5460
Conversation
robhoes
left a comment
I've reviewed just the first commit and added some comments/questions. I'll be back tomorrow for the rest :)
robhoes
left a comment
I've reviewed the rest of the commits now, which insert the new "cache" into the existing xapi-guard server. All seems to make sense as far as I can see.
Force-pushed from 683d93f to 49c460d
Vincent-lau
left a comment
Just some quick questions on the design; I'll try to look at the code later.
        retry (k + 1)
    in
    Lwt.try_bind push'
      (function Ok () -> Lwt.return_unit | Error e -> on_error e)
minimising characters 😏
Lwt.try_bind push' (Result.fold ~ok:Lwt.return ~error:on_error) on_error
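For readers unfamiliar with `Result.fold`, here is a minimal sketch of why the two forms are equivalent. `push'` and `on_error` are hypothetical stand-ins, with the error type assumed to be `exn` so the same handler fits both the `Error` case and the exception case:

```ocaml
(* Hypothetical stand-ins for the functions in the PR. *)
let on_error (e : exn) : unit Lwt.t = Lwt_io.eprintl (Printexc.to_string e)

let push' () : (unit, exn) result Lwt.t = Lwt.return (Ok ())

(* Original form: explicit match on the result. *)
let verbose () =
  Lwt.try_bind push'
    (function Ok () -> Lwt.return_unit | Error e -> on_error e)
    on_error

(* Suggested form: Result.fold dispatches Ok/Error to the two handlers. *)
let compact () =
  Lwt.try_bind push' (Result.fold ~ok:Lwt.return ~error:on_error) on_error

let () = Lwt_main.run (Lwt.join [verbose (); compact ()])
```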
    (fun () -> Lwt_unix.unlink file)
    (function
      | Unix.(Unix_error (ENOENT, _, _)) ->
          Lwt.pause ()
Why not simply Lwt.return_unit? Are we waiting on something to happen here?
It allows other tasks to be scheduled in the meantime.
And why do that only on ENOENT? If we do want other tasks to get scheduled, we should do that in a more central place (although unless we're dealing with really high request rates we probably don't need to worry about it: Lwt would've run other tasks when you scheduled the unlink syscall).
Lwt.pause () might be useful in some kind of exponential backoff/retry code, but this seems too low-level a place to put it.
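For context, a minimal sketch of the distinction being debated: `Lwt.return_unit` is an already-resolved promise and the current task keeps running, while `Lwt.pause ()` requeues the task so other ready tasks get a turn first:

```ocaml
open Lwt.Syntax

(* Resolves immediately; no other task gets to run here. *)
let no_yield () : unit Lwt.t = Lwt.return_unit

(* Cooperative yield point: resumption is deferred until the scheduler
   has run the other tasks that are currently ready. *)
let yield () : unit Lwt.t =
  let* () = Lwt.pause () in
  Lwt.return_unit
```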
    let@ () = with_lock queue.lock in
    match queue.state with
    | Disengaged ->
        let* () = Lwt.pause () in
Why the pause?
To let other threads be scheduled.
    let* () =
      Lwt.catch f (function exn ->
          D.info "%s failed with %s, retrying..." fname (Printexc.to_string exn) ;
          Lwt_unix.sleep 0.5
There is also another retry function above with 0.1 s; should both use the same retry function (which you could later replace with exponential backoff)?
These retry functions are used in different situations. The exponential backoff is useful for expected errors during runtime, in particular to wait while xapi is offline and pushes are expected to fail.
This function is a recovery mechanism in case there's a malfunction: the code is not expected to fail and reach the recovery mechanism during normal operation. Maybe there's a case for exponential backoff to avoid spamming the logs, or maybe xapi-guard should be restarted to try to recover; it's hard to know in advance.
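A minimal sketch of the kind of shared retry helper floated above, with exponential backoff. The name, initial delay, cap, and logging are all illustrative, not the PR's implementation:

```ocaml
open Lwt.Syntax

(* Retry [f] forever, doubling the delay between attempts up to a cap. *)
let rec retry_with_backoff ?(delay = 0.1) ?(max_delay = 30.) f =
  Lwt.catch f (fun exn ->
      (* Log the failure, sleep, then retry with a doubled (capped) delay. *)
      let* () =
        Lwt_io.eprintlf "failed with %s, retrying..." (Printexc.to_string exn)
      in
      let* () = Lwt_unix.sleep delay in
      retry_with_backoff ~delay:(Float.min (delay *. 2.) max_delay) ~max_delay f
  )
```

Both call sites (the 0.1 s one and the 0.5 s one) could then share this, differing only in the initial delay.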
With all the todos done, I'm doing another round of tests before removing the draft status.

I've rerun the vtpm and ring3 test suites; all looking green except the tests that expect vtpm writes to fail when xapi restarts: SR 194904 and 194906. Today I'm running manual tests around VM lifecycle while xapi is down (VM restarts get blocked on unplugging; these use xapi calls and can't go forward while xapi is offline).
Force-pushed from 95dfc48 to 2bd64c7
    let get_fs_action root now = function
      | Latest ((uuid, timestamp, key), from) as latest ->
          if Mtime.is_later ~than:now timestamp then
            let timestamp = now in
So this means that everything in the buffer with later timestamps will be capped to the current timestamp, without preserving their previous timestamp ordering. I assume this ordering is not important any more, is it?
There's one 'Latest' per folder. When the rest of the files are deleted, there's no ordering to be kept among them.
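A sketch of the capping rule under discussion, assuming the mtime library's `Mtime.is_later` as used in the snippet above: timestamps written before a reboot can appear to be "in the future" of the new monotonic clock, so anything later than `now` is clamped down to `now`:

```ocaml
(* Clamp timestamps that are later than the current monotonic clock
   (e.g. written before a reboot) down to now. *)
let cap_timestamp ~(now : Mtime.t) (timestamp : Mtime.t) : Mtime.t =
  if Mtime.is_later ~than:now timestamp then now else timestamp
```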
Force-pushed from 35f58ee to 379e2c5
    This cache acts as a buffer and stores the requests temporarily until xapi can be contacted again.
    This situation usually happens when xapi is being restarted as part of an update.
    SWTPM, the vTPM daemon, reads the contents of the TPM from xapi-guard on startup, suspend, and resume.
    During normal operation SWTPM does not send read requests to xapi-guard.
It would be good if we added some assertions to check that this is really the case. It may match observed behaviour (i.e. swtpm has a cache of its own), and from a quick look at the source code it always tries to use the local cache, and only if that is empty does it fall back to reading from the backend.
However, this is only true for each TPM state individually (out of the three), so you could still see mixed writes and reads between different TPM states.
See https://github.com/xapi-project/xen-api/pull/5460/files#r1504263947. I think xapi-guard will create this scenario itself: it always uses read-modify-write, which can lead to lost states if more than one of the three vTPM states is updated while xapi is down!
So we need to handle the case where we've got mixed reads and writes (and add a test for this scenario).
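To illustrate the hazard described above, here is a hypothetical sketch of how a read-modify-write over the three vTPM states can lose an update when two writes start from the same stale snapshot. The record type and field names are illustrative, not the actual wire format:

```ocaml
(* Hypothetical combined record of the three vTPM states. *)
type tpm_states = {permall : string; savestate : string; volatilestate : string}

(* Read-modify-write: take a snapshot of the whole record, replace one
   field, and write the whole record back. *)
let update snapshot field value =
  match field with
  | `Permall -> {snapshot with permall = value}
  | `Savestate -> {snapshot with savestate = value}
  | `Volatilestate -> {snapshot with volatilestate = value}

(* If both updates start from the same stale [snapshot] while xapi is
   down, whichever write-back lands last silently discards the other's
   field update: here [b] drops the permall change made in [a]. *)
let lost_update snapshot =
  let a = update snapshot `Permall "new-permall" in
  let b = update snapshot `Savestate "new-savestate" in
  ignore a ; b
```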
Not automated ones; I've run it manually with the current test binary to test temporary files, invalid files, and …
Force-pushed from 8073b33 to 7ed500f
Added a temporary log line to easily verify that the fist point works as expected, on top of seeing that requests error out when xapi is offline and the fist point is enabled:
This will allow handling serialization of keys as well as states in server_interface and the write cache. Signed-off-by: Pau Ruiz Safont <[email protected]>
This enables xapi-guard to decouple persistence of TPM contents from the xapi service being online. That is, when xapi is down, the contents of the TPMs will be written to disk, and when xapi is back online the contents will be uploaded. This is needed to protect VMs while xapi is being restarted, usually as part of an update.

Some properties of the cache:
- The cache is bypassed whenever possible, and is only used as a fallback after a write fails.
- The cache is handled by a thread that writes to it and one that reads from it. They communicate through a bounded queue.
- Whenever a TPM's contents are written to disk, previous versions of them are deleted. This helps the reading thread to catch up.
- When the queue has been filled, the writer stops adding elements to the queue; the reader will try to flush the queue, and afterwards the cache. After this happens, both threads transition to cache-bypass operation.

Signed-off-by: Pau Ruiz Safont <[email protected]>
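As a rough sketch of the writer/reader handoff this commit message describes (not the PR's actual code), the two threads could communicate through a bounded queue along these lines; all names here are illustrative:

```ocaml
open Lwt.Syntax

(* Illustrative bounded queue; the PR's real queue and state machine differ. *)
type 'a bounded = {
    q : 'a Queue.t
  ; capacity : int
  ; nonempty : unit Lwt_condition.t
  ; mutex : Lwt_mutex.t
}

(* Writer side: refuse when full, so the caller can fall back to the
   on-disk cache instead of blocking the request path. *)
let push t x =
  Lwt_mutex.with_lock t.mutex (fun () ->
      if Queue.length t.q >= t.capacity then
        Lwt.return false
      else (
        Queue.push x t.q ;
        Lwt_condition.signal t.nonempty () ;
        Lwt.return true
      )
  )

(* Reader side: block until an element is available, then take it. *)
let pop t =
  Lwt_mutex.with_lock t.mutex (fun () ->
      let rec loop () =
        match Queue.take_opt t.q with
        | Some x ->
            Lwt.return x
        | None ->
            (* wait releases the mutex and reacquires it when signalled *)
            let* () = Lwt_condition.wait ~mutex:t.mutex t.nonempty in
            loop ()
      in
      loop ()
  )
```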
This allows passing the UUID directly to the on-disk cache that will be introduced. Signed-off-by: Pau Ruiz Safont <[email protected]>
This allows using the persistence function from outside the callback, which will be useful for threading it into the on-disk cache. Signed-off-by: Pau Ruiz Safont <[email protected]>
Now the process creates a thread that reads from disk and pushes vtpm events to xapi at its own pace, and integrates the disk-writing part into the callback of the deprivileged sockets. Special consideration was taken for resume, where the deprivileged sockets and the write-to-cache function need to be integrated differently from the codepath that creates the sockets from the message-switch server. Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Because timestamps come from a monotonic clock that resets on boot, files need to be renamed to ensure future writes have higher timestamps, so they are considered newer and uploaded to xapi. On top of this, this allows reporting leftover temporary files, deleting invalid files, and removing empty directories. Signed-off-by: Pau Ruiz Safont <[email protected]>
This is needed to be able to disable the disk cache completely, maintaining the previous behaviour if needed. Signed-off-by: Pau Ruiz Safont <[email protected]>
This is done through the fist point. Xapi_fist is not used directly because it would need a new opam package, creating a lot of churn which is currently unwanted. Signed-off-by: Pau Ruiz Safont <[email protected]>
Force-pushed from 7ed500f to b80357e
Now all domains' vtpm read requests go through the cache. The read function is the same as before; there is no change in behaviour. Signed-off-by: Pau Ruiz Safont <[email protected]>
Force-pushed from 196c88c to b80357e
For domains requesting the TPM's contents, xapi-guard returns the contents from the cache if they are available from in-flight requests. It falls back to xapi if that isn't possible. The cache doesn't try to provide any availability for reads, like it does for writes. This means that if swtpm issues a read request while xapi is offline, the request will fail, as it did before this change. Signed-off-by: Pau Ruiz Safont <[email protected]>
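A minimal sketch of the read path this commit describes; the helper names are hypothetical stubs, not the PR's functions. The cache is checked for contents left by in-flight writes, and everything else falls through to xapi:

```ocaml
open Lwt.Syntax

(* Hypothetical stubs standing in for the real cache and xapi calls. *)
let read_from_cache ~uuid:_ ~key:_ : string option Lwt.t = Lwt.return_none

let read_from_xapi ~uuid:_ ~key:_ : string Lwt.t = Lwt.return ""

(* Serve from the cache when an in-flight write left contents there;
   otherwise go to xapi. No availability is provided for reads: if xapi
   is offline and the cache misses, the xapi call simply fails. *)
let read ~uuid ~key : string Lwt.t =
  let* cached = read_from_cache ~uuid ~key in
  match cached with
  | Some contents -> Lwt.return contents
  | None -> read_from_xapi ~uuid ~key
```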
The first new commit is just plumbing to get a read function inside the cache module. My only comment would be what Edwin said as well: the direct-push function may want to go through the cache too, but I'm not sure this would actually fit with the types...

The second commit updates the read function in the cache to first check whether the thing to be read is in the filesystem cache, and if not just does the direct xapi call. The whole queueing system is not used here and seems irrelevant, as this is just for flushing the cache/buffer by the "watcher" thread. Correct?
Previously, they were sorted by string order, which in rare cases might lead to erroneous ordering Signed-off-by: Pau Ruiz Safont <[email protected]>
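A one-line illustration of the string-ordering pitfall this commit fixes: lexicographic comparison misorders numeric timestamps as soon as their digit counts differ:

```ocaml
let () =
  assert (String.compare "9" "10" > 0) ; (* string order: "9" sorts after "10" *)
  assert (Int.compare 9 10 < 0)          (* numeric order: 9 comes before 10 *)
```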
edwintorok
left a comment
Probably OK for now. We need to document that we don't support HA with vTPM (at least for now), and we can look at the locking again in the meantime.
Direct push is only triggered by the cache code, actually :)
That's correct, yes.
This enables xapi-guard to decouple persistence of TPM contents from the xapi
service being online. That is, when xapi is down, the contents of the TPMs will
be written to disk, and when xapi is back online the contents will be uploaded.
This is needed to protect VMs while xapi is being restarted, usually as part of
an update.
Some properties of the cache:
- The cache is bypassed whenever possible, and is only used as a fallback after a write fails.
- The cache is handled by a thread that writes to it and one that reads from it. They communicate through a bounded queue.
- Whenever a TPM's contents are written to disk, previous versions of them are deleted. This helps the reading thread to catch up.
- When the queue has been filled, the writer stops adding elements to the queue; the reader will try to flush the queue, and afterwards the cache. After this happens, both threads transition to cache-bypass operation.
TODO
The PR is best reviewed per commit.