
Conversation

@psafont
Member

@psafont psafont commented Feb 19, 2024

This enables xapi-guard to decouple persistence of TPM contents from the xapi
service being online. That is, when xapi is down, the contents of the TPMs are
written to disk, and when xapi is back online the contents are uploaded.

This is needed to protect VMs while xapi is being restarted, usually as part of
an update.

Some properties of the cache:

  • The cache is bypassed whenever possible, and is only used as a fallback
    after a write fails.
  • The cache is handled by a thread that writes to the cache and one that
    reads from it. They communicate through a bounded queue (a minimal sketch
    of this arrangement follows the list).
  • Whenever a TPM's contents are written to disk, previous versions of them
    are deleted. This helps the reading thread to catch up.
  • When the queue fills up, the writer stops adding elements to it; the
    reader then flushes the queue and afterwards the cache. Once this has
    happened, both threads transition to cache-bypass operation.
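
A minimal sketch of the producer/consumer arrangement described above (illustrative only: the helpers `direct_push`, `persist_to_disk` (which returns the on-disk path), and `upload`, as well as the queue capacity, are assumptions, not the PR's actual code):

```ocaml
(* Sketch of the write path: try to bypass the cache with a direct push to
   xapi; on failure, persist to disk and hand the file over to the reader
   thread through a bounded queue. *)
let queue, pusher = Lwt_stream.create_bounded 512

(* Writer side: runs for every incoming TPM write. *)
let write ~direct_push ~persist_to_disk contents =
  Lwt.try_bind
    (fun () -> direct_push contents) (* cache bypass: xapi is online *)
    (fun () -> Lwt.return_unit)
    (fun _exn ->
      (* xapi is down: fall back to the on-disk cache. [pusher#push] blocks
         once the queue is full, which is what makes the queue bounded. *)
      let open Lwt.Syntax in
      let* path = persist_to_disk contents in
      pusher#push path
    )

(* Reader side: drains the queue at its own pace and uploads to xapi. *)
let rec reader_loop ~upload =
  let open Lwt.Syntax in
  let* path = Lwt_stream.next queue in
  let* () = upload path in
  reader_loop ~upload
```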

TODO

  • Handle TPM keys / states correctly. Currently, they might be overwritten. The idea is to use subdirectories for each vTPM; slashes need to be handled with care.
  • Refresh the cache at startup. Because monotonic times depend on boot, the cache contents might need to be refreshed if the current time is lower than the timestamps present in the cache (a rough sketch of this retiming follows).
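
As a rough illustration of that retiming (a hypothetical helper, not the PR's code; `now` is the current monotonic time as an `Mtime.t`):

```ocaml
(* Sketch: an entry whose timestamp is "in the future" relative to the
   current boot's monotonic clock is renamed to the current time, so that
   subsequent writes still sort as newer. *)
let retime_entry ~now path timestamp =
  if Mtime.is_later timestamp ~than:now then
    let fresh = Int64.to_string (Mtime.to_uint64_ns now) in
    Lwt_unix.rename path (Filename.concat (Filename.dirname path) fresh)
  else
    Lwt.return_unit
```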

This PR is best reviewed per commit.

@psafont psafont marked this pull request as draft February 19, 2024 14:13
Member

@robhoes robhoes left a comment


I've reviewed just the first commit and added some comments/questions. I'll be back tomorrow for the rest :)

Member

@robhoes robhoes left a comment


I've reviewed the rest of the commits now, which insert the new "cache" into the existing xapi-guard server. All seems to make sense as far as I can see.

@psafont psafont force-pushed the private/paus/double-gardon branch 2 times, most recently from 683d93f to 49c460d Compare February 20, 2024 17:14
@psafont psafont marked this pull request as ready for review February 21, 2024 15:15
Contributor

@Vincent-lau Vincent-lau left a comment


Just some quick questions on the design, will try to look at the code later

@psafont psafont marked this pull request as draft February 22, 2024 11:04
retry (k + 1)
in
Lwt.try_bind push'
(function Ok () -> Lwt.return_unit | Error e -> on_error e)
Contributor


minimising characters 😏
Lwt.try_bind push' (Result.fold ~ok:Lwt.return ~error:on_error) on_error

(fun () -> Lwt_unix.unlink file)
(function
| Unix.(Unix_error (ENOENT, _, _)) ->
Lwt.pause ()
Contributor


Why not simply Lwt.return_unit? Are we waiting on something to happen here?

Member Author


Allows for other tasks to be scheduled in the meantime

Contributor

@edwintorok edwintorok Feb 26, 2024


And why do that only on ENOENT? If we do want other tasks to get scheduled, we should do that in a more central place (although unless we're dealing with really high request rates we probably don't need to worry about it: Lwt would have run other tasks when you scheduled the unlink syscall).

Lwt.pause () might be useful in some kind of exponential backoff/retry code, but this seems too low-level a place to put it.
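
Following the suggestion in this thread, the ENOENT handler could simply return unit and leave any yielding to a more central place (a sketch, not the PR's code):

```ocaml
(* Sketch: unlink a file, treating "already gone" as success and
   re-raising anything else; no explicit yield at this level. *)
let unlink_ignoring_missing file =
  Lwt.catch
    (fun () -> Lwt_unix.unlink file)
    (function
      | Unix.Unix_error (Unix.ENOENT, _, _) -> Lwt.return_unit
      | exn -> Lwt.fail exn)
```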

let@ () = with_lock queue.lock in
match queue.state with
| Disengaged ->
let* () = Lwt.pause () in
Contributor


Why the pause?

Member Author


To let other threads be scheduled

let* () =
Lwt.catch f (function exn ->
D.info "%s failed with %s, retrying..." fname (Printexc.to_string exn) ;
Lwt_unix.sleep 0.5
Contributor


There is also another retry function above with 0.1s; should both use the same retry function (which you could later replace with exponential backoff)?

Member Author

@psafont psafont Feb 23, 2024


These retry functions are used in different situations. The exponential backoff is useful for expected errors during runtime, in particular to wait while xapi is offline and the pushes are expected to fail.

This function is a recovery mechanism in case there's a malfunction: code is not expected to fail and reach the recovery mechanism during normal operation. Maybe there's a case for exponential backoff to avoid spamming the logs, or maybe xapi-guard should be restarted to try to recover; it's hard to know in advance.
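
For what it's worth, both call sites could later share a single combinator along these lines (a sketch assuming Lwt and a Debug-style module `D` are in scope; the parameters are illustrative):

```ocaml
(* Sketch of one retry combinator with exponential backoff that both the
   0.1s and 0.5s call sites could share. *)
let rec retry_with_backoff ?(delay = 0.1) ?(factor = 2.) ?(max_delay = 5.) f =
  Lwt.catch f (fun exn ->
      D.info "failed with %s, retrying in %.1fs" (Printexc.to_string exn) delay ;
      let open Lwt.Syntax in
      let* () = Lwt_unix.sleep delay in
      retry_with_backoff
        ~delay:(Float.min max_delay (delay *. factor))
        ~factor ~max_delay f
  )
```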

@psafont
Member Author

psafont commented Feb 23, 2024

With all the TODOs done, I'm doing another round of tests before removing the draft status.

@psafont
Member Author

psafont commented Feb 26, 2024

I've rerun the vtpm and the ring3 test suites; all are looking green except the tests that expect vtpm writes to fail when xapi restarts: SR 194904 and 194906

Today I'm running manual tests around the VM lifecycle while xapi is down (VM restarts get blocked on unplugging; these use xapi calls and can't go forward when xapi is offline)

@psafont psafont force-pushed the private/paus/double-gardon branch from 95dfc48 to 2bd64c7 Compare February 26, 2024 10:30
let get_fs_action root now = function
| Latest ((uuid, timestamp, key), from) as latest ->
if Mtime.is_later ~than:now timestamp then
let timestamp = now in
Contributor


So this means that everything in the buffer with larger timestamps will be capped to the current timestamp, without preserving their previous timestamp ordering. I assume this ordering is not important any more, is it?

Member Author


There's one 'Latest' per folder. When the rest of the files are deleted, there's no ordering to be kept among them.

@psafont psafont marked this pull request as ready for review February 26, 2024 13:59
@psafont psafont force-pushed the private/paus/double-gardon branch from 35f58ee to 379e2c5 Compare February 26, 2024 15:52
This cache acts as a buffer and stores the requests temporarily until xapi can be contacted again.
This situation usually happens when xapi is being restarted as part of an update.
SWTPM, the vTPM daemon, reads the contents of the TPM from xapi-guard on startup, suspend, and resume.
During normal operation SWTPM does not send read requests to xapi-guard.
Contributor


It would be good if we added some assertions to check that this is really the case. It may match observed behaviour (i.e. swtpm has a cache of its own), and from a quick look at the source code it always tries to use the local cache, and only if that is empty does it fall back to reading from the backend.

However, this is only true for each TPM state individually (out of the 3), so you could still see mixed writes and reads between different TPM states.

Contributor


See https://github.com/xapi-project/xen-api/pull/5460/files#r1504263947. I think xapi-guard will create this scenario itself: it always uses read-modify-write, which can lead to lost states if more than 1 of the 3 vTPM states is updated while xapi is down!
So we need to handle the case where we've got mixed reads and writes (and add a test for this scenario).

@psafont
Member Author

psafont commented Feb 27, 2024

> Do we have tests for the startup recovery code?

Not automated ones; I've run it manually with the current test binary to test temporary files, invalid files, and future files:

$ dune exec ocaml/xapi-guard/test/cache_test.exe
cache_test.exe: [INFO] Xapi_guard__Disk_cache.Setup.retime_cache_contents: Deleting '/var/lib/xapi-guard/Swtpm/321e3a5e-b32b-4b26-be1c-c20c902c3cff/0/invalid'
cache_test.exe: [INFO] Xapi_guard__Disk_cache.Setup.retime_cache_contents: Moving '/var/lib/xapi-guard/Swtpm/321e3a5e-b32b-4b26-be1c-c20c902c3cff/0/30727492706339' to '/var/lib/xapi-guard/Swtpm/321e3a5e-b32b-4b26-be1c-c20c902c3cff/0/4617725464136'
cache_test.exe: [WARNING] Xapi_guard__Disk_cache.Setup.retime_cache_contents: Found temporary file, ignoring '/var/lib/xapi-guard/Swtpm/321e3a5e-b32b-4b26-be1c-c20c902c3cff/0/20727492706339.pre'
cache_test.exe: [WARNING] Xapi_guard__Disk_cache.Setup.retime_cache_contents: Found temporary file, ignoring '/var/lib/xapi-guard/Swtpm/98d8f499-ed85-49b3-a6c5-268f64efa627/1/2770685160101.pre'

@psafont psafont force-pushed the private/paus/double-gardon branch from 8073b33 to 7ed500f Compare February 27, 2024 16:09
@psafont
Member Author

psafont commented Feb 28, 2024

Added a temporary log line to easily verify that the fistpoint works as expected, on top of seeing that requests error out when xapi is offline and the fistpoint is enabled:

Feb 28 10:02:46 lcy2-dt132 varstored-guard: [error||0 ||Xapi_guard__Disk_cache] Fistpoint disable_xapi_guard_cache present
Feb 28 10:02:46 lcy2-dt132 varstored-guard: [error||0 ||backtrace] Raised Unix.Unix_error(Unix.ECONNREFUSED, "connect", "")
Feb 28 10:02:46 lcy2-dt132 varstored-guard: [error||0 ||backtrace] 1/1 varstored-guard Raised at file (Thread 0 has no backtrace table. Was with_backtraces called?, line 0
Feb 28 10:02:46 lcy2-dt132 varstored-guard: [error||0 ||backtrace]
Feb 28 10:02:46 lcy2-dt132 varstored-guard: [error||0 ||cohttp.lwt.server] Error handling ((headers\x0A  ((Host xapi-depriv-socket) (Accept */*)\x0A   (Content-Type application/octet-stream) (Content-Length 11100)))\x0A (meth PUT) (scheme ()) (resource /tpm2-00.permall) (version HTTP_1_1)\x0A (encoding (Fixed 11100))): Unix.Unix_error(Unix.ECONNREFUSED, "connect", "")

This will allow handling serialization of keys as well as states in
server_interface and the write cache

Signed-off-by: Pau Ruiz Safont <[email protected]>
This enables xapi-guard to decouple persistence of TPM contents from the xapi
service being online. That is, when xapi is down, the contents of the TPMs are
written to disk, and when xapi is back online the contents are uploaded.

This is needed to protect VMs while xapi is being restarted, usually as part of
an update.

Some properties of the cache:
- The cache is bypassed whenever possible, and is only used as a fallback
  after a write fails.
- The cache is handled by a thread that writes to the cache and one that reads
  from it. They communicate through a bounded queue.
- Whenever a TPM's contents are written to disk, previous versions of them are
  deleted. This helps the reading thread to catch up.
- When the queue fills up, the writer stops adding elements to it; the reader
  then flushes the queue and afterwards the cache. Once this has happened,
  both threads transition to cache-bypass operation.

Signed-off-by: Pau Ruiz Safont <[email protected]>
This allows the UUID to be passed directly to the on-disk cache that will be
introduced

Signed-off-by: Pau Ruiz Safont <[email protected]>
This allows the persistence function to be used from outside the callback,
which will be useful for threading it into the on-disk cache

Signed-off-by: Pau Ruiz Safont <[email protected]>
Now the process creates a thread to read from disk and push vtpm events to xapi
at its own pace, and integrates the disk-writing part into the callback of the
deprivileged sockets.

Special consideration was given to resume, where the deprivileged sockets and
the write-to-cache function need to be integrated differently from the
codepath that creates the sockets from the message-switch server.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Because timestamps are based on a monotonic clock that depends on boot, files
need to be renamed to ensure that future writes have higher timestamps, are
considered newer, and get uploaded to xapi.

On top of this, it allows reporting leftover temporary files, deleting invalid
files, and removing empty directories.

Signed-off-by: Pau Ruiz Safont <[email protected]>
This is needed to be able to disable the disk cache completely, maintaining
the previous behaviour if needed.

Signed-off-by: Pau Ruiz Safont <[email protected]>
This is done through the fist point.

Xapi_fist is not used directly because it would need a new opam package,
creating a lot of churn, which is currently unwanted.

Signed-off-by: Pau Ruiz Safont <[email protected]>
@psafont psafont force-pushed the private/paus/double-gardon branch from 7ed500f to b80357e Compare February 28, 2024 10:16
Now all domains' vtpm read requests go through the cache. The read function is
the same as before.

There is no change in behaviour.

Signed-off-by: Pau Ruiz Safont <[email protected]>
@psafont psafont force-pushed the private/paus/double-gardon branch from 196c88c to b80357e Compare February 28, 2024 16:46
For domains requesting the TPM's contents, xapi-guard returns the contents in
the cache, if they are available from in-flight requests. It falls back to
xapi if they are not.

The cache doesn't try to provide any availability for reads, like it does for
writes. This means that if swtpm issues a read request while xapi is offline,
the request will fail, as it did before this change.

Signed-off-by: Pau Ruiz Safont <[email protected]>
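
In other words, the read path has roughly this shape (hypothetical helpers `lookup_cache` and `read_from_xapi`, not the actual code):

```ocaml
(* Sketch of the read path: serve the newest cached (in-flight) version if
   one exists, otherwise go straight to xapi. No extra availability is
   provided for reads, so the xapi call can still fail while xapi is down. *)
let read ~lookup_cache ~read_from_xapi uuid key =
  let open Lwt.Syntax in
  let* cached = lookup_cache uuid key in
  match cached with
  | Some contents -> Lwt.return contents
  | None -> read_from_xapi uuid key
```
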
@robhoes
Member

robhoes commented Feb 28, 2024

The first new commit is just plumbing to get a read function inside the cache module. My only comment would be what Edwin said as well, where the direct-push function may want to go through the cache as well, but I'm not sure this would actually fit with the types...

The second commit updates the read function in the cache to first check if the thing to be read is in the filesystem cache, and if not just does the direct xapi call. The whole queueing system is not used here and seems irrelevant, as this is just for flushing the cache/buffer by the "watcher" thread. Correct?

Previously, they were sorted in string order, which in rare cases might lead
to erroneous ordering.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Contributor

@edwintorok edwintorok left a comment


Probably OK for now. We need to document that we don't support HA with vTPM (at least for now), and we can look at the locking again in the meantime.

@psafont
Member Author

psafont commented Feb 29, 2024

> My only comment would be what Edwin said as well, where the direct-push function may want to go through the cache as well, but I'm not sure this would actually fit with the types...

Direct push is only triggered by the cache code, actually :)

> The whole queueing system is not used here and seems irrelevant, as this is just for flushing the cache/buffer by the "watcher" thread. Correct?

That's correct, yes

@robhoes robhoes merged commit 1ae1dd8 into xapi-project:master Feb 29, 2024
@psafont psafont deleted the private/paus/double-gardon branch February 29, 2024 12:15

liulinC pushed a commit to liulinC/xen-api that referenced this pull request Mar 7, 2024
…-gardon

CA-383867: Add local disk cache library for xapi guard
liulinC pushed a commit to liulinC/xen-api that referenced this pull request Mar 7, 2024
…-gardon

CA-383867: Add local disk cache library for xapi guard