Investigate hiccups that make preloading slow #10230

### The problem you're addressing (if any)

This is a continuation of a discussion on #1512, opened as a separate issue so that the contents are not lost behind hundreds of "load more" buttons. It is meant as a feature request to improve preload time.

I [wrote](https://github.com/QubesOS/qubes-issues/issues/1512#issuecomment-3275002140):

> The disposable stages that make the usage a bit long are the following:
> 
> Preload:
> - **start** (offloaded with preloaded disposable)
> - **pause** if preloaded (offloaded with preloaded disposable)
> 
> Using:
> - **unpause** if preloaded (didn't time this)
> - **preloads** another disposable (this event is fired after a preload is marked as used; because it uses `asyncio.ensure_future()`, it doesn't delay retrieving the preloaded disposable, but it does impact other operations, such as the execution and/or cleanup, as these are coroutines)
> - **execution**
> - **cleanup**
> 
> ```
> user@dom0:~$ time qvm-run -p --no-gui --no-color-stder --no-color-output --dispvm -- 'echo $HOSTNAME'
> Starting: 01:32.145448
> Starting run_command_single: 01:32.145906
> Creating proc: 01:32.145941
> disp3991
> Finished proc: 01:32.908777
> Before cleanup: 01:32.909267
> After cleanup: 01:34.905010
> 
> real    0m2.902s
> user    0m0.094s
> sys     0m0.075s
> ```
> 
> If I add a `3` seconds delay to `domain-preload-dispvm-start` in `DispVM.use_preload()`:
> 
> ```
> user@dom0:~$ time qvm-run -p --no-gui --no-color-stder --no-color-output --dispvm -- 'echo $HOSTNAME'
> Starting: 03:23.699683
> Starting run_command_single: 03:23.700117
> Creating proc: 03:23.700142
> disp4272
> Finished proc: 03:24.456369
> Before cleanup: 03:24.456697
> After cleanup: 03:25.677925
> 
> real    0m2.117s
> user    0m0.097s
> sys     0m0.064s
> ```
> 
> The major issues are:
> - `1` second without load: `cleanup`, which on `qubesadmin` just calls `kill()` and the server is responsible for the rest.
> - `1` second without load: `preloading`, because the `DispVM.use_preload()` is called right
> 
> But I am not sure if it is useful to add a delay by default. Some things do not benefit from adding a delay to preload, such as commands that run in the background without the user being aware they are still running, for example clicking on the app menu or launching applications from qui-domains. If you add a `1-3` seconds delay and have workflows with multiple iterations, it actually worsens the time, because some iterations will take longer as the preload is not ready yet.
> 
> This discussion is related to the preloading benchmark and visualization. The current average, where each call is made right after the previous one finishes, is useful for workflows that require a lot of preloaded disposables, but it doesn't reflect the average from a "cold" request, which is fast as there is no load on the system, compared to a "warm/hot" start, where there are multiple preloads being created, used and cleaned simultaneously. The value that best reflects how fast a preload can be used is a single metric: the execution and total from the first iteration.
> 
> If you know you will only need a certain amount of preloaded disposables and the time to use each one is long (at least longer than the time to preload a disposable), there is value in a setting: `preload-dispvm-delay: int`. Suppose there is a workflow that requires `4` qubes and takes `15`s or longer (suppose also that the time to preload is `8`s); the fastest way to execute and clean up each qube would be to use concurrency and a delay of `3`s (I guesstimated these numbers). There is a problem: this is such a specialized workflow that I am not sure it is widely applicable or easy to write documentation for. It is a possible future improvement though. Let's see some workflows:
> 
> - You have a workflow with a limited number of operations, such as opening mail attachments, which for some people may be 4 attachments. If you can preload `4` and set `preload-dispvm-delay` to `3`s, you'd get the fastest usage/display of the 4 attachments.
> - You have a workflow with a lot of operations, converting `50` PDFs to trusted PDFs, preloading `50` qubes is not really "nice" to your system, but preloading `4` with a `preload-dispvm-delay` of `0` would bring the best results.
> 
> I hope I made it easier to understand with these examples.

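To make the two traces above easier to compare, here is a minimal sketch that turns the printed timestamps into per-stage durations. It assumes the timestamps are in `MM:SS.ffffff` form, as they appear in the output; the helper names (`to_us`, `stage_durations`) are mine and not part of `qubesadmin`. Computed this way, execution takes about `0.76`s in both runs, while cleanup goes from about `2.00`s without the delay to about `1.22`s with the `3` seconds delay.

```python
def to_us(stamp: str) -> int:
    """Convert an 'MM:SS.ffffff' timestamp to integer microseconds."""
    minutes, rest = stamp.split(":")
    seconds, micros = rest.split(".")
    # Assumes exactly six fractional digits, as in the traces above.
    return (int(minutes) * 60 + int(seconds)) * 1_000_000 + int(micros)


def stage_durations(log: dict) -> dict:
    """Return per-stage durations in seconds for one qvm-run trace."""
    t = {name: to_us(stamp) for name, stamp in log.items()}
    return {
        "execution": (t["Finished proc"] - t["Creating proc"]) / 1e6,
        "cleanup": (t["After cleanup"] - t["Before cleanup"]) / 1e6,
        "total": (t["After cleanup"] - t["Starting"]) / 1e6,
    }


# Timestamps copied from the two traces in the quoted comment.
NO_DELAY = {
    "Starting": "01:32.145448",
    "Creating proc": "01:32.145941",
    "Finished proc": "01:32.908777",
    "Before cleanup": "01:32.909267",
    "After cleanup": "01:34.905010",
}
DELAY_3S = {
    "Starting": "03:23.699683",
    "Creating proc": "03:23.700142",
    "Finished proc": "03:24.456369",
    "Before cleanup": "03:24.456697",
    "After cleanup": "03:25.677925",
}

for label, log in (("no delay", NO_DELAY), ("3s delay", DELAY_3S)):
    print(label, stage_durations(log))
```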
@marmarek [wrote](https://github.com/QubesOS/qubes-issues/issues/1512#issuecomment-3275168540):
 
> Generally, the main reason for the preloaded feature is to lower the time between user action that requires disposable and getting that action done. If that's a PDF conversion, it's mostly full call time (but not really cleanup). But if that's opening a file in a disposable, it's just the wait time for the application to open. Bulk processing may be improved by preloaded a bit, but it's not the main focus. It shouldn't get worse, though.
> 
> It would be useful to measure time until user application (fully) starts, not until it completes (and dom0 gets that info). The latter may include some clean steps that are less relevant for user experience really - especially, I hope preloading next disposable in the background affect application start only in a small way, but may affect cleanup more. What if you use `date` as the command and compare timestamp this way? This assumes preloaded disposable has accurate clock, but I hope this is the case...
> If preloading next one indeed affects the current startup, then having default non-zero delay IMO makes sense.
> 
> Check also if calling from a VM makes any difference in startup time - you'll likely get less detailed timing info, but I think overall time should be similar, right? Maybe `qvm-run-vm --dispvm date` ?
> 
> > You have a workflow with a lot of operations, converting `50` PDFs to trusted PDFs, preloading `50` qubes is not really "nice" to your system, but preloading `4` with a `preload-dispvm-delay` of `0` would bring the best results.
> 
> For the use case like this, I'm not convinced preloading will help that much... it may help with the first PDF (latency to start conversion), but you'll run out of preloaded pretty soon. And if the conversion is fast, preloading will not speed it up much, but may slow down the actual conversion happening in the meantime (depending on available CPU cores).
> Maybe it's worth doing a test like this and see how actually it behaves? (but don't enable it in CI, too long to execute every time)
> 
> What happens if `preload-dispvm-delay` is set to some time, and a dispvm is requested earlier? Does it wait for that time to expire, or start a non-preloaded before that?

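For reference, the scheduling behaviour discussed above can be sketched with plain `asyncio`. This is only an illustration of ordering, not the actual `DispVM.use_preload()` code: `preload_next`, `use_preload`, the `delay` parameter and the sleep durations are all hypothetical stand-ins. It shows that `asyncio.ensure_future()` returns immediately, so the caller gets its disposable without waiting, and that a delay pushes the next preload past execution and cleanup. It does not model the real CPU and I/O contention between the stages.

```python
import asyncio

events = []


async def preload_next(delay: float) -> None:
    """Hypothetical stand-in for preloading the next disposable."""
    await asyncio.sleep(delay)  # hypothetical preload delay
    events.append("preload started")
    await asyncio.sleep(0.05)  # pretend start + pause
    events.append("preload done")


async def use_preload(delay: float) -> asyncio.Future:
    """Hypothetical stand-in for handing out a preloaded disposable."""
    events.append("preload marked as used")
    # Schedules the coroutine on the loop and returns right away.
    task = asyncio.ensure_future(preload_next(delay))
    events.append("disposable returned to caller")
    return task


async def main(delay: float) -> list:
    events.clear()
    task = await use_preload(delay)
    events.append("execution")
    await asyncio.sleep(0.02)  # pretend the command runs
    events.append("cleanup")
    await asyncio.sleep(0.02)  # pretend the qube is killed
    await task  # only so the sketch exits cleanly
    return list(events)


if __name__ == "__main__":
    print(asyncio.run(main(0)))    # preload overlaps execution and cleanup
    print(asyncio.run(main(0.1)))  # preload starts after cleanup
```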
### The solution you'd like

Decrease total preload usage time.
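
A first step could be the measurement suggested above: run `date` inside the disposable and compare its timestamp with the time the request was made, so that cleanup is excluded from the number. Below is a minimal sketch, assuming it is run from dom0, that the disposable's clock is accurate, and that `date` inside the qube is GNU `date` (for `+%s.%N`). The `qvm-run` flags are the ones from the traces above; `measure` and `run_in_dispvm` are hypothetical helper names.

```python
import subprocess
import time

# Flags mirror the traces above; the command is passed as a single argument.
CMD = [
    "qvm-run", "-p", "--no-gui", "--no-color-output",
    "--dispvm", "--", "date +%s.%N",
]


def run_in_dispvm() -> str:
    """Run `date` in a disposable and return its stdout."""
    result = subprocess.run(CMD, check=True, capture_output=True, text=True)
    return result.stdout


def measure(run=run_in_dispvm, clock=time.time) -> dict:
    """Compare the timestamp taken inside the disposable with dom0's clock."""
    requested = clock()
    inside = float(run().strip())  # assumes the disposable's clock is accurate
    returned = clock()
    return {
        # Time until the command actually runs in the disposable.
        "until_command_runs": inside - requested,
        # Time until the call returns to the caller.
        "until_call_returns": returned - requested,
    }


# Usage from dom0 (needs a working qvm-run):
# print(measure())
```

The `run` and `clock` parameters are injectable only so the arithmetic can be checked without a Qubes system.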

### The value to a user and who that user might be

Users of preloaded disposables will get faster disposable usage.

### Completion criteria checklist

_No response_
