@last-genius
Contributor

It's an in-house joke that most of xapi's errors are "expected" (either because they're handled by the caller after being logged or because they are frequent and benign), and knowing which ones are unusual and actually critical takes a lot of learning. Try to remove at least some of the most obvious ones from toolstack startup and VM resume, either by removing the logs completely or by downgrading them to debug where appropriate.

Manually tested in the appropriate scenarios.

@last-genius
Contributor Author

Prompted by a round of manual testing at Vates, where a lot of new starters installed new ISOs and were met with a barrage of errors, with no way of knowing which ones were critical :)

Will try to follow up with a few more cases, but the general motivation is to keep at least the first known-good installation error-free in the logs...

in
let conf_files = Array.to_list (Sys.readdir conf_dir) in
let conf_files =
try Array.to_list (Sys.readdir conf_dir) with Sys_error _ -> []
Contributor

It might be better to explicitly check for the existence of the directory first. Unfortunately, Sys.* functions aren't very good at error handling: you can't distinguish between a failure because the directory is absent, a failure because you lack permission to read it, and a genuine I/O error.
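
To illustrate the distinction being asked for, here is a minimal sketch that lists a directory through the `Unix` module instead of `Sys.readdir`, so that the errno is available. The `listing` type and the `list_dir` name are hypothetical and are not part of this PR; the sketch assumes the `unix` library is linked and OCaml 4.08 or later (for `Fun.protect`).

```ocaml
(* Hypothetical helper, not the code from this PR: list a directory while
   keeping "directory is absent" separate from every other failure. *)
type listing =
  | Entries of string list  (* directory was read successfully *)
  | Missing                 (* ENOENT or ENOTDIR: nothing to read *)
  | Failed of Unix.error    (* e.g. EACCES or EIO: worth logging *)

let list_dir dir =
  match Unix.opendir dir with
  | exception Unix.Unix_error ((Unix.ENOENT | Unix.ENOTDIR), _, _) ->
      Missing
  | exception Unix.Unix_error (e, _, _) ->
      Failed e
  | handle ->
      (* Collect entries until End_of_file, skipping "." and ".." so the
         result resembles what Sys.readdir returns. *)
      let rec loop acc =
        match Unix.readdir handle with
        | "." | ".." -> loop acc
        | name -> loop (name :: acc)
        | exception End_of_file -> List.rev acc
      in
      Fun.protect
        ~finally:(fun () -> Unix.closedir handle)
        (fun () -> Entries (loop []))
```

With something like this, a caller could treat `Missing` as an empty list without logging, and keep an error log only for `Failed e` (for example via `Unix.error_message e`).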

@last-genius last-genius force-pushed the asv/improve-logging branch from da6d648 to aec8803 Compare June 10, 2025 15:50
On every toolstack restart on hosts without Nvidia GPUs, xapi complains about a
non-existent directory:

xapi: [error||0 |dbsync (update_env) R:733fc2551767|xapi_vgpu_type]
Failed to create NVidia compat config_file: Sys_error("/usr/share/nvidia/vgx: No such file or directory")

Handle the directory's absence without propagating the error.

Signed-off-by: Andrii Sultanov <[email protected]>
…_line

Otherwise this quite frequently logs something like:

```
xcp-networkd: [error||22 |dbsync (update_env)|network_utils]
Error in read one line of file: /sys/class/net/eth0/device/sriov_totalvfs,
exception Unix.Unix_error(Unix.ENOENT, "open", "/sys/class/net/eth0/device/sriov_totalvfs")
Backtrace ...
```
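
One way to avoid this kind of log is to treat a missing file as an ordinary outcome instead of an error. The sketch below is only an illustration: `read_one_line_opt` is a hypothetical name, it is not the helper touched by this commit, and it assumes the `unix` library and OCaml 4.08 or later.

```ocaml
(* Hypothetical helper: read the first line of a file. A missing file
   (ENOENT) yields None; any other Unix_error still propagates to the
   caller, which can decide whether it deserves an error log. *)
let read_one_line_opt path =
  match Unix.openfile path [Unix.O_RDONLY] 0 with
  | exception Unix.Unix_error (Unix.ENOENT, _, _) ->
      None
  | fd ->
      let ic = Unix.in_channel_of_descr fd in
      Fun.protect
        ~finally:(fun () -> close_in_noerr ic)
        (fun () ->
          match input_line ic with
          | line -> Some line
          (* Design choice for this sketch: an empty file gives Some "". *)
          | exception End_of_file -> Some "")
```

A caller probing an optional sysfs attribute could then write, for example, `Option.bind (read_one_line_opt path) (fun s -> int_of_string_opt (String.trim s))` and simply get `None` on devices that do not expose the attribute.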

Signed-off-by: Andrii Sultanov <[email protected]>
non_debug_receive will dump an error after reading the last bits of the header,
which is expected and handled by the caller appropriately:

```
xenopsd-xc: [error||67 |Async.VM.resume R:beac7be348f1|xenguesthelper]
Memory F 6019464 KiB S 0 KiB T 8183 MiB <--- dumping error

xenopsd-xc: [debug||67 |Async.VM.resume R:beac7be348f1|mig64]
Finished emu-manager result processing <---- End_of_file expected and handled
```

Don't pollute the logs and instead just log the same info with 'debug' when the
error is End_of_file.
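
The idea can be sketched as follows. This is a simplified illustration, not the actual change to `non_debug_receive`: the `level` type, `level_of_exn`, `log` and `report_last_message` are all hypothetical names, and `log` is a stand-in that writes to stderr rather than the project's real logging module.

```ocaml
(* Hypothetical severity type for this sketch only. *)
type level = Log_debug | Log_error

(* End_of_file is the expected way for the stream to finish, so it only
   warrants a debug line; anything else is still reported as an error. *)
let level_of_exn = function
  | End_of_file -> Log_debug
  | _ -> Log_error

(* Stand-in logger: prints "[level] message" to stderr. *)
let log level msg =
  let tag = match level with Log_debug -> "debug" | Log_error -> "error" in
  prerr_endline (Printf.sprintf "[%s] %s" tag msg)

(* Log the same information either way; only the severity changes. *)
let report_last_message exn msg =
  log (level_of_exn exn) msg
```

Keeping the message identical at both levels means nothing is lost for debugging, while the error level stays reserved for failures the caller does not expect.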

Signed-off-by: Andrii Sultanov <[email protected]>
@last-genius last-genius force-pushed the asv/improve-logging branch from aec8803 to 3f0e977 Compare June 11, 2025 07:42
@last-genius last-genius added this pull request to the merge queue Jun 11, 2025
Merged via the queue into xapi-project:master with commit 743565c Jun 11, 2025
16 checks passed
4 participants