
Conversation

@last-genius last-genius commented Sep 5, 2025

Currently `Host.disable` only disables the host for the duration of the current xapi run - it will be re-enabled on toolstack restarts and host reboots (if other conditions are met). This greatly complicates manual maintenance in an HA cluster when reboots are needed, since VMs will start moving to the host as soon as it's re-enabled.

To allow keeping a host persistently disabled across toolstack restarts and host reboots, add a new localdb flag `host_auto_enable` (set through the new parameter on `Host.disable`). This coexists with the internal `host_disabled_until_reboot` flag, which is only set internally on host poweroff and cannot be controlled by the user directly.

With `host_auto_enable` set to false, xapi will not re-enable the host on its own under any circumstances: toolstack restarts, host reboots, and calls to `consider_enabling_host` (triggered by PBD plugs, etc.) will have no effect. Only a manual call to `Host.enable` will re-enable the host.
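The gating behaviour described above can be sketched in OCaml. This is a minimal illustration only: `auto_enable_allowed`, `consider_enabling_host`'s signature here, and the assoc-list stand-in for xapi's localdb are assumptions for the sketch, not the actual xapi code.

```ocaml
(* Illustrative sketch: an assoc list stands in for xapi's localdb. *)
let host_auto_enable_key = "host_auto_enable"

(* Hypothetical lookup: auto-enabling is allowed unless the flag is
   explicitly "false" (absent key defaults to true). *)
let auto_enable_allowed localdb =
  match List.assoc_opt host_auto_enable_key localdb with
  | Some "false" -> false
  | _ -> true

(* With host_auto_enable = false, nothing re-enables the host
   automatically - not a toolstack restart, a reboot, nor a PBD plug -
   even when all other conditions are met. *)
let consider_enabling_host localdb other_conditions_met =
  auto_enable_allowed localdb && other_conditions_met

let () =
  assert (consider_enabling_host [] true);
  assert (not (consider_enabling_host [(host_auto_enable_key, "false")] true))
```

Only an explicit `Host.enable` would clear the flag and re-enable the host in this model.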

Expose the new parameter in the CLI. Also fix up the comment in xapi_host_helpers.mli.

I've verified the new functionality manually.

lindig commented Sep 5, 2025

Is it a good idea to split this across several variables rather than a reboot policy variable that takes an enum? I don't have a strong opinion but feel a single value is more easily understood by API clients.

@last-genius force-pushed the asv/host-disable-persistent branch from a3ea514 to 313c997 on September 8, 2025 07:10
@last-genius commented:
Missed clearing host_disabled_across_reboot in Host.enable, fixed now. I think I'll rework this following @minglumlu's suggestion: a single auto_enable option that does not distinguish between toolstack restarts and reboots, unless there are objections.

@last-genius force-pushed the asv/host-disable-persistent branch from 313c997 to b94f9e6 on September 8, 2025 09:03
@last-genius changed the title from "Host.disable: add persistency flags" to "Host.disable: add persistency flag" on Sep 8, 2025
@last-genius commented:
Reworked into a single auto_enable parameter. Re-tested - works as intended.

Currently a manually disabled host will be re-enabled on toolstack restarts
and host reboots, which will provoke VM migrations in an HA cluster. If
maintenance requires many restarts, that could be painful.

To allow keeping a host persistently disabled across toolstack
restarts and host reboots, add a new localdb flag "host_auto_enable"
(set through the parameter on Host.disable). This coexists with the
internal host_disabled_until_reboot flag, which is only set internally
on host poweroff and cannot be controlled by the user directly.

With host_auto_enable set to false, xapi will not re-enable a host on
its own no matter what: toolstack restarts, host reboots, calls to
consider_enabling_host (triggered by PBD plugs etc.) will have no effect.
Only a manual call to Host.enable will re-enable the host.

Expose the new parameter in the CLI. Also fix up the comment in
xapi_host_helpers.mli.

Signed-off-by: Andrii Sultanov <[email protected]>
@last-genius force-pushed the asv/host-disable-persistent branch from b94f9e6 to cf5be62 on September 8, 2025 09:40
@last-genius added this pull request to the merge queue Sep 9, 2025
Merged via the queue into xapi-project:master with commit 89d78a5 Sep 9, 2025
15 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Sep 30, 2025
… defaults (#6693)

#6652 added a new parameter to `Host.disable`. Since the method is called during an RPU, a newer client calling an older server that is unaware of the parameter would break the call. Add a test reproducing what happens during an RPU and fix the issue in `client.ml`.

---

Adds an older `server.ml` and `client.ml` from xapi 25.30.0 (with `server.ml` modified to compile after the XSA-474 interface changes), before `Host.disable` gained the `auto_enable` parameter.

Adds compatibility tests verifying that an older client can talk to a newer server and the other way around.

Before the fix, both `test_compatibility_with_old_server_*` tests fail, showing that `auto_enable` in `Host.disable` is an unexpected parameter. This failure is triggered on RPUs, when a newer xapi talks to an older one:

```
[exception] Server_error(MESSAGE_PARAMETER_COUNT_MISMATCH, [ host.disable; 1; 2 ])
```

So allow `client.ml` to skip specifying an arbitrary number of rightmost arguments if they're all equal to their default values (since arguments are positional, once an argument is not skipped, no arguments to its left can be skipped).

Generated code for `host.disable` looks like the following:

```ocaml
let session_id = rpc_of_ref_session session_id in
let host = rpc_of_ref_host host in
let auto_enable = rpc_of_bool auto_enable in

let needed_args, _ =
  List.fold_right2
    (fun param default (acc, skipped) ->
      (* Since arguments are positional, we can only skip specifying an
         argument that's equal to its default value if all the arguments to
         its right were also not specified *)
      if skipped then
        (match default with
        | Some default_value when param = default_value -> (acc, true)
        | _ -> (param :: acc, false))
      else
        (param :: acc, false))
    [session_id; host; auto_enable]
    [None; None; Some (Rpc.Bool true)]
    ([], true)
in

rpc_wrapper rpc "host.disable" needed_args >>= fun x -> return (ignore x)
```
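As a standalone illustration, the skipping logic above distills to the following sketch. Plain strings stand in for `Rpc.t` values to keep it self-contained; this is not the generated code itself.

```ocaml
(* Drop rightmost arguments that equal their default value; once an
   argument is kept, everything to its left must be kept too. *)
let needed_args params defaults =
  fst
    (List.fold_right2
       (fun param default (acc, skipped) ->
         if skipped then
           match default with
           | Some d when param = d -> (acc, true)
           | _ -> (param :: acc, false)
         else (param :: acc, false))
       params defaults ([], true))

let () =
  (* auto_enable left at its default: it is omitted from the call,
     so an older server sees the argument count it expects. *)
  assert (needed_args ["s"; "h"; "true"] [None; None; Some "true"]
          = ["s"; "h"]);
  (* a non-default rightmost argument must still be sent. *)
  assert (needed_args ["s"; "h"; "false"] [None; None; Some "true"]
          = ["s"; "h"; "false"]);
  (* once an argument is sent, arguments to its left cannot be skipped,
     even if they match their defaults. *)
  assert (needed_args ["s"; "1"; "x"] [Some "s"; Some "1"; None]
          = ["s"; "1"; "x"])
```

This is why `test_compatibility_with_old_server_default` (all defaults, rightmost argument dropped) can pass while `test_compatibility_with_old_server_non_default` still fails as expected.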

This fixes an issue with `client.ml` always specifying values for new parameters that older `server.ml` did not know about (which happens during an RPU).

This makes `test_compatibility_with_old_server_default` pass, so drop the `try with` for it. `test_compatibility_with_old_server_non_default` still fails, indicating that everything works as intended.

Fixes: cf5be62 ("host.disable: Add auto_enabled parameter for persistency")