Skip to content

Conversation

LunfanZhang
Copy link
Contributor

Merge master to feature branch and resolve the following conflicts:

$ git show db5ad7d472aebb4ab5d89d4a85371db8ad989ccd
commit db5ad7d472aebb4ab5d89d4a85371db8ad989ccd (HEAD -> private/luzhan/sync-master-to-feature, mygithub/private/luzhan/sync-master-to-feature)
Merge: d34d581e7 c185101d6
Author: Lunfan Zhang[Lunfan.Zhang] <Lunfan.Zhang@cloud.com>
Date:   Mon Jul 21 08:50:26 2025 +0000

    Merge master to feature branch

diff --cc ocaml/idl/schematest.ml
index d3914f478,7bd70cb3a..963231d7d
--- a/ocaml/idl/schematest.ml
+++ b/ocaml/idl/schematest.ml
@@@ -3,7 -3,7 +3,7 @@@ let hash x = Digest.string x |> Digest.
  (* BEWARE: if this changes, check that schema has been bumped accordingly in
     ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)
  
- let last_known_schema_hash = "a030fa0233a69a33200b628af0f030c7"
 -let last_known_schema_hash = "9cd32d98d092440c36617546a3d995bd"
++let last_known_schema_hash = "7586cb039918e573594fc358e90b0f04"
  
  let current_schema_hash : string =
    let open Datamodel_types in
diff --cc ocaml/xapi/xapi_globs.ml
index 14459e1c8,7bdd07079..ba08ad113
--- a/ocaml/xapi/xapi_globs.ml
+++ b/ocaml/xapi/xapi_globs.ml
@@@ -1757,12 -1758,16 +1762,22 @@@ let other_options 
      , (fun () -> string_of_bool !validate_reusable_pool_session)
      , "Enable validation of reusable pool sessions before use"
      )
 +  ; ( "ssh-auto-mode"
 +    , Arg.Bool (fun b -> ssh_auto_mode_default := b)
 +    , (fun () -> string_of_bool !ssh_auto_mode_default)
 +    , "Defaults to true; overridden to false via \
 +       /etc/xapi.conf.d/ssh-auto-mode.conf(e.g., in XenServer 8)"
 +    )
+   ; ( "vm-sysprep-enabled"
+     , Arg.Set vm_sysprep_enabled
+     , (fun () -> string_of_bool !vm_sysprep_enabled)
+     , "Enable VM.sysprep API"
+     )
+   ; ( "vm-sysprep-wait"
+     , Arg.Set_float vm_sysprep_wait
+     , (fun () -> string_of_float !vm_sysprep_wait)
+     , "Time in seconds to wait for VM to recognise inserted CD"
+     )
    ]
  
  (* The options can be set with the variable xapiflags in /etc/sysconfig/xapi

GabrielBuica and others added 30 commits June 18, 2025 15:42
Simplifies the logic of `exec_with_context` by letting the caller decide
when the task is destroyed from the database.

Adds helper function in `context.ml` to destroy and trace the destroy op
correctly.

Signed-off-by: Gabriel Buica <[email protected]>
Dynamic.t is not used currently, just drop it.

Signed-off-by: Andrii Sultanov <[email protected]>
There's already a 'localhost' in scope.

Signed-off-by: Andrii Sultanov <[email protected]>
No functional change, this just removes several indentation levels from the
500+ lines of the function, making it easier to refactor in the future.

This also clears up the logic of the function, now that two arms of if-else and
try-with clauses are not 500+ lines apart, and avoids splitting some
expressions and strings over several lines given that they reach the line
character limit more often inside of several levels of indentation.

Signed-off-by: Andrii Sultanov <[email protected]>
Best reviewed by commit, and also best viewed locally with a git client
that distinguishes between moved and new lines, with a config like:
```
[diff]
    colormoved = "default"
    colormovedws = "allow-indentation-change"
```

No functional changes intended.

----

This is split out from a larger series, but I figured these cleanups are
easier to review on their own and it would be better to get them in
early to avoid annoying rebases of a thousand lines of code ....
events_from_xenopsd thread is critical as it sync up VM status
in case of bootstorm, this thread is flood as lots of events comes
from xenopsd waiting for process.

During processing of the events, VM/VDI/VBD update_allowed_operations
will be called to refresh the allowed operations.

However, for each ops (start/suspend,etc) for the same object(VM),
the object info is always the same no matter what the ops is.
Thus, it is not necessary to query the object information over and
over again.

Disclosure is used to resovle the issue. Query once and the disclosure
will just remember the query result.

The performance test for starting 500 VM on 4 hosts improve around 10%
performance for both XS8 and XS9

This commit just introduce disclosure and caller call the disclosure
instead of the original function

Signed-off-by: Lin Liu <[email protected]>
Define db ops into variables and keep them inside returned function
for code review

Signed-off-by: Lin Liu <[email protected]>
Move ops unrelated db operation outside of returned function

Signed-off-by: Lin Liu <[email protected]>
This allows changing the type definition in Updates without modifying the
types here as well in the future.

Signed-off-by: Andrii Sultanov <[email protected]>
The only usage of it ignores values, so drop them altogether. This allows
changing the type of the values in the future without modifying inject_barrier
and its users in any way.

Signed-off-by: Andrii Sultanov <[email protected]>
Drop the first member of the (Vm.t * Vm.state) tuple as it's never used. This
removes the need for several 'snd info', 'Option.iter (fun (_, state) ->) info'
constructs.

Replace all the following constructs:
```
if different (fun x -> x.field) && predicate then
    Option.iter (fun state -> ...) info
```

With this:
```
different (fun x -> x.field)
          ((&&) predicate)
          (fun field -> ...);
```

It
1) removes the additional level of indentation inside Option.iter (fun state),
   hides the duplication of this construct
2) makes it obvious where the whole 'state' is accessed and not only the field
   that was checked to be different

Signed-off-by: Andrii Sultanov <[email protected]>
…oject#6542)

Simplifies the logic of `exec_with_context` by letting the caller decide
when the task is destroyed from the database.

Adds helper function in `context.ml` to destroy and trace the destroy op
correctly.
Instruments `process`/`Consumers` spans of message-switch service in
xenopsd.

Signed-off-by: Gabriel Buica <[email protected]>
Intruments functions in `xapi_xenops.ml` that carry the
traceparent/tracecontext through dbg.

`Debug_info.with_dbg` now accepts setting up attributes for spans.

Also, instruments `Events_from_xenopsd` to capture the event spans:
`subscribe`/`settle`. It's nto straight forward to link them on the same
trace. For now the only way they are connnected is having the same
`message.id` attribute.

Signed-off-by: Gabriel Buica <[email protected]>
Intruments the functions in `xapi_xenops.ml` that carry the
traceparent/tracecontext through `context`.

Signed-off-by: Gabriel Buica <[email protected]>
This function is called recursively and the context span is only closed
on failure. This makes it hard to read the trace.

Therefore, I am resetting the context tracing each recursion step and now
each recursion step will have its own trace.

Signed-off-by: Gabriel Buica <[email protected]>
…#6541)

events_from_xenopsd thread is critical as it sync up VM status in case
of bootstorm, this thread is flood as lots of events comes from xenopsd
waiting for process.

During processing of the events, VM/VDI/VBD update_allowed_operations
will be called to refresh the allowed operations.

However, for each ops (start/suspend,etc) for the same object(VM), the
object info is always the same no matter what the ops is. Thus, it is
not necessary to query the object information over and over again.

Disclosure is used to resovle the issue. Query once and the disclosure
will just remember the query result.

The performance test for starting 500 VM on 4 hosts improve around 10%
performance for both XS8 and XS9
1. WLB request will raise `wlb_authentication_failed` when catching
   `Http_client.Http_error`. But actually only error code 401 and 403
   should raise this type exception. For other error code, raise
   `wlb_connection_reset`. Also print the detail error code and
   message.
2. `message_forwarding` raises same error for `Http_request_rejected`
   and `Connection_reset` so we don't know which exception actually
   be raised. Print detailed logs for these 2 exceptions.

Signed-off-by: Bengang Yuan <[email protected]>
…xenops.ml` (xapi-project#6527)

As promised, more tracing instrumentation around message switch in
xenopsd.

There is also some instrumentation of `Events_from_xenops` around
waiting/waking up for events. Currently these are not straight forward
linkable as they have different traces, e.g. one is on the vm op and the
other in the xapi event watch thread.

It is now possible to search for for "`subscribe`/`settle` <queue_name>"
spans that have the same attribute value for "messaging.message.id".
This should help debugging.

![image](https://github.com/user-attachments/assets/4a408e97-cab0-4380-bf3a-fd5c3619e126)

![image](https://github.com/user-attachments/assets/f6620cc5-82a0-4d1e-a3ea-6088e79e25e2)
"data/updated" is not read or used anywhere in xenopsd or xapi:
* xapi_guest_agent's last_updated field is just Unix.gettimeofday ().
* xapi_xenops removes "data/updated" from the guest agent state altogether before
checking if it's changed:
```
let ignored_keys = ["data/meminfo_free"; "data/updated"; "data/update_cnt"]
```

So there is no need to watch this path at all.

This greatly reduces unnecessary traffic between xapi and xenopsd, since any VM
with a guest agent would write to data/updated once every 60 seconds, which
would generate a Dynamic.Vm event, making xapi call xenopsd's VM.stat to rescan
the domain's xenstore tree and perform several hypercalls. Almost always, this
would be completely unnecessary as nothing else about the VM would change, but
a lot of work would be done anyhow.

Signed-off-by: Andrii Sultanov <[email protected]>
1. WLB request will raise `wlb_authentication_failed` when catching
`Http_client.Http_error`. But actually only error code 401 and 403
should raise this type exception. For other error code, raise
`wlb_connection_reset`. Also print the detail error code and message.
2. `message_forwarding` raises same error for `Http_request_rejected`
and `Connection_reset` so we don't know which exception actually be
raised. Print detailed logs for these 2 exceptions.
Drop the first member of the (X.t * X.state) tuple coming from X.stat
immediately as it's never used. This removes the need for several 'snd info',
'Option.iter (fun (_, state) ->) info' constructs.

Signed-off-by: Andrii Sultanov <[email protected]>
Rename the "u" function used across storage and observer_helpers to make
its meaning more obvious and use __FUNCTION__ instead of hardcoding the
function name.

Signed-off-by: Steven Woods <[email protected]>
Some refactoring, but the most important commit is this one-line change
that greatly improves things on its own:

---

xenopsd: Remove `data/updated` from the list of watched paths

"data/updated" is not read or used anywhere in xenopsd or xapi:
* xapi_guest_agent's last_updated field is just Unix.gettimeofday ().
* xapi_xenops removes "data/updated" from the guest agent state
altogether before
checking if it's changed:

```
let ignored_keys = ["data/meminfo_free"; "data/updated"; "data/update_cnt"]
```

So there is no need to watch this path at all.

This greatly reduces unnecessary traffic between xapi and xenopsd, since
any VM
with a guest agent would write to data/updated once every 60 seconds,
which
would generate a Dynamic.Vm event, making xapi call xenopsd's VM.stat to
rescan
the domain's xenstore tree and perform several hypercalls. Almost
always, this
would be completely unnecessary as nothing else about the VM would
change, but
a lot of work would be done anyhow.
Rename the "u" function used across storage and observer_helpers to make
its meaning more obvious and use __FUNCTION__ instead of hardcoding the
function name.
The VM id part of Vbd.id is unnecessary in the attached_vdis key as the DB
is already indexed by the VM id. This also prevents problems when the VM is
renamed.

Signed-off-by: Steven Woods <[email protected]>
The `ref` parameter within the `location` attribute of the console can
refer to either the VM's ref or the console's own ref. Currently, the
console's location uses the VM's ref by default.

This causes an issue: when executing xs console, the requested
location contains the VM's ref. If the ref points to the VM, xapi will
attempt to use the RFB console (which is graphical) by default, rather
than the VT100 console (which is text-based). As a result, the xs
console command fails to open the console and hangs.

**Solution:**
Update the default ref in the console's `location` to the console's
own ref. With this change, whether accessing the RFB or the VT100
console, the ref in the `location` will always point to the respective
console itself.

Signed-off-by: Bengang Yuan <[email protected]>
Add details on specifying image format for VDI and VM migration.
In particular This revision explains how to choose the destination image
format during VDI creation and migration, including VM migration
scenarios.
Also fixes minor typos in the document.

Signed-off-by: Guillaume <[email protected]>
edwintorok and others added 26 commits July 15, 2025 14:41
The default for the Xen PCI MMIO BAR is UnCachable. Setting this to
WriteBack in the MTRR shows a massive performance improvement on AMD, at
least for Linux guests.

On Intel this is already set to WriteBack by Xen via another mechanism.

To be effective this also requires the corresponding [Xen
commit](https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=22650d6054625be10172fe0c78b9cadd1a39bd63),
old versions will just ignore this xenstore key.

The optimization is not enabled by default in Xen due to the wide range
of guests it supports, but XAPI supports a much narrower set of guest
OSes.

Setting the cache attribute to WB is done by setting UC=0.
The first argument of this API error is used for error codes as defined
by v6d. These product-specific error codes can be matched on by clients
who know about them (e.g. XenCenter for XenServer).

The new, second argument allows v6d to return an error message in
English that xe can print directly, while xe remains product agnostic
and does not need to know v6d's error definitions. This replaces some
awkward API-message handling code in cli_operations.

Signed-off-by: Rob Hoes <[email protected]>
The first argument of this API error is used for error codes as defined
by v6d. These product-specific error codes can be matched on by clients
who know about them (e.g. XenCenter for XenServer).

The new, second argument allows v6d to return an error message in
English that xe can print directly, while xe remains product agnostic
and does not need to know v6d's error definitions. This replaces some
awkward API-message handling code in cli_operations.
other_operation_in_progress has separate fields for the main class of the
object ("Cluster" or "Cluster_host") and the operation that's blocking
progress, do not concatenate these into one string. Also report the task ref
that's blocking progress.

Signed-off-by: Andrii Sultanov <[email protected]>
other_operation_in_progress has separate fields for the main class of the
object ("VM") and the operation that's blocking progress, do not concatenate
these into one string. Also report the task ref that's blocking progress.

Signed-off-by: Andrii Sultanov <[email protected]>
qcow-stream uses Lwt, which is not thread-safe, so we want to avoid using it in
the xapi process. Create a CLI wrapper for calls to qcow-stream.

Signed-off-by: Andrii Sultanov <[email protected]>
This patch allows to pass "qcow2" as a supported format when calling VDI
export and import.

Qcow_tool_wrapper is added as a helper that calls the Python script for export
(conversion from raw to qcow2 stream) and the qcow-tool CLI tool (ocaml-qcow
library) for import (conversion from qcow2 stream to raw).

Signed-off-by: Guillaume <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
Also add an mli file for qcow_tool_wrapper

Signed-off-by: Andrii Sultanov <[email protected]>
Final touches for qcow2 VDI import/export - add it as one of the
supported formats and plumb through the calls to the python script for
export and `qcow-stream` library for import via `qcow_tool_wrapper` and
`qcow-stream-tool`.

`qcow-stream-tool` is a thin wrapper executable over `qcow-stream`, so
that xapi doesn't have to use Lwt code in its process (it's not
thread-safe, see xapi-project#6573 for the discussion on this)

---

This was tested manually, confirmed to have the same behaviour as the
previous version (xapi-project#6573)

---

This PR requires the following additions to the specfile:
```diff
@@ -166,6 +172,7 @@ Requires: hwdata
 Requires: /usr/sbin/ssmtp
 Requires: stunnel >= 5.55
 Requires: vhd-tool
+Requires: qcow-stream-tool
 Requires: libffi
 Requires: busybox
 Requires: iproute
@@ -397,9 +404,15 @@ This packages contains plugins registering to the RRD daemon and exposing variou
 %package -n vhd-tool
 Summary: Command-line tools for manipulating and streaming .vhd format files

+%package -n qcow-stream-tool
+Summary: Minimal CLI wrapper for qcow-stream
+
 %description -n vhd-tool
 Simple command-line tools for manipulating and streaming .vhd format file.

+%description -n qcow-stream-tool
+Minimal CLI wrapper for qcow-stream
+
 %package -n xcp-networkd
 Summary:  Simple host network management service for the xapi toolstack
 Requires: ethtool
@@ -1339,6 +1353,9 @@ done
 /opt/xensource/libexec/get_nbd_extents.py
 /opt/xensource/libexec/python_nbd_client.py

+%files -n qcow-stream-tool
+%{_bindir}/qcow-stream-tool
+
 %files -n xcp-networkd
 %{_sbindir}/xcp-networkd
 %{_bindir}/networkd_db
```
The API supports a driver deselect call for which Xapi  calls the
underlying driver-tool. The mock driver is missing the corresponding -d
option, which this patch adds. This fixes an internal error that Xapi
raises otherwise.

An alternative would be to ignore the error from the driver-tool.

Signed-off-by: Christian Lindig <[email protected]>
…n_in_progress

Fix incorrect operations being reported - e.g. when other_operation_in_progress
is reported only when there are non-parallelisable operations currently
happening, we can't just report the first current op, we need to report the
first *non-parallelisable* op. This required some refactoring, moving to
options instead of bools, etc.

Additionally report the blocking operation's reference when raising the error.

Signed-off-by: Andrii Sultanov <[email protected]>
…other_operation_in_progress

Instead of logging all the blocking operations with 'debug', just report them
with the error.

Signed-off-by: Andrii Sultanov <[email protected]>
…ation_in_progress error

Signed-off-by: Andrii Sultanov <[email protected]>
Not all of other_operation_in_progress error contain two objects in the list
([cls; objref]), some contain three or four, but the match would miss them and
they would be put into a non-interruptible Thread.sleep instead of the
interruptible Delay.sleep that provides a mechanism for the other task to wake
us up on its way out.

Signed-off-by: Andrii Sultanov <[email protected]>
…progress

Most of the users report these when appropriate, so document them here.

Signed-off-by: Andrii Sultanov <[email protected]>
quality-gate: Reduce expected List.hd count
test_clustering: test for the blocking operation to be reported

Signed-off-by: Andrii Sultanov <[email protected]>
…s delays (xapi-project#6588)

`other_operation_in_progress` is reported inconsistently, with some
users only specifying `[class; class_ref]`, and others also `op_type`.
Make it consistent, changing all users to report `[class; class_ref;
op_type; op_ref]`. This makes it much easier to navigate the logs, since
the operation blocking progress is clearly referenced.

Compare before, where it's unclear which operation is precluding
progress:
```
Caught exception while marking SR for VDI.clone in message forwarder:
OTHER_OPERATION_IN_PROGRESS:
[ SR; OpaqueRef:d019f26a-b2e8-529b-bb66-fd4f008d4f82; VDI.clone ]
```

And after, with the ID of the operation present:
```
Caught exception while marking SR for VDI.clone in message forwarder:
OTHER_OPERATION_IN_PROGRESS:
[ SR; OpaqueRef:b0f54a40-485f-f48f-fdc7-6db28a64d3fd;
  scan; OpaqueRef:832fb22d-2ce5-0d26-8a00-a4165e78b34d ]
```

In addition, fix incorrect operation reporting, so that when progress is
blocked only when particular types of operations are present in
`current_operations`, these specific operations are reported, and not
just any current operation.

Also fix handling of `other_operation_in_progress` delays - some of the
users would miss the `match` arm and use the non-interruptible
`Thread.sleep` instead of the interruptible `Delay.sleep` that provides
a mechanism for the blocking task to wake us up on its way out.
The API supports a driver deselect call for which Xapi calls the
underlying driver-tool. The mock driver is missing the corresponding -d
option, which this patch adds. This fixes an internal error that Xapi
raises otherwise.

An alternative would be to ignore the error from the driver-tool.
The previous `xe` help is as below:
```
Usage: xe <cmd> [-s server] [-p port] ([-u username] [-pw password] or [-pwf <password file>]) [--traceparent traceparent] <other arguments>

A full list of commands can be obtained by running
	xe help -s <server> -p <port>
```

The previous `xe` help output lacked debug-related options and did not
provide detailed parameter description.
The new `xe` help output is as follows:
```
Usage:
  xe <command>
    [ -s <server> ]            XenServer host
    [ -p <port> ]              XenServer port number
    [ -u <username> -pw <password> | -pwf <password file> ]
                               User authentication (password or file)
    [ --nossl ]                Disable SSL/TLS
    [ --debug ]                Enable debug output
    [ --debug-on-fail ]        Enable debug output only on failure
    [ --traceparent <value> ]  Distributed tracing context
    [ <other arguments> ... ]  Command-specific options

A full list of commands can be obtained by running
  xe help -s <server> -p <port>
```
@LunfanZhang
Copy link
Contributor Author

The original confliction:

<<<<<<< HEAD
  ; ( "vm-sysprep-enabled"
    , Arg.Set vm_sysprep_enabled
    , (fun () -> string_of_bool !vm_sysprep_enabled)
    , "Enable VM.sysprep API"
    )
  ; ( "vm-sysprep-wait"
    , Arg.Set_float vm_sysprep_wait
    , (fun () -> string_of_float !vm_sysprep_wait)
    , "Time in seconds to wait for VM to recognise inserted CD"
=======
  ; ( "ssh-auto-mode"
    , Arg.Bool (fun b -> ssh_auto_mode_default := b)
    , (fun () -> string_of_bool !ssh_auto_mode_default)
    , "Defaults to true; overridden to false via \
       /etc/xapi.conf.d/ssh-auto-mode.conf(e.g., in XenServer 8)"
>>>>>>> feature/configure-ssh-phase3

@changlei-li changlei-li merged commit c368857 into xapi-project:feature/configure-ssh-phase3 Jul 21, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.