@LunfanZhang
Collaborator

merge master to feature branch

changlei-li and others added 30 commits February 18, 2025 16:24
Add a new field to store service aware data from guest vms

Signed-off-by: Changlei Li <[email protected]>
Add a new field to store service aware data from guest vms
1. xenopsd backend reads and watches /local/domain/%d/data/service
2. xapi_guest_agent converts the data to a (string, string) list
to store in VM_guest_metrics.services

Signed-off-by: Changlei Li <[email protected]>
…i-project#6317)

The field `VM_guest_metrics.services` added previously (xapi-project#6309) is used
to store VM service information reported by guest tools.
Xapi does not use this data but just holds it for outside consumers.
This PR reads the xenstore path `data/service`, converts it to a
`[("service_name/key", "value")]` pair list, and then stores it in
VM_guest_metrics.services.
The path to read is like:
`/local/domain/<id>/data/service/<service_name>/<key> = <value>`
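
A minimal sketch of the conversion described above, written in Python for illustration only (the real code lives in xenopsd and xapi_guest_agent). The helper name `services_of_xenstore`, the dict-based input, and the rule of keeping only `<service_name>/<key>` leaves are assumptions, not taken from the PR.

```python
# Hypothetical illustration: flatten xenstore entries found under
# /local/domain/<id>/data/service into ("service_name/key", "value") pairs.
SERVICE_ROOT = "/local/domain/{domid}/data/service/"


def services_of_xenstore(domid, entries):
    """entries maps full xenstore paths to their string values."""
    prefix = SERVICE_ROOT.format(domid=domid)
    pairs = []
    for path, value in sorted(entries.items()):
        if not path.startswith(prefix):
            continue  # not a service entry for this domain
        rest = path[len(prefix):]
        parts = rest.split("/")
        # Assumption: only <service_name>/<key> leaves are kept.
        if len(parts) != 2 or not all(parts):
            continue
        pairs.append((rest, value))
    return pairs
```

The resulting list has the same shape as the value stored in `VM_guest_metrics.services`; xapi only holds it for outside consumers.
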
Conflict solved:
```
$ git diff
diff --cc ocaml/idl/schematest.ml
index b534f10,255e094e1..000000000
--- a/ocaml/idl/schematest.ml
+++ b/ocaml/idl/schematest.ml
@@@ -3,7 -3,7 +3,11 @@@ let hash x = Digest.string x |> Digest.
  (* BEWARE: if this changes, check that schema has been bumped accordingly in
     ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)

++<<<<<<< HEAD
 +let last_known_schema_hash = "34390a071f5df0fac8dcf9423a9111ae"
++=======
+ let last_known_schema_hash = "ad67a64cd47cdea32085518c1fb38d27"
++>>>>>>> master
```

Commit b6ced44 is new for updating
last_known_schema_hash and datamodel lifecycle
No conflict during merge. The code scanning warnings are introduced by
xapi-project#5995 , not relevant to this feature.
This simplifies making optimisations as it removes the need to look in
the logs for these events.

Signed-off-by: Steven Woods <[email protected]>
`import_activate` and `get_nbd_server` are more like datapath functions,
so I am moving them into the `DATA` module; they were previously
in the `DATA.MIRROR` module.

Reserve `DATA.MIRROR` for storage migration functionality that is
implemented in the xapi layer but not in the storage layer.

Signed-off-by: Vincent Liu <[email protected]>
There are several functions in storage_interface and hence storage_mux,
such as `start`, `stop`, `list`.

These functions are currently not multiplexed but just called
directly into storage_migrate. In fact, they are unlikely to be
multiplexed because they use the `State` module in storage migrate,
which is an in-memory hashtable in xapi, not accessible by
xapi-storage-script.

So remove them from the storage interface, and callers of these
functions can call them from storage_migrate directly rather than
going through the storage interface.

None of these are remote functions so no need to worry about backwards
compatibility.

Signed-off-by: Vincent Liu <[email protected]>
The `send_start` function is a subroutine inside the
`Storage_migrate.start` function, which takes the mirror prepared by the
`receive_start` and initiates mirroring to the remote VDI.

This commit only defines the interface, which means this function is
currently unused.

Signed-off-by: Vincent Liu <[email protected]>
Just so that they type check, some of the functions are still
unimplemented, and these functions are still unused.

Signed-off-by: Vincent Liu <[email protected]>
There is already a call to `assert_can_migrate_vdis` present in
`assert_can_migrate`, which checks that none of the VDIs that *are going to be
moved* have CBT enabled. There is no need to additionally check that none of the
VDIs *in general* have CBT enabled.

Some clients, like XenOrchestra, will turn off CBT on VDIs and retry migration
after getting the `VDI_CBT_ENABLED` error on live migration. Dropping this
overly strict check means CBT need not be stripped from a VDI that will not be
moved (because it is on a shared SR).
In addition, during rolling pool upgrades, disabling CBT is not allowed, hence
the live migration operation wouldn't be able to continue. Avoiding the strict
check fixes that as well.
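
The difference between the two checks can be sketched as follows. This is a Python illustration under stated assumptions, not the OCaml implementation: the data shapes (`vdis`, `vdi_map`) and the exception class are hypothetical stand-ins.

```python
class VdiCbtEnabled(Exception):
    """Stand-in for the VDI_CBT_ENABLED API error."""


def assert_can_migrate_vdis(vdis, vdi_map):
    """Reject migration only if a VDI that will be moved has CBT enabled.

    vdis:    dict of VDI name -> {"cbt_enabled": bool}   (hypothetical shape)
    vdi_map: dict of VDI name -> destination SR; only these VDIs are moved
    """
    for vdi in vdi_map:
        if vdis[vdi]["cbt_enabled"]:
            raise VdiCbtEnabled(vdi)
    # VDIs absent from vdi_map (e.g. on a shared SR) are deliberately not
    # inspected: this is the overly strict check that was dropped.
```

With this shape, a VM whose CBT-enabled disk stays on a shared SR passes the check, while a CBT-enabled disk listed in `vdi_map` still raises.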

Signed-off-by: Andrii Sultanov <[email protected]>
Sync master before merging to master. No conflict.
…pi-project#6405)

There is already a call to `assert_can_migrate_vdis` present in
`assert_can_migrate`, which checks that none of the VDIs that *are going
to be moved* have CBT enabled. There is no need to additionally check
that none of the VDIs *in general* have CBT enabled.

Some clients, like XenOrchestra, will turn off CBT on VDIs and retry
migration after getting the `VDI_CBT_ENABLED` error on live migration.
Dropping this overly strict check means CBT need not be stripped from a
VDI that will not be moved (because it is on a shared SR).
In addition, during rolling pool upgrades, disabling CBT is not allowed,
hence the live migration operation wouldn't be able to continue.
Avoiding the strict check fixes that as well.

Closes xapi-project#6400
We want to handle version strings reliably and not rediscover their
problems over and over. Add a simple library together with tests.
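
The commit message does not describe the library's interface, so the following is only a generic sketch of the kind of problem such a library addresses: comparing dotted numeric versions component by component instead of as strings. The function names and the accepted format are assumptions.

```python
def parse_version(s):
    """Parse a dotted numeric version such as "1.10.2" into a tuple of ints."""
    parts = s.strip().split(".")
    if not all(p.isdecimal() for p in parts):
        raise ValueError(f"not a dotted numeric version: {s!r}")
    return tuple(int(p) for p in parts)


def compare_versions(a, b):
    """Return -1, 0 or 1; missing trailing components count as zero."""
    x, y = parse_version(a), parse_version(b)
    n = max(len(x), len(y))
    x += (0,) * (n - len(x))
    y += (0,) * (n - len(y))
    return (x > y) - (x < y)
```

A plain string comparison would order "1.10" before "1.9"; comparing integer tuples avoids that class of mistake.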

Signed-off-by: Christian Lindig <[email protected]>
An SM plugin might become unavailable; we have to remove its record on
xapi startup. The canonical case is an upgrade from XS8 to XS9.

Signed-off-by: Christian Lindig <[email protected]>
An SM plugin might become unavailable; we have to remove its record on
xapi startup. The canonical case is an upgrade from XS8 to XS9.
…visualise potential optimisations (xapi-project#6379)

Also simplify two tracing functions which could just use
Span.get_trace_context directly
If smapi observer was previously enabled and then added to
observer_experimental_components, when the toolstack is restarted the
smapi observer will no longer exist. The observer will therefore not be
"destroyed" by normal means so we must remove the config manually.

This is important for xapi-storage-script as it cannot read the DB and
so relies on this config file to know whether to trace SMAPIv3 or not.

Signed-off-by: Steven Woods <[email protected]>
…_where

Messages are not stored in the database, so their handling is done through
Custom_actions. The custom implementation of get_all_records_where, in
particular, used to ignore the query expression altogether (as if expr = true).

Expand the existing handler to evaluate the query and match against the messages
retrieved and parsed from disk.

An alternative approach would have been to special-case the filtering code in
the database itself, but in my attempts this was awkward and duplicated quite a
lot of code.

With this it is possible to filter messages properly now:
```
>>> session.xenapi.message.get_all_records_where('field "name" = "VM_STARTED"')
<list of messages with name = VM_STARTED>
>>> session.xenapi.message.get_all_records_where('field "name" = "VM_STARTED" and field "obj_uuid" = "3c61111f-1d67-7db4-95a5-f0287aff57bf"')
<list of messages with name = VM_STARTED and obj_uuid with such value>
```

Also update the documentation to match new behaviour.

Closes xapi-project#6340

Signed-off-by: Andrii Sultanov <[email protected]>
…al (xapi-project#6410)

If smapi observer was previously enabled and then added to
observer_experimental_components, when the toolstack is restarted the
smapi observer will no longer exist. The observer will therefore not be
"destroyed" by normal means so we must remove the config manually.

This is important for xapi-storage-script as it cannot read the DB and
so relies on this config file to know whether to trace SMAPIv3 or not.
…_where (xapi-project#6411)

Messages are not stored in the database, so their handling is done
through Custom_actions. The custom implementation of
get_all_records_where, in particular, used to ignore the query
expression altogether (as if expr = true).

Expand the existing handler to evaluate the query and match against the
messages retrieved and parsed from disk.

An alternative approach would have been to special-case the filtering
code in the database itself, but in my attempts this was awkward and
duplicated quite a lot of code.

With this it is possible to filter messages properly now:
```
>>> session.xenapi.message.get_all_records_where('field "name" = "VM_STARTED"')
<list of messages with name = VM_STARTED>
>>> session.xenapi.message.get_all_records_where('field "name" = "VM_STARTED" and field "obj_uuid" = "3c61111f-1d67-7db4-95a5-f0287aff57bf"')
<list of messages with name = VM_STARTED and obj_uuid with such value>
```

Also update the documentation to match new behaviour.

Closes xapi-project#6340
edwintorok and others added 27 commits April 11, 2025 16:08
It is now a dependency of Threadext, which causes linking to fail if
this isn't installed.
Only a problem when RPM is used, not when OPAM is used.
The destroy operation is missing from the allowed operations of the HA
statefile VDI after HA is disabled. This is because, when
update_allowed_operations runs for the VDI, ha_disable_in_progress is
found to be true.

Xapi_ha.disable_internal will also be called when HA is not enabled
successfully; in that case, ha_enable_in_progress is true.

This fix removes `ha_enable or `ha_disable from the pool's current
operations to allow update_allowed_operations to enable the destroy
operation for the VDI.

Signed-off-by: Gang Ji <[email protected]>
Currently, we are only checking that no VDIs have been removed during
the SR scan performed by the SM plugin. However, there are situations
where a VDI has been added, and if this VDI is not present in the list
obtained from SR.scan, it will be forgotten. The checks only prevent
this in the case where the VDI was added during the scan. There is still
a TOCTOU situation if the issue happens after the scan, and there is
room for that.
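
The check described above can be sketched roughly as below. This is a Python illustration with hypothetical callables (`list_db_vdis`, `sr_scan`, `update_db`) and a made-up retry limit; it is not the xapi implementation.

```python
def scan_with_retry(list_db_vdis, sr_scan, update_db, max_attempts=3):
    """Apply scan results only if the DB's VDI set did not change mid-scan.

    list_db_vdis: returns the VDIs currently recorded in the database
    sr_scan:      asks the SM plugin for the VDIs it can see
    update_db:    reconciles the database with the scan result
    """
    for _ in range(max_attempts):
        before = set(list_db_vdis())
        scanned = sr_scan()
        after = set(list_db_vdis())
        if before == after:
            # Nothing was added or removed while the scan ran. A change
            # between this check and update_db is the remaining TOCTOU window.
            update_db(scanned)
            return True
    return False
```

Comparing full sets catches additions as well as removals during the scan, but as noted it cannot close the window after the comparison.
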
Reorder the operations in _pci_add:

1) So that Xen can verify calls to grant ioport and iomem permissions,
reorder the calls so that the device is assigned to the domain before
granting permissions for the resources. When Secure Boot is enabled, Xen
will enforce that ioport/iomem permissions can be granted to a domain
only when the corresponding device is assigned to that domain.

2) Add the device to QEMU after assigning to the domain. Rather than
accessing the PCI config space through dom0 sysfs which is blocked when
Secure Boot is enabled, QEMU in XS9 has been updated to use a hypercall
to access the PCI config space of a device assigned to a domain.
Therefore, add the device to QEMU after assigning it to the domain
rather than before so that the config space accesses it performs during
the QMP call succeed.

Signed-off-by: Ross Lagerwall <[email protected]>
This will be important to have a dual-stack mode

Signed-off-by: Pau Ruiz Safont <[email protected]>
This is done for host certificates only

Signed-off-by: Pau Ruiz Safont <[email protected]>
…api-project#6419)

Adds IPv6 addresses to host certificates when their management
interfaces are configured to use Dual stack.

I've tested manually on IPv4-only hosts, and the certificates are
generated in the same way as before.
I've run the smoke and verification tests (Suite Run 215863), and all of
them passed.

@last-genius you probably want to test these changes: run `openssl x509
-text -in /etc/xensource/xapi-ssl.pem` on a dual-stack host, and check that it
contains the IPv6 and IPv4 addresses.
Now it's generated by dune and includes the dependency on xapi-client and
xapi-types.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Now it's generated by dune and includes the dependency on xapi-client
and xapi-types.
With this PR, `xapi` and `xcp-networkd` now properly handle emergency
network resets on an IPv6-only host (when it's an individual host as
well as when it's in a pool as a supporter and as a coordinator)

Tested manually, see also the required `xsconsole` PR:
xapi-project/xsconsole#55.
As xha_metadata_vdi is of type `redo_log.

This is why the destroy operation was not missing for the HA metadata VDI
after HA is disabled.

Signed-off-by: Gang Ji <[email protected]>
Reorder the operations in _pci_add:

1) So that Xen can verify calls to grant ioport and iomem permissions,
reorder the calls so that the device is assigned to the domain before
granting permissions for the resources. When Secure Boot is enabled, Xen
will enforce that ioport/iomem permissions can be granted to a domain
only when the corresponding device is assigned to that domain.

2) Add the device to QEMU after assigning to the domain. Rather than
accessing the PCI config space through dom0 sysfs which is blocked when
Secure Boot is enabled, QEMU in XS9 has been updated to use a hypercall
to access the PCI config space of a device assigned to a domain.
Therefore, add the device to QEMU after assigning it to the domain
rather than before so that the config space accesses it performs during
the QMP call succeed.
We define different failures for SXM, for different stages of the SXM:

- preparation failure: raised when preparing the environment for SXM,
  for example, when the dest host creates VDIs for data mirroring
  (SMAPIv1 and v3);
- mirror_fd_failure: happens when passing fds to tapdisks for mirroring
  (SMAPIv1 only);
- mirror_snapshot_failure: raised when taking a snapshot as the base image
  before copying it over to the destination (SMAPIv1 only);
- mirror_copy_failure: raised when copying of the base image fails;
- mirror_failure: raised when there is any issue that causes the mirror
  to crash during SXM (SMAPIv1 and v3).

Signed-off-by: Vincent Liu <[email protected]>
Copying is not used exclusively by SMAPIv1: for example, DATA.copy maps
to Storage_migrate.copy, which maps to Storage_smapiv1_migrate.copy, and
this path is used when a VDI does not need to be migrated live (attached
read-only). Still, it is a bit easier in terms of dependencies to leave
the copy functions in storage_smapiv1_migrate.ml,

along with some related helper functions such as `progress_callback`.

Signed-off-by: Vincent Liu <[email protected]>
Because the latter needs to use the former.

Signed-off-by: Vincent Liu <[email protected]>
The original Storage_migrate.start function is quite fat and does lots
of things. For better readability and maintainability, split it into
multiple functions that represent different stages of the SXM:
- prepare: makes a remote call to the dest host and asks it to prepare
  the VDIs for receiving data
- mirror_pass_fds: passes the fd to tapdisk to establish a connection
  between two tapdisk processes (sending & receiving)
- mirror_snapshot: takes a snapshot of the VDI to be mirrored, because
  the tapdisk mirroring only mirrors data written after the mirror is
  initiated
- mirror_copy: copies the snapshot from the source to the destination

As these operations are all specific to SMAPIv1, move them to
`storage_smapiv1_migrate` as well.

Also restructure the way clean up is done. As there are multiple points
at which the migration might fail, previously an `on_fail` list ref was
constructed and populated as we went. Now that we have explicitly defined
the different errors during the migration, we can handle these errors
accordingly.

The way to reason about cleanups is: we have three possible clean up
actions:

- receive_cancel: which will undo the preparation work done on the dest
  host. This is added once preparation is done;
- destroy snapshot: which will destroy the snapshot. This is called
  after the snapshot is created;
- stop: which will disable mirroring, delete snapshot and also do what
  receive_cancel does, which is basically clean up everything. And this
  will be called after the mirror has been started.

Note many of these functions are best-effort, for example, the stop
function won't delete a snapshot if it does not exist, so it's generally
safe to call them.

We put these clean up functions into the appropriate places; see the
actual arrangement of these functions in the code, which should be
self-explanatory.
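
As a rough illustration of the error-driven clean up described above, here is a Python sketch. The exception classes mirror the failure names from the earlier commit, but the function signature and, in particular, the mapping from each failure to a clean up action are assumptions; the actual arrangement is in the OCaml code.

```python
class SxmFailure(Exception):
    """Base class for the hypothetical stage failures below."""

class PreparationFailure(SxmFailure): pass
class MirrorFdFailure(SxmFailure): pass
class MirrorSnapshotFailure(SxmFailure): pass
class MirrorCopyFailure(SxmFailure): pass
class MirrorFailure(SxmFailure): pass


def send_start(prepare, mirror_pass_fds, mirror_snapshot, mirror_copy,
               receive_cancel, stop):
    """Run the stages in order and pick a clean up action per failure type."""
    try:
        prepare()
        mirror_pass_fds()
        mirror_snapshot()
        mirror_copy()
    except PreparationFailure:
        raise  # assumption: nothing was prepared, so nothing to undo
    except MirrorFdFailure:
        receive_cancel()  # assumption: undo only the dest-side preparation
        raise
    except (MirrorSnapshotFailure, MirrorCopyFailure, MirrorFailure):
        stop()  # assumption: the mirror has started, so clean up everything
        raise
```

Because the clean up functions are described as best-effort, calling `stop` when no snapshot exists is expected to be harmless in this sketch.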

Signed-off-by: Vincent Liu <[email protected]>
Now that the split is done, we can implement the `send_start` function
defined earlier for SMAPIv1, by combining all the different stages into
one function and invoke it from `Storage_migrate.start`.

At this point the refactoring for SMAPIv1 should be finished, and there
should still be no functional change.

Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
…ject#6423)

This is a continuation of xapi-project#6404, which completes the refactoring of SXM
code for the new architecture.

I expect there to be no functional change, although there is a
significant change of how error handling is done.

The last couple of commits contain the design for this PR.

To be continued...
@BengangY BengangY merged commit 779f8f7 into xapi-project:feature/configure-ssh-phase2 Apr 22, 2025
17 checks passed