@dkhater-redhat
Contributor

@dkhater-redhat dkhater-redhat commented Oct 15, 2025

- What I did

  1. Taught the MCS to write image annotations at bootstrap.
  2. Extended appendNodeAnnotations(...) in pkg/server/server.go to optionally include the following keys alongside the existing MC/state keys when generating the initial node-annotations file (/etc/machine-config-daemon/node-annotations.json):
    a. machineconfiguration.openshift.io/currentImage
    b. machineconfiguration.openshift.io/desiredImage
  3. Added the image into the node-annotations appender.
    a. The server can now pass the resolved rendered OS image (e.g., MOSC/MOSB output) into appendNodeAnnotations(...) so new nodes start with authoritative image annotations.

Impact:
New nodes pivot/validate directly against the intended layered image during bootstrap. In image mode, new nodes successfully deploy the rendered image and can complete without an extra reboot when no reboot-requiring changes are present.

- How to verify it

  1. Create a Quay push secret in the MCO namespace.
  2. Apply a MachineOSConfig with renderedImagePushSpec pointing at your Quay repo/tag.
  3. Opt nodes into image mode (e.g., via the pool opt-in label/annotation you use).
  4. Scale up a worker so a new node is provisioned:
    oc scale machineset.machine.openshift.io -n openshift-machine-api dkhater-10-15-2025-a-cpb2n-worker-us-east-1c --replicas=1
  5. Observe the new node: it should pull and deploy the rendered image without a reboot if no other changes require one.
  6. Confirm on the node:
    oc debug node/ip-10-0-85-7.ec2.internal
    chroot /host
    cat /etc/machine-config-daemon/currentimage
    rpm-ostree status

Expected:

  1. /etc/machine-config-daemon/currentimage contains your Quay digest.
  2. rpm-ostree status shows the same digest as the booted deployment.
  3. No node reboot occurs.

Example:

$ cat /etc/machine-config-daemon/currentimage
quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
$ rpm-ostree status
* ostree-unverified-registry:quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
  Digest: sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
  Version: 9.6.20251013-1
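
The first two expected results can also be checked mechanically. The sketch below is only an illustration under stated assumptions: it takes the contents of the currentimage file and the text of rpm-ostree status as strings, and it assumes the booted deployment is the line marked with a leading "*", as in the example output above. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// digestOf returns the part of an image reference after "@", or "" if the
// reference carries no digest.
func digestOf(ref string) string {
	_, digest, found := strings.Cut(strings.TrimSpace(ref), "@")
	if !found {
		return ""
	}
	return digest
}

// bootedDigest scans rpm-ostree status text for the booted deployment,
// assumed here to be the first line starting with "* ", and returns the
// digest of that deployment's image reference.
func bootedDigest(status string) string {
	for _, line := range strings.Split(status, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "* ") {
			return digestOf(strings.TrimPrefix(trimmed, "* "))
		}
	}
	return ""
}

// imagesMatch reports whether the currentimage file content and the booted
// deployment refer to the same non-empty digest.
func imagesMatch(currentImage, status string) bool {
	want := digestOf(currentImage)
	return want != "" && want == bootedDigest(status)
}

func main() {
	// Values copied from the example output in this description.
	current := "quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70"
	status := "* ostree-unverified-registry:quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70\n" +
		"  Digest: sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70\n" +
		"  Version: 9.6.20251013-1\n"
	fmt.Println("digests match:", imagesMatch(current, status))
}
```

Comparing digests rather than full references sidesteps the ostree-unverified-registry: transport prefix that rpm-ostree adds to the image name.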

Please note: if you use the internal image registry, you will still see the legacy behavior of two node boots.

- Description for the changelog

MCS now embeds current/desired image annotations in the initial node annotations at bootstrap. This makes the MCD pivot/validate directly against the rendered layered OS image. New nodes can complete image-mode provisioning without an unnecessary reboot when no reboot-requiring changes are present.

@openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Oct 15, 2025.
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 15, 2025

@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Oct 15, 2025.
@dkhater-redhat force-pushed the minimize-reboot branch 3 times, most recently from ebca2ba to a4fbb2c, on October 15, 2025 20:58.
@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 15, 2025

@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

What I did

  1. Taught the MCS to write image annotations at bootstrap.
  2. Extended appendNodeAnnotations(...) in pkg/server/server.go to optionally include the following keys alongside the existing MC/state keys when generating the initial node-annotations file (/etc/machine-config-daemon/node-annotations.json):
    a. machineconfiguration.openshift.io/currentImage
    b. machineconfiguration.openshift.io/desiredImage
  3. Plumbed the image into the node-annotations appender.
    a. The server can now pass the resolved rendered OS image (e.g., MOSC/MOSB output) into appendNodeAnnotations(...) so new nodes start with authoritative image annotations.

Behavioral impact:

New nodes pivot/validate directly against the intended layered image during bootstrap (no need to read the encapsulated MC for image discovery).

In image mode, fresh nodes successfully deploy the rendered image and can complete without an extra reboot when no reboot-requiring changes are present.

How to verify it

Create a Quay push secret in the MCO namespace.

Apply a MachineOSConfig with renderedImagePushSpec pointing at your Quay repo/tag.

Opt nodes into image mode (e.g., via the pool opt-in label/annotation you use).

Scale up a worker so a new node is provisioned:

oc scale machineset.machine.openshift.io -n openshift-machine-api dkhater-10-15-2025-a-cpb2n-worker-us-east-1c --replicas=1

Observe the new node: it should pull and deploy the rendered image without reboot if no other changes require one.

Confirm on the node:

oc debug node/ip-10-0-85-7.ec2.internal
chroot /host
cat /etc/machine-config-daemon/currentimage
rpm-ostree status

Expected:

/etc/machine-config-daemon/currentimage contains your Quay digest.

rpm-ostree status shows the same digest as the booted deployment.

My run (example output):

$ cat /etc/machine-config-daemon/currentimage
quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70

$ rpm-ostree status

* ostree-unverified-registry:quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
  Digest: sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
  Version: 9.6.20251013-1

Description for the changelog

Machine Config Server now embeds current/desired image annotations in the initial node annotations at bootstrap.
This makes the MCD pivot/validate directly against the rendered layered OS image, improving determinism for image mode (MOSC/MOSB/OCB) and avoiding reliance on the encapsulated MC for image discovery. New nodes can complete image-mode provisioning without an unnecessary reboot when no reboot-requiring changes are present.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@dkhater-redhat force-pushed the minimize-reboot branch 2 times, most recently from 7023ffe to c35accf, on October 16, 2025 16:21.
Contributor

@djoshy djoshy left a comment

Generally looks good! Just left a few questions/cleanups that we could do 😄

Could we also squash the second commit?

@openshift-ci-robot
Contributor

openshift-ci-robot commented Oct 17, 2025

@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue.

@dkhater-redhat
Contributor Author

Generally looks good! Just left a few questions/cleanups that we could do 😄

Could we also squash the second commit?

Yes! My plan is to squash everything when the team is ready to give an LGTM (in case I need to go back and debug) 😄

Contributor

@pablintino pablintino left a comment

Nice change, thanks for considering the feedback around the image registry URL detection.

@dkhater-redhat force-pushed the minimize-reboot branch 2 times, most recently from 11ccab8 to c17ebcd, on October 22, 2025 20:18.
@dkhater-redhat force-pushed the minimize-reboot branch 2 times, most recently from 247953f to 7f91952, on October 23, 2025 19:41.
@dkhater-redhat
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

@dkhater-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit   Details  Required  Rerun command
ci/prow/bootstrap-unit        6b4d1a4  link     false     /test bootstrap-unit
ci/prow/okd-scos-e2e-aws-ovn  6b4d1a4  link     false     /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@dkhater-redhat
Contributor Author

/test e2e-gcp-op-single-node

@ptalgulk01

Verification Steps

Environment Setup

  • OpenShift Version: 4.21.0-0-2025-10-27-110611-test-ci-ln-8y8ddg2-latest
  • Platform: AWS

Pre-requisites

1. Create Image Registry Secret

oc create secret docker-registry layering-push-secret \
--docker-server=quay.io \
--docker-username=<username> \
--docker-password=<password> \
--docker-email="" \
-n openshift-machine-config-operator
secret/layering-push-secret created
  1. Verify MCO Namespace
$ oc get secret layering-push-secret -n openshift-machine-config-operator
layering-push-secret                        kubernetes.io/dockerconfigjson        1      12s

Test Procedure

Step 1: Create MachineOSConfig

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  renderedImagePushSecret:
    name:  layering-push-secret
  renderedImagePushSpec: "quay.io/mcoqe/layering:ocl"
      
EOF
machineosconfig.machineconfiguration.openshift.io/worker created

Step 2: Wait for MachineOSBuild

oc get machineosbuild
NAME                                      PREPARED   BUILDING   SUCCEEDED   INTERRUPTED   FAILED   AGE
worker-fc7c674c0a04ecb490deb569f7c852e7   False      False      True        False         False    15h

Step 3: Verify MCP Update

oc get mcp worker 
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-456f017aacff4031b0f3ee2cd4f5f6f4   True      False      False      3              3                   3                     0                      18h
worker   rendered-worker-972335f65d1112207e042141f44da687   True      False      False      3              3                   3                     0                      18h

oc debug node/ip-10-0-48-250.us-east-2.compute.internal
sh-5.1# chroot /host
sh-5.1# cat /etc/machine-config-daemon/currentimage
quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
sh-5.1# cat /etc/machine-config-daemon/desiredImage
cat: /etc/machine-config-daemon/desiredImage: No such file or directory
sh-5.1# rpm-ostree status
State: idle
Deployments:
* ostree-unverified-registry:quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
                   Digest: sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
                  Version: 9.6.20251022-0 (2025-10-27T14:28:45Z)

Step 4: Scale Up New Worker Node

$ oc get machinesets.machine.openshift.io -n openshift-machine-api 
NAME                                DESIRED   CURRENT   READY   AVAILABLE   AGE
ppt-2710b-22dwx-worker-us-east-2a   1         1         1       1           18h
ppt-2710b-22dwx-worker-us-east-2b   1         1         1       1           18h
ppt-2710b-22dwx-worker-us-east-2c   1         1         1       1           18h

$ oc scale --replicas 2  machinesets.machine.openshift.io -n openshift-machine-api ppt-2710b-22dwx-worker-us-east-2a
machineset.machine.openshift.io/ppt-2710b-22dwx-worker-us-east-2a scaled

Monitor the MCD logs

oc logs machine-config-daemon-c6js8 -f
...
I1028 06:00:42.733392    2628 daemon.go:1775] Current image: quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
I1028 06:00:42.733400    2628 daemon.go:1776] Desired image: quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644

Step 5: Verify First-Boot Behavior
Access the new node during bootstrap (if possible via console/SSH):

oc debug node/ip-10-0-4-217.us-east-2.compute.internal -- chroot /host cat /etc/machine-config-daemon/currentimage 
Starting pod/ip-10-0-4-217us-east-2computeinternal-debug-xnqvs ...
To use host binaries, run `chroot /host`
quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Removing debug pod ...

oc debug node/ip-10-0-4-217.us-east-2.compute.internal -- chroot /host rpm-ostree status
Starting pod/ip-10-0-4-217us-east-2computeinternal-debug-mjslk ...
To use host binaries, run `chroot /host`
State: idle
Deployments:
* ostree-unverified-registry:quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
                 Digest: sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
                Version: 9.6.20251022-0 (2025-10-27T14:28:45Z)

Removing debug pod ...

Verify No Unnecessary Reboots

oc get events --field-selector involvedObject.name=ip-10-0-94-34.us-east-2.compute.internal --sort-by='.lastTimestamp'
LAST SEEN   TYPE     REASON                      OBJECT                                          MESSAGE
62m         Normal   Uncordon                    node/ip-10-0-94-34.us-east-2.compute.internal   Update completed for config rendered-worker-972335f65d1112207e042141f44da687 and node has been uncordoned
62m         Normal   NodeDone                    node/ip-10-0-94-34.us-east-2.compute.internal   Setting node ip-10-0-94-34.us-east-2.compute.internal, currentConfig rendered-worker-972335f65d1112207e042141f44da687 to Done
62m         Normal   ConfigDriftMonitorStarted   node/ip-10-0-94-34.us-east-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-972335f65d1112207e042141f44da687
50m         Normal   ConfigDriftMonitorStopped   node/ip-10-0-94-34.us-east-2.compute.internal   Config Drift Monitor stopped
50m         Normal   AddSigtermProtection        node/ip-10-0-94-34.us-east-2.compute.internal   Adding SIGTERM protection
50m         Normal   Cordon                      node/ip-10-0-94-34.us-east-2.compute.internal   Cordoned node to apply update
50m         Normal   Drain                       node/ip-10-0-94-34.us-east-2.compute.internal   Draining node to update config.
47m         Normal   InClusterUpgrade            node/ip-10-0-94-34.us-east-2.compute.internal   Updating from oscontainer quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
47m         Normal   OSUpdateStarted             node/ip-10-0-94-34.us-east-2.compute.internal   Updating to a target config with default kernel
47m         Normal   OSUpgradeSkipped            node/ip-10-0-94-34.us-east-2.compute.internal   OS upgrade skipped; new MachineConfig (rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b) has same OS image (quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644) as old MachineConfig (rendered-worker-972335f65d1112207e042141f44da687)
47m         Normal   RemoveSigtermProtection     node/ip-10-0-94-34.us-east-2.compute.internal   Removing SIGTERM protection
47m         Normal   Reboot                      node/ip-10-0-94-34.us-east-2.compute.internal   Node will reboot into config rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b
46m         Normal   Uncordon                    node/ip-10-0-94-34.us-east-2.compute.internal   Update completed for config rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b and node has been uncordoned
46m         Normal   NodeDone                    node/ip-10-0-94-34.us-east-2.compute.internal   Setting node ip-10-0-94-34.us-east-2.compute.internal, currentConfig rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b to Done
46m         Normal   ConfigDriftMonitorStarted   node/ip-10-0-94-34.us-east-2.compute.internal   Config Drift Monitor started, watching against rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b
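
The reboot check above can also be scripted. This is a minimal sketch, assuming the default table layout of `oc get events` in which each data row starts with the age, type, and reason columns; the helper name `countReason` is hypothetical and not part of the MCO or of `oc`. For anything beyond a quick check, parsing `oc get events -o json` would likely be more robust than splitting table text.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countReason counts rows of `oc get events` table output whose REASON
// column equals reason. It assumes each data row begins with
// "<age> <type> <reason>", so the reason is the third whitespace-separated
// field. The header row ("LAST SEEN TYPE ...") never matches because its
// third field is "TYPE".
func countReason(events, reason string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(events))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[2] == reason {
			n++
		}
	}
	return n
}

func main() {
	// A few rows copied from the event listing above.
	events := `LAST SEEN   TYPE     REASON                      OBJECT                                          MESSAGE
50m         Normal   Drain                       node/ip-10-0-94-34.us-east-2.compute.internal   Draining node to update config.
47m         Normal   RemoveSigtermProtection     node/ip-10-0-94-34.us-east-2.compute.internal   Removing SIGTERM protection
47m         Normal   Reboot                      node/ip-10-0-94-34.us-east-2.compute.internal   Node will reboot into config rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b
`
	fmt.Println("Reboot events:", countReason(events, "Reboot"))
}
```

The match is exact on the REASON field, so `RemoveSigtermProtection` is not counted as a reboot.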

Step 6: Delete the MOSC

$ oc delete machineosconfig worker
machineosconfig.machineconfiguration.openshift.io "worker" deleted
oc debug node/ip-10-0-2-167.us-east-2.compute.internal -- chroot /host rpm-ostree status
Starting pod/ip-10-0-2-167us-east-2computeinternal-debug-dvxwh ...
To use host binaries, run `chroot /host`
State: idle
Deployments:
* ostree-unverified-registry:registry.build10.ci.openshift.org/ci-ln-8y8ddg2/stable@sha256:bca4d514da5152f284c7ae974eeb33e8af845d599e9c699d6d8a0ea11a2f5819
                   Digest: sha256:bca4d514da5152f284c7ae974eeb33e8af845d599e9c699d6d8a0ea11a2f5819
                  Version: 9.6.20251022-0 (2025-10-22T15:07:01Z)

Removing debug pod ...

oc debug node/ip-10-0-2-167.us-east-2.compute.internal -- chroot /host cat /etc/machine-config-daemon/currentimage
Starting pod/ip-10-0-2-167us-east-2computeinternal-debug-zslkl ...
To use host binaries, run `chroot /host`

Removing debug pod ...

Results Summary

✅ Success Criteria Met

  • New node pivoted to layered image during first boot
  • Only one reboot occurred
  • Node annotations show correct currentImage and desiredImage
  • Image digest matches MOSC/MOSB output
  • No additional reboots after initial provisioning
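
The digest check in the criteria above can be expressed as a small comparison. This is a sketch under stated assumptions: the helpers `bootedImage` and `imagesMatch` are hypothetical names, the booted deployment is taken to be the line starting with `*` in `rpm-ostree status` text output, and only transport prefixes of the form `ostree-...:` (as in the output shown in this thread) are stripped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// bootedImage returns the image reference of the booted deployment, i.e. the
// line marked with "*" in `rpm-ostree status` text output, with a leading
// "ostree-...:" transport prefix removed.
func bootedImage(status string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "* ") {
			continue
		}
		ref := strings.TrimSpace(strings.TrimPrefix(line, "* "))
		if strings.HasPrefix(ref, "ostree-") {
			if i := strings.Index(ref, ":"); i >= 0 {
				ref = ref[i+1:]
			}
		}
		return ref, true
	}
	return "", false
}

// imagesMatch reports whether the contents of
// /etc/machine-config-daemon/currentimage equal the booted image reference.
func imagesMatch(currentImageFile, status string) bool {
	booted, ok := bootedImage(status)
	return ok && strings.TrimSpace(currentImageFile) == booted
}

func main() {
	// Values copied from the Step 5 output above.
	current := "quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644\n"
	status := `State: idle
Deployments:
* ostree-unverified-registry:quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
                 Digest: sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
`
	fmt.Println("images match:", imagesMatch(current, status))
}
```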

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 28, 2025
@openshift-ci-robot

openshift-ci-robot commented Oct 28, 2025

@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue.

In response to this:

- What I did

  1. Taught the MCS to write image annotations at bootstrap.
  2. Extended appendNodeAnnotations(...) in pkg/server/server.go to optionally include:
    a. machineconfiguration.openshift.io/currentImage
    b. machineconfiguration.openshift.io/desiredImage
    c. alongside the existing MC/state keys when generating the initial node-annotations file (/etc/machine-config-daemon/node-annotations.json).
  3. Added the image into the node-annotations appender.
    a. The server can now pass the resolved rendered OS image (e.g., MOSC/MOSB output) into appendNodeAnnotations(...) so new nodes start with authoritative image annotations.

Impact:
New nodes pivot/validate directly against the intended layered image during bootstrap. In image mode, new nodes successfully deploy the rendered image and can complete without an extra reboot when no reboot-requiring changes are present.

- How to verify it

  1. Create a Quay push secret in the MCO namespace
  2. Apply a MachineOSConfig with renderedImagePushSpec pointing at your Quay repo/tag.
  3. Opt nodes into image mode (e.g., via the pool opt-in label/annotation you use).
  4. Scale up a worker so a new node is provisioned:
    oc scale machineset.machine.openshift.io -n openshift-machine-api dkhater-10-15-2025-a-cpb2n-worker-us-east-1c --replicas=1
  5. Observe the new node- it should pull and deploy the rendered image without reboot
  6. Confirm on the node:
oc debug node/ip-10-0-85-7.ec2.internal
chroot /host
cat /etc/machine-config-daemon/currentimage
rpm-ostree status

Expected:

  1. /etc/machine-config-daemon/currentimage contains your Quay digest.
  2. rpm-ostree status shows the same digest as the booted deployment.
  3. No node reboot occurs

Example:

$ cat /etc/machine-config-daemon/currentimage
quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
$ rpm-ostree status
* ostree-unverified-registry:quay.io/dkhater/image-mode-testing@sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
 Digest: sha256:d9dcf52c08a659561c36bd62a0a57fcec34aec2f71bce88cd68fd314ddf8db70
 Version: 9.6.20251013-1

Please note: if you use the internal image registry, you will still see the legacy behavior of two node boots.

- Description for the changelog

MCS now embeds current/desired image annotations in the initial node annotations at bootstrap. This makes the MCD pivot/validate directly against the rendered layered OS image. New nodes can complete image-mode provisioning without an unnecessary reboot when no reboot-requiring changes are present.
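
To illustrate the behavior described above, here is a minimal sketch. It is not the actual code in pkg/server/server.go: the helper `buildNodeAnnotations`, its signature, and the `currentConfig`/`desiredConfig`/`state` key names and the `Done` state value are assumptions made for illustration. Only the two image annotation keys are taken from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	// Image keys named in the PR description.
	currentImageKey = "machineconfiguration.openshift.io/currentImage"
	desiredImageKey = "machineconfiguration.openshift.io/desiredImage"
	// Assumed names for the existing MC/state keys.
	currentConfigKey = "machineconfiguration.openshift.io/currentConfig"
	desiredConfigKey = "machineconfiguration.openshift.io/desiredConfig"
	stateKey         = "machineconfiguration.openshift.io/state"
)

// buildNodeAnnotations is a hypothetical helper. It always sets the
// config/state keys and adds the image keys only when a rendered image
// is supplied, so nodes outside image mode get the same map as before.
func buildNodeAnnotations(renderedConfig, image string) map[string]string {
	annos := map[string]string{
		currentConfigKey: renderedConfig,
		desiredConfigKey: renderedConfig,
		stateKey:         "Done", // assumed initial state value
	}
	if image != "" {
		annos[currentImageKey] = image
		annos[desiredImageKey] = image
	}
	return annos
}

func main() {
	annos := buildNodeAnnotations(
		"rendered-worker-972335f65d1112207e042141f44da687",
		"quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644",
	)
	// Serialize the map in the JSON form a node-annotations file would hold.
	out, err := json.MarshalIndent(annos, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Setting both image keys to the same value reflects the stated goal that a new node starts with current and desired image already in agreement, so no image pivot is pending after first boot.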

@ptalgulk01

/verified by @ptalgulk01

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 28, 2025
@openshift-ci-robot

@ptalgulk01: This PR has been marked as verified by @ptalgulk01.

In response to this:

/verified by @ptalgulk01


@pablintino pablintino left a comment

I've added a few comments that are not blockers, especially given that QE already validated the current state of the PR.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2025
@openshift-ci

openshift-ci bot commented Oct 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkhater-redhat, pablintino

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [dkhater-redhat,pablintino]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD d482796 and 2 for PR HEAD 6b4d1a4 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 07bdde1 into openshift:main Oct 28, 2025
13 of 15 checks passed

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR
  • verified: Signifies that the PR passed pre-merge verification criteria

7 participants