MCO-1898: MCS serves image-aware first-boot config #5357
@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
ebca2ba to
a4fbb2c
Compare
7023ffe to
c35accf
Compare
Generally looks good! Just left a few questions/cleanups that we could do 😄
Could we also squash the second commit?
|
@dkhater-redhat: This pull request references MCO-1898 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Yes! My plan is to squash everything when the team is ready to give an LGTM (in case I need to go back and debug) 😄
580bc72 to
cb5982e
Compare
Nice change, thanks for considering the feedback around the image registry URL detection.
11ccab8 to
c17ebcd
Compare
247953f to
7f91952
Compare
7f91952 to
6b4d1a4
Compare
|
/retest-required |
|
@dkhater-redhat: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-gcp-op-single-node |
Verification Steps
Environment Setup
Pre-requisites
1. Create Image Registry Secret
oc create secret docker-registry layering-push-secret \
--docker-server=quay.io \
--docker-username=<username> \
--docker-password=<password> \
--docker-email="" \
-n openshift-machine-config-operator
secret/layering-push-secret created
$ oc get pods -n openshift-machine-config-operator
layering-push-secret kubernetes.io/dockerconfigjson 1 12s
Test Procedure
Step 1: Create MachineOSConfig
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  renderedImagePushSecret:
    name: layering-push-secret
  renderedImagePushSpec: "quay.io/mcoqe/layering:ocl"
EOF
machineosconfig.machineconfiguration.openshift.io/worker created
Step 2: Wait for MachineOSBuild
oc get machineosbuild
NAME PREPARED BUILDING SUCCEEDED INTERRUPTED FAILED AGE
worker-fc7c674c0a04ecb490deb569f7c852e7 False False True False False 15h
Step 3: Verify MCP update:
oc get mcp worker
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-456f017aacff4031b0f3ee2cd4f5f6f4 True False False 3 3 3 0 18h
worker rendered-worker-972335f65d1112207e042141f44da687 True False False 3 3 3 0 18h
oc debug node/ip-10-0-48-250.us-east-2.compute.internal
sh-5.1# chroot /host
sh-5.1# cat /etc/machine-config-daemon/currentimage
quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
sh-5.1# cat /etc/machine-config-daemon/desiredImage
cat: /etc/machine-config-daemon/desiredImage: No such file or directory
sh-5.1# rpm-ostree status
State: idle
Deployments:
* ostree-unverified-registry:quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Digest: sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Version: 9.6.20251022-0 (2025-10-27T14:28:45Z)
Step 4: Scale Up New Worker Node
$ oc get machinesets.machine.openshift.io -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
ppt-2710b-22dwx-worker-us-east-2a 1 1 1 1 18h
ppt-2710b-22dwx-worker-us-east-2b 1 1 1 1 18h
ppt-2710b-22dwx-worker-us-east-2c 1 1 1 1 18h
$ oc scale --replicas 2 machinesets.machine.openshift.io -n openshift-machine-api ppt-2710b-22dwx-worker-us-east-2a
machineset.machine.openshift.io/ppt-2710b-22dwx-worker-us-east-2a scaled
Monitor the MCD logs
oc logs machine-config-daemon-c6js8 -f
...
I1028 06:00:42.733392 2628 daemon.go:1775] Current image: quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
I1028 06:00:42.733400 2628 daemon.go:1776] Desired image: quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Step 5: Verify First-Boot Behavior
oc debug node/ip-10-0-4-217.us-east-2.compute.internal -- chroot /host cat /etc/machine-config-daemon/currentimage
Starting pod/ip-10-0-4-217us-east-2computeinternal-debug-xnqvs ...
To use host binaries, run `chroot /host`
quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Removing debug pod ...
oc debug node/ip-10-0-4-217.us-east-2.compute.internal -- chroot /host rpm-ostree status
Starting pod/ip-10-0-4-217us-east-2computeinternal-debug-mjslk ...
To use host binaries, run `chroot /host`
State: idle
Deployments:
* ostree-unverified-registry:quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Digest: sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
Version: 9.6.20251022-0 (2025-10-27T14:28:45Z)
Removing debug pod ...
Verify No Unnecessary Reboots
oc get events --field-selector involvedObject.name=ip-10-0-94-34.us-east-2.compute.internal --sort-by='.lastTimestamp'
LAST SEEN TYPE REASON OBJECT MESSAGE
62m Normal Uncordon node/ip-10-0-94-34.us-east-2.compute.internal Update completed for config rendered-worker-972335f65d1112207e042141f44da687 and node has been uncordoned
62m Normal NodeDone node/ip-10-0-94-34.us-east-2.compute.internal Setting node ip-10-0-94-34.us-east-2.compute.internal, currentConfig rendered-worker-972335f65d1112207e042141f44da687 to Done
62m Normal ConfigDriftMonitorStarted node/ip-10-0-94-34.us-east-2.compute.internal Config Drift Monitor started, watching against rendered-worker-972335f65d1112207e042141f44da687
50m Normal ConfigDriftMonitorStopped node/ip-10-0-94-34.us-east-2.compute.internal Config Drift Monitor stopped
50m Normal AddSigtermProtection node/ip-10-0-94-34.us-east-2.compute.internal Adding SIGTERM protection
50m Normal Cordon node/ip-10-0-94-34.us-east-2.compute.internal Cordoned node to apply update
50m Normal Drain node/ip-10-0-94-34.us-east-2.compute.internal Draining node to update config.
47m Normal InClusterUpgrade node/ip-10-0-94-34.us-east-2.compute.internal Updating from oscontainer quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644
47m Normal OSUpdateStarted node/ip-10-0-94-34.us-east-2.compute.internal Updating to a target config with default kernel
47m Normal OSUpgradeSkipped node/ip-10-0-94-34.us-east-2.compute.internal OS upgrade skipped; new MachineConfig (rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b) has same OS image (quay.io/mcoqe/layering@sha256:94a71773b23a961243d3c43a1c74d86ba0b9be0ddc8fc4482ae5f5b70246b644) as old MachineConfig (rendered-worker-972335f65d1112207e042141f44da687)
47m Normal RemoveSigtermProtection node/ip-10-0-94-34.us-east-2.compute.internal Removing SIGTERM protection
47m Normal Reboot node/ip-10-0-94-34.us-east-2.compute.internal Node will reboot into config rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b
46m Normal Uncordon node/ip-10-0-94-34.us-east-2.compute.internal Update completed for config rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b and node has been uncordoned
46m Normal NodeDone node/ip-10-0-94-34.us-east-2.compute.internal Setting node ip-10-0-94-34.us-east-2.compute.internal, currentConfig rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b to Done
46m Normal ConfigDriftMonitorStarted node/ip-10-0-94-34.us-east-2.compute.internal Config Drift Monitor started, watching against rendered-worker-ce9ca102c6043c4e0b06d3c701fe9e9b
Step 6: Delete the MOSC
$ oc delete machineosconfig worker
machineosconfig.machineconfiguration.openshift.io "worker" deleted
oc debug node/ip-10-0-2-167.us-east-2.compute.internal -- chroot /host rpm-ostree status
Starting pod/ip-10-0-2-167us-east-2computeinternal-debug-dvxwh ...
To use host binaries, run `chroot /host`
State: idle
Deployments:
* ostree-unverified-registry:registry.build10.ci.openshift.org/ci-ln-8y8ddg2/stable@sha256:bca4d514da5152f284c7ae974eeb33e8af845d599e9c699d6d8a0ea11a2f5819
Digest: sha256:bca4d514da5152f284c7ae974eeb33e8af845d599e9c699d6d8a0ea11a2f5819
Version: 9.6.20251022-0 (2025-10-22T15:07:01Z)
Removing debug pod ...
oc debug node/ip-10-0-2-167.us-east-2.compute.internal -- chroot /host cat /etc/machine-config-daemon/currentimage
Starting pod/ip-10-0-2-167us-east-2computeinternal-debug-zslkl ...
To use host binaries, run `chroot /host`
Removing debug pod ...
Results Summary
✅ Success Criteria Met
/label qe-approved |
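Step 2 above waits until the MachineOSBuild reports SUCCEEDED as True before moving on. A minimal sketch of that check as a shell filter follows. The function name `succeeded_builds` is hypothetical, and it assumes the plain table layout shown in the transcript (a header row containing a SUCCEEDED column); it is not part of `oc` or the MCO.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of builds whose SUCCEEDED column is True.
# Reads `oc get machineosbuild` table output on stdin and locates the column
# by its header name, so it does not depend on the column position.
succeeded_builds() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "SUCCEEDED") col = i; next }
       col && $col == "True" { print $1 }'
}
```

It could be used as `oc get machineosbuild | succeeded_builds`, which prints nothing until at least one build has succeeded.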
|
/verified by @ptalgulk01 |
|
@ptalgulk01: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
I've added a few comments that are not blockers, especially given that QE already validated the current state of the PR.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dkhater-redhat, pablintino The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
07bdde1
into
openshift:main
- What I did
The MCS now writes two image annotations:
a. machineconfiguration.openshift.io/currentImage
b. machineconfiguration.openshift.io/desiredImage
These are written alongside the existing MC/state keys when generating the initial node-annotations file (/etc/machine-config-daemon/node-annotations.json).
The server can now pass the resolved rendered OS image (e.g., MOSC/MOSB output) into appendNodeAnnotations(...) so new nodes start with authoritative image annotations.
Impact:
New nodes pivot/validate directly against the intended layered image during bootstrap. In image mode, new nodes successfully deploy the rendered image and can complete without an extra reboot when no reboot-requiring changes are present.
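One way to spot-check the change on a node is to look for the two image keys in the annotations file. The sketch below is a hypothetical helper (the name `check_image_annotations` is not part of the MCO). It assumes only that the keys appear as quoted JSON strings in the file passed to it; whether the file is still present at the original path after first boot is not stated in this PR, so treat the path as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether both image annotation keys appear in a
# node-annotations JSON file. Returns non-zero if either key is missing.
check_image_annotations() {
  local file="$1" key missing=0
  for key in currentImage desiredImage; do
    # -F matches the key as a fixed string, so the dots are not treated as regex wildcards.
    if grep -qF "\"machineconfiguration.openshift.io/${key}\"" "$file"; then
      echo "found: ${key}"
    else
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

From a debug shell on the node (after `chroot /host`), it could be run as `check_image_annotations /etc/machine-config-daemon/node-annotations.json`.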
- How to verify it
Create a MachineOSConfig with renderedImagePushSpec pointing at your Quay repo/tag.
Then scale up a worker MachineSet:
oc scale machineset.machine.openshift.io -n openshift-machine-api dkhater-10-15-2025-a-cpb2n-worker-us-east-1c --replicas=1
Expected:
/etc/machine-config-daemon/currentimage contains your Quay digest.
rpm-ostree status shows the same digest as the booted deployment.
Please note: if you use the internal image registry, you will still see the legacy behavior in which the new node boots twice.
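The two expected results can be compared mechanically. The following is a minimal sketch, assuming the `rpm-ostree status` layout shown in the QE transcript (booted deployment marked with `*`, image referenced by `@sha256:` digest). The function names `booted_digest` and `image_digest` are hypothetical helpers, not existing tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the digest of the booted deployment from
# `rpm-ostree status` text on stdin. The booted deployment line starts with "*".
booted_digest() {
  sed -n 's/^[[:space:]]*\*.*@\(sha256:[0-9a-f]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: strip everything up to and including "@" from an
# image reference such as quay.io/repo/name@sha256:<hex>.
image_digest() {
  printf '%s\n' "${1##*@}"
}
```

For example, the output of `oc debug node/<node> -- chroot /host rpm-ostree status` can be piped into `booted_digest`, and the result compared with `image_digest "$(cat /etc/machine-config-daemon/currentimage)"` taken on the same node; the two strings should be equal when the node booted directly into the rendered image.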
- Description for the changelog
MCS now embeds current/desired image annotations in the initial node annotations at bootstrap. This makes the MCD pivot/validate directly against the rendered layered OS image. New nodes can complete image-mode provisioning without an unnecessary reboot when no reboot-requiring changes are present.