[release-4.19] OCPBUGS-86233: Fix ssh and password rollbacks#6069
Conversation
This change fixes the issue in SSH keys and user passwords that made the rollback useless as it tried to apply the new configuration instead of the previous one. Signed-off-by: Pablo Rodriguez Nava <git@amail.pablintino.eu>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: openshift-cherrypick-robot The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@openshift-cherrypick-robot: Jira Issue OCPBUGS-85078 has been cloned as Jira Issue OCPBUGS-86233. Will retitle bug to link to clone. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-86233, which is valid. The bug has been moved to the POST state. 7 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@openshift-cherrypick-robot: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Pre-merge Verification: PASSED
Cluster version: 4.19.0-0-2026-05-25-040823-test-ci-ln-n6csc3b-latest
Steps Performed:
- Apply the first MC
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ cat << 'EOF' | oc apply -f -
# MC v1: The "good" config. Applied first and should succeed.
# It creates a file, a unit, and a dropin, plus sets a password and SSH key
# for the core user. All values contain "v1" as a marker so we can later
# verify which version is on disk.
#
# It also installs monitoring units (mco-drift-snapshot.*) that watch the
# tracked files and take timestamped snapshots to /tmp/drift-snapshots/
# whenever they change. This lets us inspect the file content during the
# brief rollback window when MC v2 fails.
#
# Expected flow:
# 1. Apply this MC -> node reaches Done
# 2. Apply MC v2 (test-mc-full-v2) -> OS update fails, MCD rolls back
# 3. The snapshot units capture the file states during rollback
# 4. Inspect /tmp/drift-snapshots/ on the node to verify rollback correctness
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: mc-tc-88821-rollback
spec:
config:
ignition:
version: 3.5.0
passwd:
users:
- name: core
# Password hash with "v1" marker at the end
passwordHash: $6$rounds=5000$saltsalt$abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSv1
# SSH key with "v1" marker in the email
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDtest1234567890 testv1@mco
storage:
files:
# Test file with "v1" marker in content
- contents:
source: data:,mco-test-content-v1%0A
mode: 0644
overwrite: true
path: /etc/mco-test-file.conf
systemd:
units:
# Test unit with "v1" marker in environment variable
- contents: |
[Unit]
Description=MCO test unit
[Service]
Type=oneshot
ExecStart=/bin/true
Environment="MCO_TEST_UNIT=v1"
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-test.service
# Test dropin on an existing unit with "v1" marker
- dropins:
- contents: |
[Service]
Environment="MCO_TEST=v1"
name: 99-mco-test.conf
name: wait-for-ipsec-connect.service
#
# --- Monitoring units below ---
# These units watch the tracked files and snapshot them to
# /tmp/drift-snapshots/<timestamp>/ whenever a change is detected.
# Each snapshot directory contains one file per tracked path.
# If a file is absent, the snapshot contains "ABSENT" instead.
#
# Snapshot service: copies all tracked files into a timestamped directory.
# The actual work runs in a background subshell so bash exits immediately,
# allowing PathChanged to re-arm and catch rapid successive writes.
- contents: |
[Unit]
Description=Snapshot all tracked files on change
[Service]
Type=forking
ExecStart=/bin/bash -c '\
( dir=/tmp/drift-snapshots/$(date +%%Y%%m%%d-%%H%%M%%S.%%N); \
mkdir -p $dir; \
for f in \
/etc/mco-test-file.conf \
/etc/mco-test-new-file.conf \
/etc/systemd/system/mco-test.service \
/etc/systemd/system/mco-test-new.service \
/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test.conf \
/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test-new.conf \
/home/core/.ssh/authorized_keys.d/ignition \
/etc/shadow; \
do \
name=$(echo $f | sed "s|/|_|g"); \
if [ -f "$f" ]; then \
cp "$f" "$dir/$name"; \
else \
echo "ABSENT" > "$dir/$name"; \
fi; \
done ) &'
name: mco-drift-snapshot.service
# Path unit: watches files that already exist for modifications.
# When any of them changes, triggers mco-drift-snapshot.service.
- contents: |
[Unit]
Description=Watch existing tracked files for changes
[Path]
PathChanged=/etc/mco-test-file.conf
PathChanged=/etc/systemd/system/mco-test.service
PathChanged=/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test.conf
PathChanged=/home/core/.ssh/authorized_keys.d/ignition
PathChanged=/etc/shadow
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-drift-snapshot.path
# Path unit: watches for new files that don't exist yet in v1 but
# will be created by v2. Uses PathExists to trigger when they appear.
- contents: |
[Unit]
Description=Watch for new files being created
[Path]
PathExists=/etc/mco-test-new-file.conf
PathExists=/etc/systemd/system/mco-test-new.service
PathExists=/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test-new.conf
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-drift-snapshot-new.path
# Service triggered when new files appear. Takes a snapshot and then
# starts the removal watcher so we also capture when they disappear.
- contents: |
[Unit]
Description=Snapshot when new files appear then watch for removal
[Service]
Type=oneshot
ExecStart=/bin/bash -c '\
/usr/bin/systemctl start mco-drift-snapshot.service; \
/usr/bin/systemctl start mco-drift-snapshot-removal.path'
name: mco-drift-snapshot-new.service
# Path unit: watches the new files for removal (PathChanged triggers
# when a watched file is deleted or modified after it existed).
- contents: |
[Unit]
Description=Watch for new files being removed
[Path]
PathChanged=/etc/mco-test-new-file.conf
PathChanged=/etc/systemd/system/mco-test-new.service
PathChanged=/etc/systemd/system/wait-for-ipsec-connect.service.d/99-mco-test-new.conf
[Install]
WantedBy=multi-user.target
name: mco-drift-snapshot-removal.path
# Service triggered when new files are removed. Takes a snapshot
# so we can verify they were correctly cleaned up during rollback.
- contents: |
[Unit]
Description=Snapshot when new files are removed
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl start mco-drift-snapshot.service
name: mco-drift-snapshot-removal.service
EOF
machineconfig.machineconfiguration.openshift.io/mc-tc-88821-rollback created
- Wait for mcp to get updated
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-0e6a0a208d6ca044cb7229e1fe17e470 True False False 3 3 3 0 113m
worker rendered-worker-86cc7e94318fb2b2005740e7a02d548b True False False 3 3 3 0 113m
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mc mc-tc-88821-rollback
NAME GENERATEDBYCONTROLLER IGNITIONVERSION AGE
mc-tc-88821-rollback 3.5.0 16m
- Apply the second MC
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ cat << 'EOF' | oc apply -f -
# MC v2: The "bad" config. Applied after v1 to trigger a rollback.
# It modifies every field from v1 (changing "v1" markers to "v2") and also
# adds new resources that don't exist in v1 (a new file, a new unit, and
# a new dropin). This lets us test rollback of both modified and newly
# created resources.
#
# The osImageURL points to a non-bootable image that will pull successfully
# but fail during rpm-ostree rebase, triggering the MCD rollback path.
#
# After rollback, the monitoring units from v1 should capture snapshots
# showing whether files were correctly reverted to v1 content and whether
# newly created files were properly removed.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: mc-tc-88821-rollback-v2
spec:
config:
ignition:
version: 3.5.0
passwd:
users:
- name: core
# Password hash changed: "v1" -> "v2" marker
passwordHash: $6$rounds=5000$saltsalt$zyxwvutsrqponmlkjihgfedcba9876543210ZYXWVUTSRQPONMLKJIv2
# SSH key changed: "v1" -> "v2" marker in email
sshAuthorizedKeys:
- ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDmodified9876543210 testv2@mco
storage:
files:
# Same file as v1, content changed: "v1" -> "v2"
- contents:
source: data:,mco-test-content-v2%0A
mode: 0644
overwrite: true
path: /etc/mco-test-file.conf
# New file that does not exist in v1.
# Should be removed during rollback.
- contents:
source: data:,mco-test-new-file-v2%0A
mode: 0644
overwrite: true
path: /etc/mco-test-new-file.conf
systemd:
units:
# Same unit as v1, environment changed: "v1" -> "v2"
- contents: |
[Unit]
Description=MCO test unit
[Service]
Type=oneshot
ExecStart=/bin/true
Environment="MCO_TEST_UNIT=v2"
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-test.service
# New unit that does not exist in v1.
# Should be removed during rollback.
- contents: |
[Unit]
Description=MCO test new unit
[Service]
Type=oneshot
ExecStart=/bin/true
Environment="MCO_TEST_NEW_UNIT=v2"
[Install]
WantedBy=multi-user.target
enabled: true
name: mco-test-new.service
- dropins:
# Same dropin as v1, environment changed: "v1" -> "v2"
- contents: |
[Service]
Environment="MCO_TEST=v2"
name: 99-mco-test.conf
# New dropin that does not exist in v1.
# Should be removed during rollback.
- contents: |
[Service]
Environment="MCO_TEST_NEW_DROPIN=v2"
name: 99-mco-test-new.conf
name: wait-for-ipsec-connect.service
extensions: []
kernelArguments: []
osImageURL: quay.io/openshifttest/tc88821fakeimage:latest
EOF
machineconfig.machineconfiguration.openshift.io/mc-tc-88821-rollback-v2 created
- Wait for mcp to degrade
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-0e6a0a208d6ca044cb7229e1fe17e470 True False False 3 3 3 0 174m
worker rendered-worker-86cc7e94318fb2b2005740e7a02d548b False True True 3 0 0 1 174m
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp -oyaml
---
- lastTransitionTime: "2026-05-25T06:31:39Z"
message: 'Node ip-10-0-28-47.us-east-2.compute.internal is reporting: "Failed
to update OS to quay.io/openshifttest/tc88821fakeimage:latest after retries:
timed out waiting for the condition"'
reason: ""
status: "True"
type: Degraded
---
- Check all rollback snapshots
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ NODE=${1:-$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')}
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== /etc/mco-test-file.conf ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(cat $d/_etc_mco-test-file.conf 2>/dev/null)"; done' 2>/dev/null
=== /etc/mco-test-file.conf ===
---
20260525-065228.563578464: mco-test-content-v2
20260525-065229.264861482: mco-test-content-v2
20260525-065350.322576662: mco-test-content-v1
20260525-065351.342148316: mco-test-content-v1
20260525-065352.043775719: mco-test-content-v2
20260525-065352.405369300: mco-test-content-v2
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== mco-test.service ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST_UNIT $d/_etc_systemd_system_mco-test.service 2>/dev/null || cat $d/_etc_systemd_system_mco-test.service 2>/dev/null)"; done' 2>/dev/null
=== mco-test.service ===
---
20260525-064122.430325160: Environment="MCO_TEST_UNIT=v2"
20260525-064243.726262914: Environment="MCO_TEST_UNIT=v1"
20260525-064244.347936623: Environment="MCO_TEST_UNIT=v1"
20260525-064245.045266439: Environment="MCO_TEST_UNIT=v1"
20260525-064245.742318690: Environment="MCO_TEST_UNIT=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== 99-mco-test.conf dropin ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test.conf 2>/dev/null || cat $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test.conf 2>/dev/null)"; done' 2>/dev/null
=== 99-mco-test.conf dropin ===
---
20260525-065228.563578464: Environment="MCO_TEST=v2"
20260525-065229.264861482: Environment="MCO_TEST=v2"
20260525-065350.322576662: Environment="MCO_TEST=v1"
20260525-065351.342148316: Environment="MCO_TEST=v1"
20260525-065352.043775719: Environment="MCO_TEST=v1"
20260525-065352.405369300: Environment="MCO_TEST=v2"
20260525-065352.721480554: Environment="MCO_TEST=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== /etc/mco-test-new-file.conf (new in v2) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(cat $d/_etc_mco-test-new-file.conf 2>/dev/null)"; done' 2>/dev/null
=== /etc/mco-test-new-file.conf (new in v2) ===
---
20260525-063712.004899458: mco-test-new-file-v2
20260525-063832.996352879: mco-test-new-file-v2
20260525-063833.980572878: ABSENT
20260525-063834.660253924: mco-test-new-file-v2
20260525-063835.346475505: mco-test-new-file-v2
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo ""
echo "=== mco-test-new.service (new in v2) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST_NEW_UNIT $d/_etc_systemd_system_mco-test-new.service 2>/dev/null || cat $d/_etc_systemd_system_mco-test-new.service 2>/dev/null)"; done' 2>/dev/null
=== mco-test-new.service (new in v2) ===
---
20260525-063135.167334671: Environment="MCO_TEST_NEW_UNIT=v2"
20260525-063135.872326523: Environment="MCO_TEST_NEW_UNIT=v2"
20260525-063256.804610679: Environment="MCO_TEST_NEW_UNIT=v2"
20260525-063257.818194593: ABSENT
20260525-063258.482456911: Environment="MCO_TEST_NEW_UNIT=v2"
20260525-063259.168154146: Environment="MCO_TEST_NEW_UNIT=v2"
20260525-063420.177067490: Environment="MCO_TEST_NEW_UNIT=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== 99-mco-test-new.conf dropin (new in v2) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep MCO_TEST_NEW_DROPIN $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test-new.conf 2>/dev/null || cat $d/_etc_systemd_system_wait-for-ipsec-connect.service.d_99-mco-test-new.conf 2>/dev/null)"; done' 2>/dev/null
=== 99-mco-test-new.conf dropin (new in v2) ===
---
20260525-064655.624162046: Environment="MCO_TEST_NEW_DROPIN=v2"
20260525-064817.069618817: Environment="MCO_TEST_NEW_DROPIN=v2"
20260525-064817.690478231: ABSENT
20260525-064818.397978322: Environment="MCO_TEST_NEW_DROPIN=v2"
20260525-064819.091092539: Environment="MCO_TEST_NEW_DROPIN=v2"
20260525-064940.501833525: Environment="MCO_TEST_NEW_DROPIN=v2"
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== authorized_keys (ssh) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep -o "test[^ ]*@mco[^ ]*" $d/_home_core_.ssh_authorized_keys.d_ignition 2>/dev/null)"; done' 2>/dev/null
=== authorized_keys (ssh) ===
---
20260525-072146.602398384: testv1@mco
testv2@mco
20260525-072307.498317247: testv1@mco
testv2@mco
20260525-072308.431360266: testv1@mco
testv2@mco
---
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ echo "=== shadow (core password) ==="
oc debug -q node/$NODE -- chroot /host bash -c 'for d in $(ls -d /tmp/drift-snapshots/* 2>/dev/null | sort); do echo "$(basename $d): $(grep core $d/_etc_shadow 2>/dev/null | cut -d: -f2 | grep -o "..$")"; done' 2>/dev/null
=== shadow (core password) ===
---
20260525-063002.442174401: v1
20260525-063003.116503127: v1
20260525-063003.523581425: v2
20260525-063003.555410190: v2
20260525-063003.605997825: v2
20260525-063003.647746035: v2
20260525-063003.690508080: v2
20260525-063133.864891869: v2
20260525-063134.485040551: v2
---
- Cleanup
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc delete mc mc-tc-88821-rollback-v2
machineconfig.machineconfiguration.openshift.io "mc-tc-88821-rollback-v2" deleted
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc patch node "$NODE" -p "{\"metadata\":{\"annotations\":{\"machineconfiguration.openshift.io/desiredConfig\":\"$(oc get node "$NODE" -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}')\"}}}"
node/ip-10-0-28-47.us-east-2.compute.internal patched
harshpat@harshpat-thinkpadp1gen4i:~/Downloads$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-0e6a0a208d6ca044cb7229e1fe17e470 True False False 3 3 3 0 3h2m
worker rendered-worker-86cc7e94318fb2b2005740e7a02d548b True False False 3 3 3 0 3h2m
|
/verified by @HarshwardhanPatil07 here |
|
@HarshwardhanPatil07: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@pablintino: This pull request references Jira Issue OCPBUGS-86233, which is valid. 7 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This is an automated cherry-pick of #6002
/assign pablintino