Skip to content

test: discard_unpacked_layers = true#7920

Draft
awesomenix wants to merge 1 commit intomainfrom
nishp/discard_containerd_layers
Draft

test: discard_unpacked_layers = true#7920
awesomenix wants to merge 1 commit intomainfrom
nishp/discard_containerd_layers

Conversation

@awesomenix
Copy link
Collaborator

@awesomenix awesomenix commented Feb 21, 2026

This PR sets discard_unpacked_layers = true in containerd before pulling images, which removes all the compressed files

i checked the VM and it doesnt save much all it saves is 1GB

20G     /var/lib/containerd
root@0uo4-2026-02-20-ubuntu2204messageoftheday000000:~# sudo du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/
14G     /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/
root@0uo4-2026-02-20-ubuntu2204messageoftheday000000:~# sudo du -sh /var/lib/containerd/io.containerd.content.v1.content/blobs/
6.1G    /var/lib/containerd/io.containerd.content.v1.content/blobs/

I checked that we dont store .tar files anymore using content ls

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Copilot AI review requested due to automatic review settings February 21, 2026 01:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enables discard_unpacked_layers = true in containerd's config during VHD build to reduce VHD size by removing compressed layer data after images are pulled. According to the PR description, this saves approximately 1GB of disk space, reducing /var/lib/containerd from 21GB to 20GB.

Changes:

  • Enables discard_unpacked_layers in containerd config before pulling container images during VHD build
  • Removes the temporary containerd config after VHD build completes, before generating the VHD
  • Adds logging of containerd directory space usage to VHD logs

# CSE generates the production config at provisioning time.
if [ -f /etc/containerd/config.toml ]; then
rm -f /etc/containerd/config.toml
systemctl restart containerd || true
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The containerd restart at line 1014 should check the exit status. Currently it uses || true which silently ignores failures. After removing the config.toml, if containerd fails to restart, the VHD build should fail rather than continue with a broken containerd service. Consider using proper error handling or at least logging the failure before continuing.

Suggested change
systemctl restart containerd || true
if ! systemctl restart containerd; then
echo "Failed to restart containerd after removing /etc/containerd/config.toml" >&2
exit 1
fi

Copilot uses AI. Check for mistakes.
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed command on line 536 uses a regular expression pattern that may not be portable across all OS distributions. The [[:space:]]*$ pattern might behave differently on Ubuntu vs Azure Linux/Mariner. Additionally, the pattern assumes a specific formatting of the TOML file from containerd config default, which could vary across containerd versions. Consider testing this with multiple containerd versions to ensure it works reliably, or use a more robust TOML parser like tomlq or yq to manipulate the config file.

Suggested change
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
sed -i '/^\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\]/a\ discard_unpacked_layers = true' /etc/containerd/config.toml

Copilot uses AI. Check for mistakes.
Comment on lines +527 to +544
# Enable discard_unpacked_layers during VHD build to reduce VHD size.
# Containerd will discard unpacked layer data after it's been committed to a snapshot.
# This config is only used during VHD build; CSE overwrites it at provisioning time.
echo "Enabling discard_unpacked_layers for VHD build"
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
fi
systemctl restart containerd
for i in $(seq 1 30); do
if ctr version >/dev/null 2>&1; then
break
fi
sleep 1
done
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change lacks test coverage. The VHD build process now enables discard_unpacked_layers, which fundamentally changes how containerd handles layer data during VHD build. This should have:

  1. E2E tests to verify VHD images work correctly after this change
  2. Tests to ensure the CoreDNS binary extraction still works (since it depends on image mount operations)
  3. Tests to verify the containerd config cleanup doesn't break provisioning

Consider adding test cases in the e2e test suite to validate these behaviors.

Copilot uses AI. Check for mistakes.
fi
sleep 1
done

Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The containerd restart wait loop lacks error handling for the timeout case. After 30 seconds (lines 539-544), if ctr version never succeeds, the script continues anyway. This could lead to subsequent operations failing in confusing ways. Add explicit error handling after the loop:

if ! ctr version >/dev/null 2>&1; then
  echo "ERROR: containerd failed to become ready after 30 seconds"
  exit $ERR_CONTAINERD_INSTALL_TIMEOUT
fi
Suggested change
if ! ctr version >/dev/null 2>&1; then
echo "ERROR: containerd failed to become ready after 30 seconds"
exit $ERR_CONTAINERD_INSTALL_TIMEOUT
fi

Copilot uses AI. Check for mistakes.
Comment on lines +1010 to +1015
# Remove the temporary VHD-build-time containerd config so it doesn't persist in the VHD.
# CSE generates the production config at provisioning time.
if [ -f /etc/containerd/config.toml ]; then
rm -f /etc/containerd/config.toml
systemctl restart containerd || true
fi
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "CSE generates the production config at provisioning time" but the config removal (lines 1012-1015) happens unconditionally. This means if any future code in this script tries to use containerd after line 1015, it will be running with no config file or potentially stale config. Consider:

  1. Verifying no code after this point depends on containerd operations
  2. Adding a comment warning that no containerd operations should happen after this point
  3. Or keeping the config and letting CSE overwrite it (as the comment at line 529 claims happens anyway)

Copilot uses AI. Check for mistakes.
Comment on lines +533 to +536
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed replacement on line 533 only works if containerd's default config contains the exact text "discard_unpacked_layers = false". If the default config format changes in future containerd versions (e.g., uses different spacing, "= false" becomes "=false", or the setting is commented out), this replacement will silently fail, and the fallback insertion at line 536 will be needed. However, if both fail, the script continues without the intended optimization and without any error indication.

Consider making the logic more robust:

  1. Always check after both sed commands whether the setting is present
  2. Exit with an error if neither method succeeded
  3. Log which method succeeded for debugging purposes
Suggested change
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
# First, try to flip the default value from false to true if present.
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# Check if the setting is now present and enabled (allowing for flexible whitespace).
if grep -q 'discard_unpacked_layers[[:space:]]*=[[:space:]]*true' /etc/containerd/config.toml; then
echo "discard_unpacked_layers enabled via replacement in default containerd config"
else
echo "discard_unpacked_layers not found after replacement; attempting insertion into containerd config"
# If discard_unpacked_layers was not in the default config, add it under the correct section.
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
# Verify that the insertion succeeded.
if grep -q 'discard_unpacked_layers[[:space:]]*=[[:space:]]*true' /etc/containerd/config.toml; then
echo "discard_unpacked_layers enabled via insertion into containerd config"
else
echo "Error: failed to enable discard_unpacked_layers in /etc/containerd/config.toml" >&2
exit 1
fi

Copilot uses AI. Check for mistakes.
systemctl restart containerd || true
fi

echo "containerd directory space: $(du -sh /var/lib/containerd)" >> ${VHD_LOGS_FILEPATH}
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description states this change "saves 1GB" but doesn't specify what the baseline was. The log at line 1017 will help track this metric going forward, but consider also logging the space usage BEFORE enabling discard_unpacked_layers to better quantify the actual savings in VHD build logs. This would help validate whether the complexity and risk of this change is worth the space savings, especially given the critical bug with CoreDNS binary extraction.

Copilot uses AI. Check for mistakes.
Comment on lines +527 to +545
# Enable discard_unpacked_layers during VHD build to reduce VHD size.
# Containerd will discard unpacked layer data after it's been committed to a snapshot.
# This config is only used during VHD build; CSE overwrites it at provisioning time.
echo "Enabling discard_unpacked_layers for VHD build"
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
fi
systemctl restart containerd
for i in $(seq 1 30); do
if ctr version >/dev/null 2>&1; then
break
fi
sleep 1
done

Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: Setting discard_unpacked_layers = true before pulling images will cause the extractAndCacheCoreDnsBinary function to fail. The function at line 976 uses ctr -n k8s.io images mount to extract the CoreDNS binary, but this operation requires the unpacked layer data that containerd will have discarded. When discard_unpacked_layers is enabled, containerd removes unpacked layer data after committing to snapshots, making it impossible to mount images for filesystem access later.

This configuration should be enabled AFTER extractAndCacheCoreDnsBinary completes (after line 1008), not before image pulls. Move this entire block to after line 1008 to avoid breaking the CoreDNS binary extraction.

Suggested change
# Enable discard_unpacked_layers during VHD build to reduce VHD size.
# Containerd will discard unpacked layer data after it's been committed to a snapshot.
# This config is only used during VHD build; CSE overwrites it at provisioning time.
echo "Enabling discard_unpacked_layers for VHD build"
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
fi
systemctl restart containerd
for i in $(seq 1 30); do
if ctr version >/dev/null 2>&1; then
break
fi
sleep 1
done

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants