Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions vhdbuilder/packer/install-dependencies.sh
Original file line number Diff line number Diff line change
Expand Up @@ -524,6 +524,25 @@ fi

capture_benchmark "${SCRIPT_NAME}_install_artifact_streaming"

# Enable discard_unpacked_layers during VHD build to reduce VHD size.
# Containerd will discard unpacked layer data after it's been committed to a snapshot.
# This config is only used during VHD build; CSE overwrites it at provisioning time.
echo "Enabling discard_unpacked_layers for VHD build"
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed command on line 536 uses a regular expression pattern that may not be portable across all OS distributions. The [[:space:]]*$ pattern might behave differently on Ubuntu vs Azure Linux/Mariner. Additionally, the pattern assumes a specific formatting of the TOML file from containerd config default, which could vary across containerd versions. Consider testing this with multiple containerd versions to ensure it works reliably, or use a more robust TOML parser like tomlq or yq to manipulate the config file.

Suggested change
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
sed -i '/^\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\]/a\ discard_unpacked_layers = true' /etc/containerd/config.toml

Copilot uses AI. Check for mistakes.
Comment on lines +533 to +536
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sed replacement on line 533 only works if containerd's default config contains the exact text "discard_unpacked_layers = false". If the default config format changes in future containerd versions (e.g., uses different spacing, "= false" becomes "=false", or the setting is commented out), this replacement will silently fail, and the fallback insertion at line 536 will be needed. However, if both fail, the script continues without the intended optimization and without any error indication.

Consider making the logic more robust:

  1. Always check after both sed commands whether the setting is present
  2. Exit with an error if neither method succeeded
  3. Log which method succeeded for debugging purposes
Suggested change
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
# First, try to flip the default value from false to true if present.
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# Check if the setting is now present and enabled (allowing for flexible whitespace).
if grep -q 'discard_unpacked_layers[[:space:]]*=[[:space:]]*true' /etc/containerd/config.toml; then
echo "discard_unpacked_layers enabled via replacement in default containerd config"
else
echo "discard_unpacked_layers not found after replacement; attempting insertion into containerd config"
# If discard_unpacked_layers was not in the default config, add it under the correct section.
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
# Verify that the insertion succeeded.
if grep -q 'discard_unpacked_layers[[:space:]]*=[[:space:]]*true' /etc/containerd/config.toml; then
echo "discard_unpacked_layers enabled via insertion into containerd config"
else
echo "Error: failed to enable discard_unpacked_layers in /etc/containerd/config.toml" >&2
exit 1
fi

Copilot uses AI. Check for mistakes.
fi
systemctl restart containerd
for i in $(seq 1 30); do
if ctr version >/dev/null 2>&1; then
break
fi
sleep 1
done
Comment on lines +527 to +544
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change lacks test coverage. The VHD build process now enables discard_unpacked_layers, which fundamentally changes how containerd handles layer data during VHD build. This should have:

  1. E2E tests to verify VHD images work correctly after this change
  2. Tests to ensure the CoreDNS binary extraction still works (since it depends on image mount operations)
  3. Tests to verify the containerd config cleanup doesn't break provisioning

Consider adding test cases in the e2e test suite to validate these behaviors.

Copilot uses AI. Check for mistakes.

Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The containerd restart wait loop lacks error handling for the timeout case. After 30 seconds (lines 539-544), if ctr version never succeeds, the script continues anyway. This could lead to subsequent operations failing in confusing ways. Add explicit error handling after the loop:

if ! ctr version >/dev/null 2>&1; then
  echo "ERROR: containerd failed to become ready after 30 seconds"
  exit $ERR_CONTAINERD_INSTALL_TIMEOUT
fi
Suggested change
if ! ctr version >/dev/null 2>&1; then
echo "ERROR: containerd failed to become ready after 30 seconds"
exit $ERR_CONTAINERD_INSTALL_TIMEOUT
fi

Copilot uses AI. Check for mistakes.
Comment on lines +527 to +545
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical: Setting discard_unpacked_layers = true before pulling images will cause the extractAndCacheCoreDnsBinary function to fail. The function at line 976 uses ctr -n k8s.io images mount to extract the CoreDNS binary, but this operation requires the unpacked layer data that containerd will have discarded. When discard_unpacked_layers is enabled, containerd removes unpacked layer data after committing to snapshots, making it impossible to mount images for filesystem access later.

This configuration should be enabled AFTER extractAndCacheCoreDnsBinary completes (after line 1008), not before image pulls. Move this entire block to after line 1008 to avoid breaking the CoreDNS binary extraction.

Suggested change
# Enable discard_unpacked_layers during VHD build to reduce VHD size.
# Containerd will discard unpacked layer data after it's been committed to a snapshot.
# This config is only used during VHD build; CSE overwrites it at provisioning time.
echo "Enabling discard_unpacked_layers for VHD build"
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
sed -i 's/discard_unpacked_layers = false/discard_unpacked_layers = true/' /etc/containerd/config.toml
# If discard_unpacked_layers was not in the default config, add it under the correct section
if ! grep -q "discard_unpacked_layers = true" /etc/containerd/config.toml; then
sed -i '/\[plugins\."io\.containerd\.grpc\.v1\.cri"\.containerd\][[:space:]]*$/a\ discard_unpacked_layers = true' /etc/containerd/config.toml
fi
systemctl restart containerd
for i in $(seq 1 30); do
if ctr version >/dev/null 2>&1; then
break
fi
sleep 1
done

Copilot uses AI. Check for mistakes.
# k8s will use images in the k8s.io namespaces - create it
ctr namespace create k8s.io
cliTool="ctr"
Expand Down Expand Up @@ -988,6 +1007,15 @@ extractAndCacheCoreDnsBinary() {

extractAndCacheCoreDnsBinary

# Remove the temporary VHD-build-time containerd config so it doesn't persist in the VHD.
# CSE generates the production config at provisioning time.
if [ -f /etc/containerd/config.toml ]; then
rm -f /etc/containerd/config.toml
systemctl restart containerd || true
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The containerd restart at line 1014 should check the exit status. Currently it uses || true which silently ignores failures. After removing the config.toml, if containerd fails to restart, the VHD build should fail rather than continue with a broken containerd service. Consider using proper error handling or at least logging the failure before continuing.

Suggested change
systemctl restart containerd || true
if ! systemctl restart containerd; then
echo "Failed to restart containerd after removing /etc/containerd/config.toml" >&2
exit 1
fi

Copilot uses AI. Check for mistakes.
fi
Comment on lines +1010 to +1015
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "CSE generates the production config at provisioning time" but the config removal (lines 1012-1015) happens unconditionally. This means if any future code in this script tries to use containerd after line 1015, it will be running with no config file or potentially stale config. Consider:

  1. Verifying no code after this point depends on containerd operations
  2. Adding a comment warning that no containerd operations should happen after this point
  3. Or keeping the config and letting CSE overwrite it (as the comment at line 529 claims happens anyway)

Copilot uses AI. Check for mistakes.

echo "containerd directory space: $(du -sh /var/lib/containerd)" >> ${VHD_LOGS_FILEPATH}
Copy link

Copilot AI Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description states this change "saves 1GB" but doesn't specify what the baseline was. The log at line 1017 will help track this metric going forward, but consider also logging the space usage BEFORE enabling discard_unpacked_layers to better quantify the actual savings in VHD build logs. This would help validate whether the complexity and risk of this change is worth the space savings, especially given the critical bug with CoreDNS binary extraction.

Copilot uses AI. Check for mistakes.

rm -f ./azcopy # cleanup immediately after usage will return in two downloads
echo "install-dependencies step completed successfully"

Expand Down
Loading