Skip to content

test: setup packet capture steps for debugging secure TLS bootstrapping flakes#7939

Draft
cameronmeissner wants to merge 4 commits intomainfrom
cameissner/stls-pcap-debug
Draft

test: setup packet capture steps for debugging secure TLS bootstrapping flakes#7939
cameronmeissner wants to merge 4 commits intomainfrom
cameissner/stls-pcap-debug

Conversation

@cameronmeissner
Copy link
Contributor

@cameronmeissner cameronmeissner commented Feb 24, 2026

What this PR does / why we need it:

for debugging only - NOT FOR MERGE

Which issue(s) this PR fixes:

Fixes #

Copilot AI review requested due to automatic review settings February 24, 2026 22:26
@cameronmeissner cameronmeissner changed the title test: setup packet capture steps for debugging secure TLS bootstrappi… test: setup packet capture steps for debugging secure TLS bootstrapping flakes Feb 24, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a systemd-based tcpdump capture step intended to help debug secure TLS bootstrapping flakes, wiring it into the Linux cloud-init/CSE flow and updating generated CustomData test snapshots.

Changes:

  • Add aks-pcap.service to Linux node custom data and start it during secure TLS bootstrapping configuration.
  • Install tcpdump on Mariner/AzureLinux images.
  • Modify aks-log-collector.sh behavior, including max-zip-size handling and (notably) reducing the default set of collected artifacts.

Reviewed changes

Copilot reviewed 22 out of 70 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData Regenerated CustomData snapshot reflecting added unit and other cloud-init changes.
pkg/agent/testdata/CustomizedImage/CustomData Regenerated CustomData snapshot reflecting added unit and other cloud-init changes.
parts/linux/cloud-init/nodecustomdata.yml Writes aks-pcap.service into the node via cloud-init.
parts/linux/cloud-init/artifacts/mariner/cse_install_mariner.sh Adds tcpdump to Mariner/AzureLinux dependency install list.
parts/linux/cloud-init/artifacts/cse_helpers.sh Introduces a new error code for aks-pcap startup failure.
parts/linux/cloud-init/artifacts/cse_config.sh Starts/enables aks-pcap when configuring secure TLS bootstrapping.
parts/linux/cloud-init/artifacts/aks-log-collector.sh Updates zip-size enforcement logic and changes what gets collected by default.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 78 changed files in this pull request and generated 6 comments.

Comment on lines +163 to +168
# adding this last file made the zip go over that size, remove that file and try the next one.
# Using continue instead of break ensures smaller subsequent files can still be included.
FILE_SIZE=$(stat --printf "%s" "${ZIP}")
if [ "$FILE_SIZE" -ge "$MAX_SIZE" ]; then
echo "WARNING: ZIP file size $FILE_SIZE >= $MAX_SIZE after adding $file; removing it and trying next file."
zip -d "${ZIP}" "$file"
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ENFORCE_MAX_ZIP_SIZE logic change introduces a subtle behavioral difference that could cause issues:

Old behavior (lines removed): When zip exceeded MAX_SIZE, the script would:

  1. Remove the last file that pushed it over
  2. STOP processing (break)

New behavior (lines 150-169): When zip exceeds MAX_SIZE:

  1. Skip adding more files until a file is added
  2. If that file pushes over limit, remove it and CONTINUE trying next files
  3. Only stops when hitting the pre-check at line 153

Problem: The continue logic (line 165-169) is inside the "if test -e" block, so smaller files after a large one can still be added. However, the pre-check (line 150-157) will prevent ANY files from being added once the limit is hit, creating inconsistent behavior depending on file ordering.

Recommendation: Simplify to always break when limit is exceeded (restore original behavior), OR move the pre-check outside the file loop for consistent "stop when full" behavior.

Suggested change
# adding this last file made the zip go over that size, remove that file and try the next one.
# Using continue instead of break ensures smaller subsequent files can still be included.
FILE_SIZE=$(stat --printf "%s" "${ZIP}")
if [ "$FILE_SIZE" -ge "$MAX_SIZE" ]; then
echo "WARNING: ZIP file size $FILE_SIZE >= $MAX_SIZE after adding $file; removing it and trying next file."
zip -d "${ZIP}" "$file"
# adding this last file made the zip go over that size, remove that file and stop adding more.
FILE_SIZE=$(stat --printf "%s" "${ZIP}")
if [ "$FILE_SIZE" -ge "$MAX_SIZE" ]; then
echo "WARNING: ZIP file size $FILE_SIZE >= $MAX_SIZE after adding $file; removing it and stopping further additions."
zip -d "${ZIP}" "$file"
break

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants