test: setup packet capture steps for debugging secure TLS bootstrapping flakes#7939
test: setup packet capture steps for debugging secure TLS bootstrapping flakes#7939cameronmeissner wants to merge 4 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a systemd-based tcpdump capture step intended to help debug secure TLS bootstrapping flakes, wiring it into the Linux cloud-init/CSE flow and updating generated CustomData test snapshots.
Changes:
- Add
aks-pcap.serviceto Linux node custom data and start it during secure TLS bootstrapping configuration. - Install
tcpdumpon Mariner/AzureLinux images. - Modify
aks-log-collector.shbehavior, including max-zip-size handling and (notably) reducing the default set of collected artifacts.
Reviewed changes
Copilot reviewed 22 out of 70 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/agent/testdata/CustomizedImageLinuxGuard/CustomData | Regenerated CustomData snapshot reflecting added unit and other cloud-init changes. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated CustomData snapshot reflecting added unit and other cloud-init changes. |
| parts/linux/cloud-init/nodecustomdata.yml | Writes aks-pcap.service into the node via cloud-init. |
| parts/linux/cloud-init/artifacts/mariner/cse_install_mariner.sh | Adds tcpdump to Mariner/AzureLinux dependency install list. |
| parts/linux/cloud-init/artifacts/cse_helpers.sh | Introduces a new error code for aks-pcap startup failure. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Starts/enables aks-pcap when configuring secure TLS bootstrapping. |
| parts/linux/cloud-init/artifacts/aks-log-collector.sh | Updates zip-size enforcement logic and changes what gets collected by default. |
| # adding this last file made the zip go over that size, remove that file and try the next one. | ||
| # Using continue instead of break ensures smaller subsequent files can still be included. | ||
| FILE_SIZE=$(stat --printf "%s" "${ZIP}") | ||
| if [ "$FILE_SIZE" -ge "$MAX_SIZE" ]; then | ||
| echo "WARNING: ZIP file size $FILE_SIZE >= $MAX_SIZE after adding $file; removing it and trying next file." | ||
| zip -d "${ZIP}" "$file" |
There was a problem hiding this comment.
The ENFORCE_MAX_ZIP_SIZE logic change introduces a subtle behavioral difference that could cause issues:
Old behavior (lines removed): When zip exceeded MAX_SIZE, the script would:
- Remove the last file that pushed it over
- STOP processing (break)
New behavior (lines 150-169): When zip exceeds MAX_SIZE:
- Skip adding more files until a file is added
- If that file pushes over limit, remove it and CONTINUE trying next files
- Only stops when hitting the pre-check at line 153
Problem: The continue logic (line 165-169) is inside the "if test -e" block, so smaller files after a large one can still be added. However, the pre-check (line 150-157) will prevent ANY files from being added once the limit is hit, creating inconsistent behavior depending on file ordering.
Recommendation: Simplify to always break when limit is exceeded (restore original behavior), OR move the pre-check outside the file loop for consistent "stop when full" behavior.
| # adding this last file made the zip go over that size, remove that file and try the next one. | |
| # Using continue instead of break ensures smaller subsequent files can still be included. | |
| FILE_SIZE=$(stat --printf "%s" "${ZIP}") | |
| if [ "$FILE_SIZE" -ge "$MAX_SIZE" ]; then | |
| echo "WARNING: ZIP file size $FILE_SIZE >= $MAX_SIZE after adding $file; removing it and trying next file." | |
| zip -d "${ZIP}" "$file" | |
| # adding this last file made the zip go over that size, remove that file and stop adding more. | |
| FILE_SIZE=$(stat --printf "%s" "${ZIP}") | |
| if [ "$FILE_SIZE" -ge "$MAX_SIZE" ]; then | |
| echo "WARNING: ZIP file size $FILE_SIZE >= $MAX_SIZE after adding $file; removing it and stopping further additions." | |
| zip -d "${ZIP}" "$file" | |
| break |
What this PR does / why we need it:
for debugging only - NOT FOR MERGE
Which issue(s) this PR fixes:
Fixes #