Summary
When AWF containers fail to start (e.g., Squid crashes on startup in DinD environments), we currently have no diagnostic information because application-level logs (access.log, audit.jsonl) are never written. As a result, debugging customer issues takes multiple rounds of back-and-forth just to gather basic information such as docker logs output.
Motivation: A customer running ARC runners with DinD sidecars hit a Squid container crash (exit code 1) where the root cause was invisible — the squid access logs were empty because Squid never started. Diagnosing this required asking the customer to manually add debug steps to their workflow. See #18385.
Proposal
Add a --diagnostic-logs flag (off by default) that collects Docker operational logs on failure and includes them in the firewall-audit-logs artifact under a diagnostics/ subdirectory.
What to collect on failure
| Data | Command | Why |
| --- | --- | --- |
| Container logs | docker logs <container> for squid, agent, api-proxy, iptables-init | Captures entrypoint stderr/stdout, which shows WHY a container crashed |
| Container exit codes | docker inspect --format '{{.State.ExitCode}}' | Quick triage signal |
| Mount inspection | docker inspect --format '{{json .Mounts}}' | Shows what Docker actually mounted vs. what was requested (critical for DinD debugging) |
| Sanitized docker-compose.yml | Strip env vars containing tokens/keys | Shows the full container config without leaking secrets |
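The collection described in the table could be sketched roughly as follows. This is a minimal sketch, not the AWF implementation: the container names are copied from the table and may not match the real container names, and `buildDiagnosticCommands`, `runDocker`, and the output file names are hypothetical. The docker invocations themselves are the ones listed above, with `--tail 200` applied to `docker logs`.

```typescript
import { spawnSync } from "node:child_process";

// One diagnostic command: the docker arguments to run and the file its output goes to.
interface DiagnosticCommand {
  outputFile: string; // hypothetical file name inside diagnostics/
  argv: string[];     // arguments passed to the docker CLI
}

// Assumed container names, taken from the table; the real names may differ.
const CONTAINERS: string[] = ["squid", "agent", "api-proxy", "iptables-init"];

// Builds the three per-container commands: logs, exit code, and mounts.
function buildDiagnosticCommands(containers: string[]): DiagnosticCommand[] {
  const commands: DiagnosticCommand[] = [];
  for (const name of containers) {
    commands.push({ outputFile: `${name}.log`, argv: ["logs", "--tail", "200", name] });
    commands.push({
      outputFile: `${name}.exit-code.txt`,
      argv: ["inspect", "--format", "{{.State.ExitCode}}", name],
    });
    commands.push({
      outputFile: `${name}.mounts.json`,
      argv: ["inspect", "--format", "{{json .Mounts}}", name],
    });
  }
  return commands;
}

// Runs one docker command. Returns null when docker is unavailable or the
// command fails (for example, the container was already removed), so the
// caller can skip that entry instead of aborting.
function runDocker(argv: string[]): string | null {
  const result = spawnSync("docker", argv, { encoding: "utf8" });
  if (result.error || result.status !== 0) {
    return null;
  }
  // docker logs replays container stderr on stderr, so keep both streams.
  return result.stdout + result.stderr;
}
```

Only targeted `--format` templates are used here, which keeps the full `docker inspect` output (and the environment variables it contains) out of the artifact.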
What NOT to collect (even with the flag)
- Raw environment variables (may contain API keys)
- Full docker inspect output (contains env vars)
- Host filesystem contents
Feature flag behavior
--diagnostic-logs: Opt-in flag, off by default
- When enabled and AWF exits with a non-zero code, collect the above and write to ${auditDir}/diagnostics/ or ${workDir}/diagnostics/
- When disabled (default), no additional data is collected — current behavior preserved
- Consider making this default-on in a future release once validated
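The gating rules above might look something like the following sketch. The `DiagnosticOptions` shape and both function names are hypothetical. The proposal does not say which of the two directories takes precedence, so preferring `auditDir` when it is set is an assumption.

```typescript
import * as path from "node:path";

// Hypothetical option bag; field names are illustrative, not AWF's actual config.
interface DiagnosticOptions {
  diagnosticLogs: boolean; // backs the --diagnostic-logs flag, false by default
  auditDir?: string;       // audit log directory, if one is configured
  workDir: string;         // AWF working directory
}

// Diagnostics are collected only when the flag is on AND the run failed.
function shouldCollectDiagnostics(opts: DiagnosticOptions, exitCode: number): boolean {
  return opts.diagnosticLogs && exitCode !== 0;
}

// Assumption: prefer auditDir when present, otherwise fall back to workDir.
function resolveDiagnosticsDir(opts: DiagnosticOptions): string {
  return path.join(opts.auditDir ?? opts.workDir, "diagnostics");
}
```

With the flag left at its default, `shouldCollectDiagnostics` returns false for every exit code, so current behavior is preserved.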
Implementation notes
- Collection should happen in the cleanup/error path (src/cli.ts catch block and signal handlers)
- Use docker logs with --tail 200 to cap output size
- Sanitize docker-compose.yml by redacting any env var value containing token, key, secret, password (case-insensitive)
- If a container doesn't exist (already cleaned up), skip gracefully
- Bundle into existing firewall-audit-logs artifact upload path
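As an illustration of the sanitization note, redaction could be sketched as below. This assumes the compose file has already been parsed and that each service's environment is in map form. It also assumes the keyword match applies to the variable name, which is one reading of the note above. The function name and the `[REDACTED]` placeholder are hypothetical.

```typescript
// Keywords from the implementation notes, matched case-insensitively.
const SENSITIVE_PATTERN = /token|key|secret|password/i;

// Returns a copy of the environment map with sensitive values replaced.
// Assumption: a variable is sensitive when its NAME contains a keyword.
function redactEnvironment(env: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    redacted[name] = SENSITIVE_PATTERN.test(name) ? "[REDACTED]" : value;
  }
  return redacted;
}
```

Compose files can also express the environment as a list of NAME=value strings. A real implementation would need to split those entries on the first `=` before applying the same check.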
Acceptance criteria
- --diagnostic-logs flag added to AWF CLI
- diagnostics/ subdirectory alongside existing audit artifacts