Skip to content

feat(localdns): add systemd-based Prometheus-format metrics exporter#7917

Open
saewoni wants to merge 54 commits intomainfrom
sakwa/add_localdns_metrics__systemd
Open

feat(localdns): add systemd-based Prometheus-format metrics exporter#7917
saewoni wants to merge 54 commits intomainfrom
sakwa/add_localdns_metrics__systemd

Conversation

@saewoni
Copy link
Contributor

@saewoni saewoni commented Feb 20, 2026

Adds CPU and memory metrics for localdns.service using systemd accounting and socket activation for efficient, zero-overhead monitoring.

Also export IP addresses configured for forward plugins for kubednsoverrides and vnetdnsoverrides

Implementation:

  • localdns_exporter.sh: Scrapes systemd CPUUsageNSec and MemoryCurrent
  • localdns-exporter.socket: Socket activation on port 9353
  • localdns-exporter@.service: Instantiated service per connection
  • Integrated into all VHD builders (Ubuntu, Mariner, Flatcar, all arches)

Metrics exposed:

  • localdns_cpu_usage_seconds_total (counter)
  • localdns_memory_usage_mb (gauge)

Test coverage:

  • e2e/test-localdns-exporter.sh validates exporter functionality

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Adds CPU and memory metrics for localdns.service using systemd
accounting and socket activation for efficient, zero-overhead monitoring.

Implementation:
- localdns_exporter.sh: Scrapes systemd CPUUsageNSec and MemoryCurrent
- localdns-exporter.socket: Socket activation on port 9353
- localdns-exporter@.service: Instantiated service per connection
- Integrated into all VHD builders (Ubuntu, Mariner, Flatcar, all arches)

Metrics exposed:
- localdns_cpu_usage_seconds_total (counter)
- localdns_memory_usage_mb (gauge)

Test coverage:
- e2e/test-localdns-exporter.sh validates exporter functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a lightweight Prometheus-compatible metrics exporter for the localdns.service using systemd socket activation. The exporter exposes CPU and memory metrics on port 9353 with zero overhead when not being scraped, making it suitable for production monitoring.

Changes:

  • Added localdns_exporter.sh bash script that queries systemd accounting metrics and formats them as Prometheus metrics
  • Added systemd socket (localdns-exporter.socket) and service (localdns-exporter@.service) units for on-demand activation
  • Integrated the exporter into all Linux VHD builds (Ubuntu, Mariner, Flatcar, ARM64/x64)
  • Added VHD content validation tests and a standalone test script

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
parts/linux/cloud-init/artifacts/localdns_exporter.sh Bash script that scrapes systemd metrics (CPUUsageNSec, MemoryCurrent) and outputs Prometheus-formatted HTTP response
parts/linux/cloud-init/artifacts/localdns-exporter@.service Systemd template service for per-connection worker instances with security hardening
parts/linux/cloud-init/artifacts/localdns-exporter.socket Systemd socket unit listening on port 9353 with Accept=yes for on-demand activation
vhdbuilder/packer/vhd-image-builder-*.json Added file provisioners to copy exporter artifacts to all Linux VHD variants
vhdbuilder/packer/packer_source.sh Added file copying logic and systemctl enable command for the socket
vhdbuilder/packer/imagecustomizer/azlosguard/azlosguard.yml Added exporter files to OSGuard VHD build but missing socket enablement
vhdbuilder/packer/test/linux-vhd-content-test.sh Added validation for exporter files and permissions
e2e/test-localdns-exporter.sh Standalone test script for manual validation

@saewoni saewoni changed the title feat(localdns): add systemd-based Prometheus metrics exporter feat(localdns): add systemd-based Prometheus-format metrics exporter Feb 20, 2026
Previously, localdns_exporter.sh parsed the corefile on every metrics
scrape (every 15-30s) to extract forward IP addresses. This commit
optimizes the process by:

1. Generating forward IP metrics once when localdns.sh creates the
   updated corefile (replace_azurednsip_in_corefile function)
2. Writing pre-formatted Prometheus metrics to forward_ips.prom
3. Having localdns_exporter.sh simply cat the .prom file instead of
   parsing the corefile

Benefits:
- Eliminates redundant parsing on every metrics scrape
- Reduces localdns_exporter.sh from 137 to 60 lines
- Single source of truth for forward IP extraction
- Faster metrics export (file read vs awk parsing)

Tests added to verify .prom file creation, content format, and
permissions (644). All 488 shellspec tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. Let's make sure we have proper test coverages

Use systemctlEnableAndStartNoBlock instead of systemctlEnableAndStart
for localdns-exporter.socket to avoid blocking provisioning for up to
30 seconds on an optional observability component.

The exporter socket is for metrics collection and should not add
latency to node provisioning. Error handling already allows graceful
failure with a warning message.

- Update enableLocalDNS to use systemctlEnableAndStartNoBlock
- Add test coverage for non-blocking behavior
- Add test case for graceful failure when exporter socket fails

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Comment on lines +1189 to +1196
# Enable localdns metrics exporter socket for Prometheus scraping
# This is optional observability - don't block provisioning if it fails
echo "Enabling localdns-exporter.socket for metrics collection."
if systemctlEnableAndStartNoBlock localdns-exporter.socket 30; then
echo "Enable localdns-exporter.socket succeeded."
else
echo "WARNING: Failed to enable localdns-exporter.socket. Metrics will not be available but continuing provisioning."
fi
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change updates parts/linux/cloud-init/artifacts/cse_config.sh, which is snapshot-tested via pkg/agent/testdata/** (CustomData/CSECommand). The PR currently doesn’t include regenerated testdata artifacts; make generate should be run and the updated snapshot files committed, otherwise unit tests comparing generated CustomData/CSE output are likely to fail in CI.

Copilot uses AI. Check for mistakes.
kwaksaewon and others added 2 commits February 28, 2026 01:27
Add comprehensive security validation to localdns-exporter e2e tests
to verify systemd security directives (lines 14-29 of service file):

Validates:
- DynamicUser: Process runs as unprivileged dynamic user, not root
- RestrictAddressFamilies=AF_UNIX: No network sockets (IPv4/IPv6)
- Namespace isolation: Process has proper namespace separation
- ProtectSystem=strict: Read-only filesystem protection
- Additional hardening: PrivateTmp, ProtectHome, NoNewPrivileges, etc.

Changes:
- Extended validate_localdns_exporter_metrics.go with security checks
- Extended test-localdns-exporter.sh with security validation
- Tests spawn worker instances via socket activation
- Gracefully skip checks if instances aren't running

Testing approach:
1. Trigger scrape to spawn template instance (@.service)
2. Get active instance PID from systemd
3. Verify runtime properties (user, sockets, namespaces)
4. Verify systemd security properties configuration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Improvements to localdns-exporter security tests:

1. **Better test ordering**: Check systemd config FIRST (fast), then
   runtime enforcement (requires spawning instances)
   - Fail-fast if configuration is wrong
   - Avoid unnecessary instance spawns

2. **More precise pattern matching**:
   - Use ^...$ anchors for exact matches (prevents partial matches)
   - ReadOnlyPaths=/: Use regex to match "=/" or "=/ " (not "=/something")
   - RestrictAddressFamilies: Check for AF_UNIX presence AND verify
     AF_INET/AF_INET6 absence

3. **Increased reliability**:
   - Bump sleep from 1s to 2s for socket activation
   - Better retry logic for instance discovery

4. **Clearer test output**:
   - Separate "configuration" vs "runtime enforcement" sections
   - More descriptive messages (e.g., "DynamicUser runtime enforcement")
   - Final summary shows what was validated

5. **Better error context**:
   - Runtime checks explicitly state what's being enforced
   - Configuration checks show all 16 directives upfront

Testing approach now validates two layers:
- Layer 1: Systemd configuration (are directives set?)
- Layer 2: Runtime enforcement (are restrictions actually applied?)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

kwaksaewon and others added 8 commits February 28, 2026 03:19
Move shell test scripts to e2e/localdns/ for better organization:
- e2e/test-localdns-exporter.sh -> e2e/localdns/test-localdns-exporter.sh
- e2e/run-localdns-test.sh -> e2e/localdns/run-localdns-test.sh

Go test files remain in e2e/ root to access e2e package types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add e2e test coverage for localdns-exporter@.service on all
distributions that support localdns:
- Ubuntu 24.04 (Test_Ubuntu2404LocalDns_ExporterMetrics)
- Azure Linux V2 (Test_AzureLinuxV2LocalDns_ExporterMetrics)
- CBL-Mariner V2 (Test_MarinerV2LocalDns_ExporterMetrics)

These tests verify that the localdns exporter metrics endpoint works
correctly and validates the security hardening directives on each distro.

Existing tests already covered Ubuntu 22.04, Azure Linux V3, and Flatcar.
With these additions, all 6 localdns-supported distributions now have
complete e2e test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the inline bash script from validate_localdns_exporter_metrics.go
to a separate file e2e/localdns/validate-localdns-exporter-metrics.sh
and use go:embed to load it. This improves code organization and makes
the validation script easier to maintain and test independently.

The validation script checks:
- Port 9353 listener and HTTP 200 response
- Required metrics: cpu_usage, memory_usage, vnetdns_forward_info, kubedns_forward_info
- All 16 systemd security directives (configuration and runtime enforcement)
- DynamicUser, RestrictAddressFamilies, and namespace isolation at runtime

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete e2e/localdns/test-localdns-exporter.sh as it duplicates
validate-localdns-exporter-metrics.sh functionality. The e2e validation
script provides comprehensive coverage:

Coverage in validate-localdns-exporter-metrics.sh:
✓ Port 9353 listening check (implies socket is active)
✓ HTTP 200 status validation
✓ Metrics body validation (cpu, memory, forward IPs)
✓ VnetDNS/KubeDNS forward IP parsing and validation
✓ All 16 systemd security directives
✓ Runtime enforcement (DynamicUser, RestrictAddressFamilies, namespaces)

The manual test script only added:
✗ systemctl availability check (not critical, e2e fails clearly if missing)
✗ localdns.service existence check (redundant, covered by metrics validation)
✗ Direct stdin test of exporter script (not valuable, e2e tests full stack)
✗ Socket enabled/active checks (redundant, port listening implies active)

Result: Eliminates ~184 lines of duplicate test code while maintaining
full e2e test coverage via validate-localdns-exporter-metrics.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…efile

The forward-IP extraction was incorrectly treating the '{' token as an IP
when the corefile uses the common CoreDNS syntax `forward . <ip> { ... }`.

Updated the awk parser to filter tokens using an IPv4 regex pattern
`/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/` to only capture actual IP addresses
and skip braces and other non-IP tokens.

Also updated test fixtures in localdns_spec.sh to use the brace syntax
matching the production localdns corefile template, and added assertions
to verify that '{' is NOT captured as an IP label in the metrics output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… unit

Removed IPAddressAllow=localhost and IPAddressDeny=any from the socket unit
as they are redundant. ListenStream=127.0.0.1:9353 already restricts the
socket binding to loopback only.

The IPAddressAllow/Deny directives apply firewall-like filtering to the
spawned service process, not to socket binding. Since the socket is already
bound to 127.0.0.1, remote connections cannot reach it regardless of these
directives, making them unnecessary and potentially misleading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Regenerated pkg/agent/testdata after merging main branch to ensure
CustomData snapshots reflect the latest changes from both main and
the localdns metrics feature branch.

Also fixed shellcheck SC2012 warning in validate-localdns-exporter-metrics.sh
by replacing 'ls -1' with 'find' for better handling of filenames when
counting namespace entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 89 changed files in this pull request and generated 4 comments.

Comment on lines +195 to +214
# Fetch all security-related properties in batches (systemctl has limits)
SECURITY_PROPS_1=$(systemctl show localdns-exporter@.service \
--property=DynamicUser,PrivateTmp,ProtectSystem,ProtectHome,ReadOnlyPaths,NoNewPrivileges \
2>/dev/null || true)
SECURITY_PROPS_2=$(systemctl show localdns-exporter@.service \
--property=ProtectKernelTunables,ProtectKernelModules,ProtectControlGroups,RestrictAddressFamilies \
2>/dev/null || true)
SECURITY_PROPS_3=$(systemctl show localdns-exporter@.service \
--property=RestrictNamespaces,LockPersonality,RestrictRealtime,RestrictSUIDSGID,RemoveIPC,PrivateMounts \
2>/dev/null || true)

SECURITY_PROPS="$SECURITY_PROPS_1
$SECURITY_PROPS_2
$SECURITY_PROPS_3"

echo " Retrieved security properties:"
echo "$SECURITY_PROPS" | sed 's/^/ /'
echo ""

# Check all 16 security directives
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The security-directive assertions are brittle across OS/systemd versions: systemctl show --property=... may omit unsupported properties (and the unit may ignore unknown directives), but the script hard-fails expecting all 16 keys (e.g., PrivateMounts=yes). To avoid false negatives across Ubuntu/Azure Linux/Flatcar variants, detect whether each property is supported before asserting, or validate by parsing the unit file instead of requiring systemctl show to return every property.

Suggested change
# Fetch all security-related properties in batches (systemctl has limits)
SECURITY_PROPS_1=$(systemctl show localdns-exporter@.service \
--property=DynamicUser,PrivateTmp,ProtectSystem,ProtectHome,ReadOnlyPaths,NoNewPrivileges \
2>/dev/null || true)
SECURITY_PROPS_2=$(systemctl show localdns-exporter@.service \
--property=ProtectKernelTunables,ProtectKernelModules,ProtectControlGroups,RestrictAddressFamilies \
2>/dev/null || true)
SECURITY_PROPS_3=$(systemctl show localdns-exporter@.service \
--property=RestrictNamespaces,LockPersonality,RestrictRealtime,RestrictSUIDSGID,RemoveIPC,PrivateMounts \
2>/dev/null || true)
SECURITY_PROPS="$SECURITY_PROPS_1
$SECURITY_PROPS_2
$SECURITY_PROPS_3"
echo " Retrieved security properties:"
echo "$SECURITY_PROPS" | sed 's/^/ /'
echo ""
# Check all 16 security directives
# Fetch all security-related properties individually so we can detect unsupported ones
SECURITY_PROPERTIES=(
DynamicUser
PrivateTmp
ProtectSystem
ProtectHome
ReadOnlyPaths
NoNewPrivileges
ProtectKernelTunables
ProtectKernelModules
ProtectControlGroups
RestrictAddressFamilies
RestrictNamespaces
LockPersonality
RestrictRealtime
RestrictSUIDSGID
RemoveIPC
PrivateMounts
)
SECURITY_PROPS=""
SUPPORTED_SECURITY_PROPS=()
UNSUPPORTED_SECURITY_PROPS=()
for prop in "${SECURITY_PROPERTIES[@]}"; do
# systemctl show prints "<Prop>=<value>" when the property is known; for
# unknown/unsupported properties it typically prints nothing.
value="$(systemctl show localdns-exporter@.service --property="$prop" 2>/dev/null || true)"
# Normalize whitespace and ignore empty output (treated as unsupported).
if [[ -n "${value//[[:space:]]/}" ]]; then
SECURITY_PROPS+="${value}"$'\n'
SUPPORTED_SECURITY_PROPS+=("$prop")
else
UNSUPPORTED_SECURITY_PROPS+=("$prop")
fi
done
echo " Retrieved security properties (supported on this systemd version):"
echo "$SECURITY_PROPS" | sed 's/^/ /'
if ((${#UNSUPPORTED_SECURITY_PROPS[@]} > 0)); then
echo ""
echo " Note: the following security directives are not supported by this systemd version and will be skipped:"
for prop in "${UNSUPPORTED_SECURITY_PROPS[@]}"; do
echo " - $prop"
done
fi
echo ""
# Check all 16 security directives (only those supported will have entries in SECURITY_PROPS)

Copilot uses AI. Check for mistakes.
kwaksaewon and others added 2 commits March 4, 2026 00:56
The forward_ips.prom generation is for observability metrics and should not
crash the localdns service if it fails. Changed from fatal errors to best-effort
warnings so that filesystem issues (readonly, disk full, permission denied) don't
take down DNS resolution.

Before: Any mktemp/chmod/mv failure returned non-zero, causing
replace_azurednsip_in_corefile to fail and localdns service to exit with
ERR_LOCALDNS_FAIL.

After: Metrics export failures log warnings and continue, localdns service
stays up and serves DNS traffic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add e2e tests for localdns exporter metrics on all distros that have
localdns files copied via packer configs. This ensures complete test
coverage across all supported architectures (x86_64 and ARM64).

New tests added:
- Ubuntu 22.04 ARM64
- Ubuntu 24.04 Gen1
- Ubuntu 24.04 ARM64
- AzureLinux V2 ARM64
- Mariner V2 ARM64
- Flatcar ARM64

All packer configs (vhd-image-builder-*.json) copy localdns-exporter
files, so all these distros should be tested.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 89 changed files in this pull request and generated 3 comments.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +219 to +220
vnetdns_ips=($(awk '/bind 169.254.10.10/,/^}/' "${UPDATED_LOCALDNS_CORE_FILE}" | awk '/forward \. / {for(i=3; i<=NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i}'))
kubedns_ips=($(awk '/bind 169.254.10.11/,/^}/' "${UPDATED_LOCALDNS_CORE_FILE}" | awk '/forward \. / {for(i=3; i<=NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i}'))
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The awk range end regex ^} only matches a closing brace at column 1. If the Corefile closes blocks with indentation (e.g., }), the range won’t terminate and may capture subsequent server blocks, producing incorrect forward IP metrics. Use an end pattern that tolerates whitespace (e.g., ^[[:space:]]*}) for both VnetDNS and KubeDNS ranges.

Suggested change
vnetdns_ips=($(awk '/bind 169.254.10.10/,/^}/' "${UPDATED_LOCALDNS_CORE_FILE}" | awk '/forward \. / {for(i=3; i<=NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i}'))
kubedns_ips=($(awk '/bind 169.254.10.11/,/^}/' "${UPDATED_LOCALDNS_CORE_FILE}" | awk '/forward \. / {for(i=3; i<=NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i}'))
vnetdns_ips=($(awk '/bind 169.254.10.10/,/^[[:space:]]*}/' "${UPDATED_LOCALDNS_CORE_FILE}" | awk '/forward \. / {for(i=3; i<=NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i}'))
kubedns_ips=($(awk '/bind 169.254.10.11/,/^[[:space:]]*}/' "${UPDATED_LOCALDNS_CORE_FILE}" | awk '/forward \. / {for(i=3; i<=NF; i++) if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) print $i}'))

Copilot uses AI. Check for mistakes.
kwaksaewon and others added 3 commits March 4, 2026 19:04
Update localdns_spec.sh to use the production corefile format with brace
syntax in the setup() function. This ensures all tests validate against
realistic input and can catch regressions where the parser incorrectly
captures '{' as an IP address.

Changes:
- Modified setup() to create corefiles with `forward . <ip> { ... }` syntax
  instead of simplified `forward . <ip>` format
- Added explicit assertions to basic tests to verify braces are not
  exported as IP labels in Prometheus metrics
- Ensures defense in depth: even early/simple tests now catch brace-parsing bugs

All 502 shellspec tests pass with these changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace lsof-based socket inspection with /proc filesystem approach for
cross-distro compatibility. lsof is only installed on Mariner/Azure Linux,
not on Ubuntu or Flatcar, which would cause e2e test failures.

Changes:
- Use /proc/$PID/fd to enumerate file descriptors
- Use readlink to identify socket file descriptors
- Use ss -tupn to distinguish network sockets (TCP/UDP) from Unix sockets
- Add graceful fallback with warning if /proc is inaccessible
- Provide detailed error messages showing socket count

This approach works on all supported distros without additional dependencies,
using only standard Linux /proc filesystem and ss (iproute2, universally
available).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace localhost:9353 with 127.0.0.1:9353 to match the socket binding
and avoid IPv6 resolution issues.

Problem:
- The localdns-exporter.socket binds to 127.0.0.1:9353 (IPv4 only)
- On systems where localhost resolves to ::1 (IPv6) first, curl would
  attempt IPv6 connection and fail with HTTP code 000
- This would cause false negatives in e2e validation

Solution:
- Use explicit 127.0.0.1:9353 address in all curl commands
- Ensures curl connects via IPv4 to match ListenStream binding
- Eliminates dependency on /etc/hosts localhost resolution order

Changed 3 curl invocations:
- HTTP status check (line 22)
- Metrics fetch (line 33)
- Socket activation trigger (line 362)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 89 changed files in this pull request and generated 1 comment.


You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +406 to +437
# Step 14: Verify no network sockets (RestrictAddressFamilies runtime enforcement)
echo "14. Verifying RestrictAddressFamilies runtime enforcement (no network sockets)..."

# Use /proc filesystem for portability (works on all distros without lsof)
# Check socket types by examining /proc/$PID/fd and using ss to inspect socket details
NETWORK_SOCKETS=0
if [ -d "/proc/$INSTANCE_PID/fd" ]; then
# Iterate through file descriptors to find sockets
for fd in /proc/"$INSTANCE_PID"/fd/*; do
if [ -L "$fd" ]; then
FD_TARGET=$(readlink "$fd" 2>/dev/null || echo "")
# Check if it's a socket (starts with "socket:[")
if [[ "$FD_TARGET" =~ ^socket:\[([0-9]+)\]$ ]]; then
SOCKET_INODE="${BASH_REMATCH[1]}"
# Use ss to check if this socket is TCP/UDP (network socket)
# ss -xpn shows unix sockets, ss -tupn shows TCP/UDP sockets
if ss -tupn 2>/dev/null | grep -q "inode:$SOCKET_INODE"; then
NETWORK_SOCKETS=$((NETWORK_SOCKETS + 1))
echo " Found network socket: inode=$SOCKET_INODE"
fi
fi
fi
done
else
echo " ⚠️ WARNING: Cannot access /proc/$INSTANCE_PID/fd, skipping socket inspection"
fi

if [ "$NETWORK_SOCKETS" != "0" ]; then
echo " ❌ ERROR: Instance has $NETWORK_SOCKETS network socket(s) (RestrictAddressFamilies not enforced)"
exit 1
fi
echo " ✓ No network sockets (AF_UNIX only, restriction enforced)"
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime check that the exporter worker has no TCP/UDP sockets will fail if the service instance is still running during the scrape: the accepted connection from localdns-exporter.socket is an AF_INET TCP socket that will appear in ss -tupn and is expected to be held open as stdin/stdout. Consider adjusting this assertion to allow the single inherited scrape socket (or dropping this process-level check and relying on the unit's RestrictAddressFamilies= configuration validation).

Copilot uses AI. Check for mistakes.
kwaksaewon and others added 3 commits March 4, 2026 19:16
…mplate

Remove the [Install] section from localdns-exporter@.service since it's
a socket-activated template unit with Accept=yes.

Why this change is needed:
- Socket-activated services with Accept=yes spawn instances on-demand
- Service instances are started by the socket, not by systemd at boot
- The [Install] section with WantedBy=multi-user.target is misleading
  because it suggests the template should be enabled directly
- Only the socket (localdns-exporter.socket) needs to be enabled

Current behavior (unchanged):
- CSE enables localdns-exporter.socket via systemctlEnableAndStartNoBlock
- When connections arrive on 127.0.0.1:9353, systemd spawns instances
  of localdns-exporter@N.service (where N is the connection number)
- Service instances inherit settings from the template unit

The [Install] section in a template unit is never used and only causes
confusion about the activation mechanism.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…driven test

Replace 12 nearly-identical test functions with a single table-driven test
to reduce code duplication and improve maintainability.

Before:
- 12 separate test functions (Test_Ubuntu2204LocalDns_ExporterMetrics, etc.)
- 235 lines of duplicated code
- Hard to maintain: any behavior change requires updating 12 functions
- Easy to introduce inconsistencies across tests

After:
- 1 table-driven test (Test_LocalDns_ExporterMetrics) with 12 subtests
- 90 lines of code (62% reduction)
- Easy to add new VHDs: just add one table entry
- Single source of truth for test behavior
- Consistent validation across all VHDs

Test output remains the same:
- go test -v shows: Test_LocalDns_ExporterMetrics/Ubuntu2204
- Each VHD still runs as a distinct subtest with proper isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…heck

Exclude the inherited stdin/stdout socket when checking for network sockets
in socket-activated service instances.

Problem:
- Socket-activated services with Accept=yes inherit the accepted TCP connection
  as stdin (fd 0) and stdout (fd 1)
- This inherited socket is AF_INET (matches ListenStream=127.0.0.1:9353)
- The runtime check was flagging this expected socket as a violation
- This caused false positives when the service instance was still running
  during the test (curl still reading metrics response)

Why the inherited socket is expected:
- systemd passes the accepted connection to the service via stdin/stdout
- This is how socket activation works with Accept=yes
- The socket is inherited, not created by the service
- RestrictAddressFamilies=AF_UNIX prevents the service from creating NEW
  network sockets, but doesn't block inherited sockets

Solution:
- Identify the stdin socket inode before checking
- Skip the inherited socket when counting network sockets
- Only flag sockets the service creates itself as violations

This allows the test to correctly validate that the service cannot create
network connections while still allowing the expected inherited activation
socket.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 89 changed files in this pull request and generated no new comments.


You can also share your feedback on Copilot code review. Take the survey.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants