tools/stress/device-observer: scaffolding + eAPI device sampler #3802
Open
nikw9944 wants to merge 2 commits into
Lay down the device-observer binary skeleton and per-tick EOS sampler that writes five show-command snapshot files per sample interval. The Prometheus scrape, log tailers, and abort decider land as no-op collector stubs so the goroutine wiring is fixed for follow-up PRs (#3794, #3795, #3796). Refs #3793.
- Validate --sample-interval > 0 (security HIGH from review).
- Constrain --abort-file under --working-dir to avoid arbitrary file write surfaces in PR #3796 (security MEDIUM).
- Tighten file modes to 0o640/0o750 (security MEDIUM).
- Wrap each eAPI call in a goroutine + select on ctx.Done() so SIGINT/SIGTERM cancels the observer even if goeapi is blocked in an HTTP call (arch HIGH).
- Document JSON-fidelity and in-flight-call limitations in README for follow-up.

Refs #3793.
01f3330 to 57b7a36
Pull request overview
Introduces a new tools/stress/device-observer Go binary intended to be run by an external orchestrator during GRE Tunnel Capacity Study sweeps. The tool scaffolds a multi-collector goroutine layout (with stub collectors for upcoming PRs) and implements an EOS eAPI sampler that periodically snapshots a fixed set of show commands into a working directory.
Changes:
- Add `device-observer` command with flag parsing, working-dir contract, and errgroup wiring for sampler + stub collectors.
- Implement an Arista eAPI client wrapper and an EOS sampler that writes one file per command per tick (with unit tests).
- Document usage/output contract in a new README and add a CHANGELOG entry.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/stress/device-observer/cmd/device-observer/main.go | New binary entrypoint: flags, config writer, errgroup wiring for sampler + stub collectors |
| tools/stress/device-observer/internal/collector/collector.go | Collector interface + Noop stub implementation |
| tools/stress/device-observer/internal/eapi/client.go | Thin goeapi wrapper exposing RunShowJSON / RunShowText |
| tools/stress/device-observer/internal/eapi/client_test.go | Minimal test coverage for NewClient behavior |
| tools/stress/device-observer/internal/sample/eos.go | Per-tick sampler executing five commands and writing timestamped files |
| tools/stress/device-observer/internal/sample/eos_test.go | Sampler unit tests (file writing, failure tolerance, cancellation, timestamp format) |
| tools/stress/device-observer/internal/promscrape/scrape.go | Stub metrics scraper collector (to be implemented in #3794) |
| tools/stress/device-observer/internal/loggingtail/eos.go | Stub EOS logging collector (to be implemented in #3795) |
| tools/stress/device-observer/internal/loggingtail/agent.go | Stub agent log tail collector (to be implemented in #3795) |
| tools/stress/device-observer/internal/runlog/reader.go | Stub runlog collector (to be implemented in #3795) |
| tools/stress/device-observer/internal/abort/decider.go | Stub abort decider collector (to be implemented in #3796) |
| tools/stress/device-observer/README.md | Usage, flags, working-dir/file contract, known limitations |
| CHANGELOG.md | Unreleased entry announcing the new tool |
Comment on lines +114 to +115

    func fileTimestamp(t time.Time) string {
    	return strings.ReplaceAll(t.Format("2006-01-02T15:04:05.000000000Z"), ":", "-")
Comment on lines +16 to +17

    // NewClient dials the device's eAPI endpoint over HTTP. HTTPS support is
    // deferred; see docs/work-plan-3793.md.
Comment on lines +5 to +6

    // TestNewClientNoServer verifies NewClient surfaces a connection error
    // when no eAPI server is reachable. (goeapi's Connect dials lazily for
    | File                                     | Owner     | Description                                     |
    | ---------------------------------------- | --------- | ----------------------------------------------- |
    | `observer-config.json`                   | observer  | resolved flag values + PID + start timestamp    |
    ### Changes

    - tools/stress/device-observer: initial scaffolding plus eAPI device sampler that writes per-tick snapshots of five `show` commands
elitegreg approved these changes May 29, 2026
Summary of Changes
- Adds the `tools/stress/device-observer` binary scaffold with the per-tick eAPI sampler. On every `--sample-interval` (default 10s) the sampler issues five `show` commands against an Arista cEOS device and writes one timestamped file per command into `--working-dir` (`show-hardware-capacity-*.json`, `show-gre-tunnel-static-*.json`, `show-processes-top-once-*.json`, `show-logging-errors-*.log`, `show-logging-critical-*.log`), plus an `observer-config.json` containing the resolved flags, observer PID, and start time.
- The remaining collectors (`internal/promscrape/`, `internal/loggingtail/{eos,agent}`, `internal/runlog/`, `internal/abort/`) land as no-op `Collector` stubs wired into `main.go` under a single `errgroup`, so #3794 (3747-2: agent Prometheus scrape, ~150 LOC code), #3795 (3747-3: EOS syslog + tail-based readers, ~250 LOC code), and #3796 (3747-4: abort decider + sentinel, ~200 LOC code) can replace each stub without re-plumbing the goroutine layout.

Implementation notes
- Layout follows `tools/twamp/` (`cmd/<binary>/main.go` + `internal/<pkg>/`). No new `Makefile` target; the workspace `make go-build` picks it up via `./...`.
- `eapi.Client` is a thin wrapper around `goeapi.Connect` + `RunCommands`, exposing `RunShowJSON` / `RunShowText`. HTTPS support is deferred per the operator-approved plan.
- `eapi_pass` is intentionally not persisted into `observer-config.json`: the working dir may be archived (e.g. to S3) and credentials must not land there. The orchestrator already knows the password it supplied.

Known limitations documented for follow-up
- `--eapi-pass` passed on the command line is visible in `ps`. A follow-up may add `--eapi-pass-file` / `DZ_EAPI_PASS` (security MEDIUM from review).

Testing Verification
- `make go-build` succeeds.
- `go test ./tools/stress/device-observer/...` passes (5 test cases covering happy path, single-command failure tolerance, two-tick non-collision, prompt cancellation under context cancel, and filesystem-safe timestamping).
- `golangci-lint run -c ./.golangci.yaml ./tools/stress/device-observer/...` reports 0 issues.
- The smoke test against `dz-local-device-dz1` has not been executed in this environment (no devnet available); the README's "Local devnet smoke test" section is the runbook for that verification.