tools/stress: orchestrator skeleton (CLI, sweep, runlog, abort)#3776
Open
elitegreg wants to merge 1 commit into
Adds tools/stress/device-orchestrator/, the device-stress orchestrator binary for the GRE Tunnel Capacity Study. The binary parses every flag from #3746's CLI list, dumps orchestrator-config.json on start, runs a provision-then-reverse-deprovision sweep against a live serviceability program, and emits the runlog row schema {run_id, user_index, user_pubkey, tunnel_id, event, t_ns, n_after_event} for each submit | confirm | activate | deprovision_* event.

Packages:
- pkg/reconcile: PlanFor() pure function (lifted from the part-1 SDK PR; now lives with the orchestrator as policy, not as an SDK primitive)
- pkg/runlog: append-only JSONL writer for orchestrator-runlog.json
- pkg/sweep: provision-then-deprovision loop driven by PlanFor; uses a Clock + Executor interface for testability; reverse-creation-order delete
- pkg/abort: sentinel-file poller that cancels a derived ctx between user iterations so an in-flight Create/Delete completes before exit
- pkg/agent: AgentRunner interface + noop impl; SSH runner lands in part 3 along with pre_commit_log / applied event emission
- pkg/exec: Live impl of sweep.Executor over serviceability.{Client, Executor}; picks deterministic per-user IPs from --client-ip-base
- cmd/device-orchestrator: flag parsing, config dump, signal + abort handling, sweep wiring

The agent runner is stubbed behind an interface so this PR can land end-to-end functionality (provision/deprovision + runlog + abort) without the SSH plumbing. The SSH runner and the corresponding pre_commit_log / applied row generation land in part 3 of #3746.

Part 2 of #3746. Closes #3771.
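For readers who want a concrete picture of the runlog rows, here is a minimal sketch of an append-only JSONL writer using the row schema quoted above. Only the JSON keys come from the PR text; the Go field types, the names `Row`, `Writer`, `ParseRows`, and `newBufferWriter`, and the in-memory buffer are assumptions for illustration and are not taken from `pkg/runlog`.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
	"time"
)

// Row mirrors the runlog row schema from the PR description.
// The Go field types are assumptions; only the JSON keys come from the PR.
type Row struct {
	RunID       string `json:"run_id"`
	UserIndex   int    `json:"user_index"`
	UserPubkey  string `json:"user_pubkey"`
	TunnelID    int    `json:"tunnel_id"`
	Event       string `json:"event"`
	TNs         int64  `json:"t_ns"`
	NAfterEvent int    `json:"n_after_event"`
}

var errClosed = errors.New("runlog: write after close")

// Writer appends one JSON object per line (JSONL) to an underlying io.Writer.
type Writer struct {
	w      io.Writer
	closed bool
}

func NewWriter(w io.Writer) *Writer { return &Writer{w: w} }

// Write fills t_ns with the current time when it is unset, then appends the row.
func (lw *Writer) Write(r Row) error {
	if lw.closed {
		return errClosed
	}
	if r.TNs == 0 {
		r.TNs = time.Now().UnixNano()
	}
	b, err := json.Marshal(r)
	if err != nil {
		return err
	}
	_, err = lw.w.Write(append(b, '\n'))
	return err
}

// Close marks the writer closed; later writes are rejected.
func (lw *Writer) Close() { lw.closed = true }

// ParseRows decodes JSONL text back into rows (used for a round trip).
func ParseRows(s string) ([]Row, error) {
	var rows []Row
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		var r Row
		if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
			return nil, err
		}
		rows = append(rows, r)
	}
	return rows, sc.Err()
}

// newBufferWriter returns a Writer backed by memory plus a getter for its contents.
func newBufferWriter() (*Writer, func() string) {
	buf := &bytes.Buffer{}
	return NewWriter(buf), buf.String
}

func main() {
	w, out := newBufferWriter()
	for _, ev := range []string{"submit", "confirm", "activate"} {
		n := 0
		if ev == "activate" {
			n = 1 // in this sketch the count moves only on activate
		}
		_ = w.Write(Row{RunID: "demo", UserIndex: 0, Event: ev, NAfterEvent: n})
	}
	w.Close()
	fmt.Print(out())
	fmt.Println("write after close:", w.Write(Row{}))
}
```

One JSON object per line keeps the file appendable and lets a partially written run still be parsed row by row.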
Summary
Adds the device-stress orchestrator skeleton at `tools/stress/device-orchestrator/`, for the GRE Tunnel Capacity Study. Stacked on top of #3774 (part 1, SDK user CRUD). Part 2 of #3746. Closes #3771.

- `cmd/device-orchestrator`: every flag from #3746's CLI list (`--target-user-count`, `--users-per-batch`, `--hold-seconds`, `--dut-pubkey`, `--dut-ssh-host`, `--dut-ssh-key`, `--rpc-url`, `--program-id`, `--keypair`, `--controller`, `--abort-file`, `--working-dir` plus `--client-ip-base`, `--tunnel-endpoint`, `--tenant-pubkey`, `--run-id`, `--log-level`, `--dry-run`). Dumps `orchestrator-config.json` on start.
- `pkg/reconcile`: `PlanFor(current, target, ownerFilter)` returns a deterministic `Plan{ToCreate, ToDelete}` delta. Lifted from the part-1 SDK PR per the discussion; it's orchestrator policy, not an SDK primitive.
- `pkg/sweep`: provision-then-reverse-deprovision loop driven by `PlanFor`; batches of `--users-per-batch` with `--hold-seconds` between batches; reverse-creation-order deprovision tracked by the sweep itself; emits `submit | confirm | activate | deprovision_*` runlog rows.
- `pkg/runlog`: append-only JSONL writer for `orchestrator-runlog.json` with the row schema `{run_id, user_index, user_pubkey, tunnel_id, event, t_ns, n_after_event}`.
- `pkg/abort`: ticker-based watcher of `--abort-file`; cancels a derived ctx so the sweep finishes the in-flight user before exiting non-zero, then still tears down what was created.
- `pkg/agent`: `Runner` interface (`Start(ctx) error; Events() <-chan Event`) with a no-op implementation. The SSH-backed runner and the `pre_commit_log` / `applied` row generation land in part 3.
- `pkg/exec`: `Live` impl of `sweep.Executor` wrapping `serviceability.{Client, Executor}`; picks deterministic per-user IPs (base + idx) and forwards `DevicePubkey` / `TenantPubkey` to `UserCreateArgs`.
- `Makefile` mirrors `tools/twamp/Makefile` (build, test, lint).

Testing Verification
- `pkg/sweep`: fake `Executor` + fake `Clock` + no-op `Agent` drive a 0→4 sweep in batches of 2. Asserts ordered `submit` / `confirm` / `activate` x4, reverse-order `deprovision_submit` / `deprovision_confirm` / `deprovision_activate` x4, `Hold` fires exactly once (between batches, not after reaching target), and `n_after_event` increments at `activate` / decrements at `deprovision_activate`.
- `pkg/sweep` abort case: failing the 3rd create still drives deprovision over the first two users so the orchestrator never leaks state on abort.
- `pkg/abort`: tempdir + touch the sentinel + assert the derived ctx cancels within 1s; empty-path watch is a no-op that still propagates parent cancellation.
- `pkg/runlog`: round-trip rows, auto-fill `t_ns`, reject writes after `Close`, `Open(path)` truncates existing content.
- `pkg/reconcile`: table-driven 0→N / N→0 / partial / foreign-only / mixed / negative / tie-break-by-pubkey.
- `make build` produces `bin/device-orchestrator`; `./bin/device-orchestrator --dry-run --target-user-count 4 --users-per-batch 2 --working-dir /tmp/orch` writes a valid `orchestrator-config.json` without contacting RPC.
- `make go-build go-lint go-test` all green.

Out of scope
- […] `Committing config session due to diffs detected: <diff>` and the commit-success line into `pre_commit_log` / `applied` events. Lands in part 3 of #3746.
- […] `dz-local` devnet).
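As a reviewer aid for the in-scope `pkg/abort` behavior described in the summary (a ticker-based watcher of `--abort-file` that cancels a derived ctx), the following is a minimal sketch of one way such a watcher could be written. The names `Watch`, `cancelledWithin`, `demoSentinel`, and `demoEmptyPath`, the polling intervals, and the sentinel file name are all hypothetical; the actual package may be structured differently.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// Watch returns a context that is cancelled once path exists or parent is done.
// An empty path disables polling; the derived context then only follows parent.
func Watch(parent context.Context, path string, interval time.Duration) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	if path == "" {
		return ctx, cancel
	}
	go func() {
		t := time.NewTicker(interval)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				if _, err := os.Stat(path); err == nil {
					cancel() // sentinel appeared: request a graceful stop
					return
				}
			}
		}
	}()
	return ctx, cancel
}

// cancelledWithin reports whether ctx is done before d elapses.
func cancelledWithin(ctx context.Context, d time.Duration) bool {
	select {
	case <-ctx.Done():
		return true
	case <-time.After(d):
		return false
	}
}

// demoSentinel touches a sentinel in a temp dir and reports whether the
// derived context was cancelled before and after the touch.
func demoSentinel() (before, after bool, err error) {
	dir, err := os.MkdirTemp("", "abort-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	sentinel := filepath.Join(dir, "ABORT") // hypothetical file name
	ctx, cancel := Watch(context.Background(), sentinel, 10*time.Millisecond)
	defer cancel()
	before = cancelledWithin(ctx, 50*time.Millisecond)
	if err := os.WriteFile(sentinel, nil, 0o644); err != nil {
		return before, false, err
	}
	after = cancelledWithin(ctx, time.Second)
	return before, after, nil
}

// demoEmptyPath checks that an empty path still propagates parent cancellation.
func demoEmptyPath() bool {
	parent, stop := context.WithCancel(context.Background())
	ctx, cancel := Watch(parent, "", 10*time.Millisecond)
	defer cancel()
	stop()
	return cancelledWithin(ctx, time.Second)
}

func main() {
	before, after, err := demoSentinel()
	fmt.Println("cancelled before touch:", before, "after touch:", after, "err:", err)
	fmt.Println("empty path follows parent:", demoEmptyPath())
}
```

A sweep loop that checks `ctx.Err()` only between user iterations, rather than passing this context into the in-flight Create/Delete call, would let the current operation complete before the loop exits, which matches the behavior the PR describes.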