feat: unified CLI, compare mode, metrics collection by Frando · Pull Request #9 · n0-computer/patchbay

Frando · 2026-03-26T13:13:39Z

Unified CLI

We consolidated patchbay-runner and patchbay-vm into libraries behind a single patchbay-cli crate. The patchbay binary auto-detects native vs VM backends, with VM commands nested under patchbay vm. All paths consolidate under .patchbay/ — work output, VM state, and worktrees each get their own subdirectory.

Compare mode

patchbay compare test <ref> [ref2] builds and runs tests in git worktrees, then diffs per-test pass/fail results with a regression score. Cached runs are matched by commit SHA so repeated comparisons skip the build. The CLI parses both cargo test and cargo nextest output formats.

Per-device metrics

Device::record() and the builder pattern Device::metrics().record(k, v).emit() write timestamped JSONL via tracing, routed to per-device metrics.jsonl files. An optional iroh-metrics feature enables direct MetricsGroup emission.

Run manifests

A single RunManifest in patchbay-utils replaces three prior definitions. Every test and sim run writes run.json with git context, outcome, and per-test results. patchbay test --persist copies testdir output into .patchbay/work/ for later comparison or upload.

Server

The server discovers leaf runs by events.jsonl and groups by run.json, recursing into group directories to find children. GET /api/runs supports project, kind, limit, and offset filtering. The "batch" concept is replaced by "group" throughout, with serde aliases for backward compatibility.
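The discovery rule can be sketched as a recursive directory walk. This is only an illustration using the standard library: the function name `scan_runs` is hypothetical, and recursing into directories that have neither file is an assumption, since the description only covers directories that contain run.json or events.jsonl.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects leaf run directories below `dir` (hypothetical helper).
/// A directory containing events.jsonl is treated as a leaf run.
/// Any other directory, including a group that only has run.json,
/// is recursed into to find its children.
fn scan_runs(dir: &Path, leaves: &mut Vec<PathBuf>) -> io::Result<()> {
    if dir.join("events.jsonl").is_file() {
        leaves.push(dir.to_path_buf());
        return Ok(());
    }
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            scan_runs(&path, leaves)?;
        }
    }
    Ok(())
}
```

With this rule, a group directory that holds run.json and two child runs yields the two children as leaves, and the group itself is never reported as a run.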

UI

The runs index shows collapsible groups with manifest metadata, and child runs display their test name rather than inherited info. Checkbox selection enables comparing any two runs. The compare view at /compare/:left/:right computes diffs client-side with clickable test names that drill into individual run comparison. Split-screen co-navigation shares tab state and filter controls between both panels. Metrics render as a pivoted table with per-device columns, sparklines, and a filter input. The logs sidebar is collapsible.

CI and tooling

patchbay upload replaces raw tar | curl, prints the direct run URL for PR comments, and creates run.json from CI environment variables when missing. The CI template installs via cargo binstall with --git-url and falls back to building from source.

Rename the grouping concept from "invocation" to "batch" in all Rust, TypeScript, and test code. Switch UI from HashRouter to BrowserRouter with server-side SPA fallback routes.

Backward compatibility:
- /api/invocations/{name}/combined-results kept as alias
- /inv/* routes redirect to /batch/*

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ate patchbay-cli

Move all CLI parsing and dispatch from patchbay-runner and patchbay-vm into a new patchbay-cli crate. The runner and vm crates become pure libraries. VM commands are nested under `patchbay vm <subcommand>`.

- patchbay-runner: remove binary, expose sim module as pub
- patchbay-vm: add lib.rs re-exporting Backend, RunVmArgs, TestVmArgs
- patchbay-cli: unified CLI with feature flags (native, vm, serve)

Zero behavior change: same subcommands, same defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add device.record(key, value), device.metrics() builder, and device.enter_tracing() for lightweight per-device metric collection. Metrics are written to device.<name>.metrics.jsonl as JSONL lines with format {"t":<epoch>,"m":{"key":value}}. Uses tracing infrastructure with a patchbay::_metrics target, routed to per-namespace metrics files.

- Device/Router hold a clone of their namespace's tracing::Dispatch
- MetricsBuilder allows batch emission of multiple metrics
- patchbay-server recognizes .metrics.jsonl as LogKind::Metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
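To make the line format concrete, here is a minimal sketch that renders one record in the {"t":<epoch>,"m":{"key":value}} shape without any JSON library. The function names `metrics_line` and `escape` are hypothetical and not part of patchbay. The unit of `t` is assumed to be whole seconds, and values are assumed to be finite numbers, since NaN or infinity would not produce valid JSON.

```rust
/// Escapes the two characters that would break a JSON string literal.
/// Control characters are not handled; keys are assumed to be plain identifiers.
fn escape(key: &str) -> String {
    key.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Renders one JSONL record: {"t":<epoch>,"m":{"key":value,...}}.
/// `epoch` is assumed to be seconds since the Unix epoch.
fn metrics_line(epoch: u64, metrics: &[(&str, f64)]) -> String {
    let body: Vec<String> = metrics
        .iter()
        .map(|(k, v)| format!("\"{}\":{}", escape(k), v))
        .collect();
    format!("{{\"t\":{},\"m\":{{{}}}}}", epoch, body.join(","))
}
```

Putting several keys into one record mirrors what the MetricsBuilder batch emission is described as doing: one timestamp, many metrics.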

Delegates to cargo nextest (preferred) or cargo test on native Linux, and to patchbay-vm's test flow when --vm is specified. Sets RUSTFLAGS="--cfg patchbay_test" on all cargo invocations. Supports --ignored, --ignored-only, package/test selectors, and extra args. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Compare test results between git refs using worktrees. Supports one ref (compare against worktree) or two refs.

- Creates git worktrees in .patchbay/tree/
- Runs tests sequentially in each
- Parses pass/fail/ignored results from cargo test output
- Prints summary table with fixes, regressions, and score
- Writes compare manifest to .patchbay/work/compare-{timestamp}/
- Cleans up worktrees if unchanged

Scoring: +3 per fix, -5 per regression, +/-1 for time improvement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
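The scoring rule can be sketched as follows. The types `Status` and `Diff` and the function `compare` are hypothetical names for illustration, not the crate's actual API. The sketch treats the left side as the baseline, counts only tests present on both sides, and leaves out the +/-1 time term because the commit message does not say how a time improvement is measured.

```rust
use std::collections::BTreeMap;

/// Outcome of a single test on one side of the comparison.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Pass,
    Fail,
}

/// Aggregate comparison result (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
struct Diff {
    fixes: i32,
    regressions: i32,
    score: i32,
}

/// Diffs per-test results, with `left` as the baseline and `right` as the candidate.
/// Score is +3 per fix and -5 per regression; the time term is omitted here.
fn compare(left: &BTreeMap<&str, Status>, right: &BTreeMap<&str, Status>) -> Diff {
    let mut fixes = 0;
    let mut regressions = 0;
    for (name, l) in left {
        match (l, right.get(name)) {
            (Status::Fail, Some(Status::Pass)) => fixes += 1,
            (Status::Pass, Some(Status::Fail)) => regressions += 1,
            _ => {} // unchanged, or missing on the right side
        }
    }
    Diff { fixes, regressions, score: 3 * fixes - 5 * regressions }
}
```

Because a regression costs more than a fix earns, one fix plus one regression nets a negative score of -2.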

- Extract RunView component from App.tsx for reuse in compare mode
- MetricsTab: fetch/parse device.*.metrics.jsonl, show key/value table with inline SVG sparklines for multi-point metrics
- CompareView: load summary.json from compare batches, show summary bar (fixes, regressions, score) and per-test comparison table
- Add /compare/* route, auto-detect compare batches via summary.json
- Add metrics tab to run detail views

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…fest

Add Upload subcommand that tars a directory and pushes to a patchbay-server instance via the existing POST /api/push/{project} endpoint. Uses PATCHBAY_URL, PATCHBAY_API_KEY, and PATCHBAY_PROJECT env vars (also accepted as --url, --api-key, --project flags).

Also adds optional `project` field to CompareManifest, populated from PATCHBAY_PROJECT env var when set.

CI usage:
  patchbay compare test --ref main
  patchbay upload .patchbay/work/compare-*/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Code review fixes across all commits:

handles.rs:
- Extract record_metric() shared fn, eliminate Device/Router duplication

tracing.rs:
- Record visitor fields once at top of write_event_to_files

compare.rs + test.rs:
- Extract patchbay_rustflags() to util.rs, share between modules
- Extract test_index() and merged_names() helpers
- Remove redundant starts_with guard in parse_test_output

main.rs:
- Extract resolve_project_root() helper

Upload rewrite:
- Use reqwest + tar + flate2 instead of shelling out
- Auto-create run.json manifest from CI env vars (GITHUB_*)
- Move to dedicated upload.rs module

Other:
- Add cargo-binstall metadata to patchbay-cli and patchbay-server
- Fix CompareView sort to use localeCompare
- Fix MetricsTab React key to use device:key
- Fix App.tsx double runs.find()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract TestArgs as a clap::Args struct, flatten into both Command::Test and CompareCommand::Test. Move VM dispatch logic to test::run_vm(). Update compare::run_tests_in_dir to accept &TestArgs directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…args

Path consolidation:
- .patchbay-work → .patchbay/work (all 7 CLI defaults + vm fallbacks)
- .qemu-vm → .patchbay/vm, .container-vm → .patchbay/vm

VmOps trait:
- Add VmOps trait in patchbay-vm with QemuBackend/ContainerBackend ZSTs
- Rewrite dispatch_vm to use trait methods instead of per-command match
- resolve_ops() factory returns Box<dyn VmOps>

DRY cargo args:
- Add TestArgs::into_vm_args() method for TestArgs → TestVmArgs conversion
- Use in both test::run_vm() and VmCommand::Test dispatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixture crate (patchbay-cli/tests/fixtures/counter/):
- Two tests: udp_counter (sends/receives packets, records metric) and udp_threshold (asserts PACKET_COUNT >= THRESHOLD)

Integration test (compare_integration.rs, #[ignore]):
- Creates temp git repo with passing and regressing commits
- Runs patchbay compare test, asserts regressions detected

E2E test (ui/e2e/compare.spec.ts):
- Mock compare data, spawns patchbay serve, verifies CompareView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…r copy

CompareSummary:
- Split flat fields into left/right RunStats structs
- Use std::time::Duration with custom serde (serialize as ms)
- Update compare_results, print_summary, UI, and test mocks

VmOps → Backend methods:
- Remove VmOps trait, QemuBackend/ContainerBackend ZSTs, Box<dyn>
- Implement dispatch methods directly on Backend enum
- backend.resolve().up(recreate) instead of resolve_ops(b).up(recreate)

testdir:
- Copy target/testdir-current to .patchbay/work/testdir after native tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copy fixture source + write Cargo.toml with absolute patchbay path instead of trying to reconstruct the crate from scratch. Remove -p flag since compare runs all tests in the workspace. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

VmOps trait:
- Trait with ZST impls (Qemu, Container) that delegate to module fns
- Backend enum implements VmOps by dispatching to resolved ZST
- Backend::resolve() replaces free resolve_backend function

iroh-metrics:
- Optional feature iroh-metrics in patchbay crate
- device.record_iroh_metrics(&dyn MetricsGroup) iterates counters/gauges and emits via MetricsBuilder

inspect/run-in:
- Feature-gated on cfg(target_os = "linux") since they require namespaces

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


…tions

Use workspace.dependencies in root Cargo.toml for all internal crates. Member crates now use { workspace = true } instead of relative paths.

Enhanced compare integration test with assertions for:
- Summary output in stdout
- Left/right RunStats (pass/fail/total counts)
- Per-test results (udp_threshold fails on right, passes on left)
- Worktree cleanup verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Set PATCHBAY_OUTDIR so the fixture's Lab writes device metrics. After compare, find *.metrics.jsonl files and assert they contain packet_count values matching the fixture's PACKET_COUNT constants. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…metadata Fixture now uses testdir!() + Lab::with_opts(outdir) matching iroh test patterns. Integration test uses cargo metadata to find target_dir/testdir-current and validates device.sender.metrics.jsonl contains packet_count values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Stream stdout/stderr from cargo test in real time when -v is passed. Uses piped stdio with reader threads so output is both printed live and captured for result parsing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…st flags

- TestArgs.cargo_test_cmd_in() builds the full cargo test Command
- compare::run_tests_in_dir uses it instead of duplicating arg expansion
- Rename --ignored-only → --ignored, --ignored → --include-ignored to match cargo test flag names exactly
- Remove filter as positional; goes through -- like cargo test
- -v/--verbose is now global on Cli, works for all subcommands
- CompareCommand refs are positional: `patchbay compare test main [ref2]`
- Fixture uses ctor + init_userns() (safe version) for namespace bootstrap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Set CARGO_TARGET_DIR per worktree in compare to prevent shared binary cache from masking source changes between refs
- Use --force in worktree cleanup to handle untracked target/ dirs
- Fixture uses ctor + init_userns() for namespace bootstrap
- Align --ignored/--include-ignored with cargo test flag names
- Refs are positional: `patchbay compare test main [ref2] [-- filter]`
- -v/--verbose is global on Cli struct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
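The per-worktree target directory can be sketched with std::process::Command. The function name `cargo_test_cmd` and the choice of `<worktree>/target` as the target directory are assumptions for illustration. The RUSTFLAGS value is the one an earlier commit message says is set on all cargo invocations.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not spawn) a `cargo test` command scoped to one worktree.
/// Hypothetical helper: the real CLI builds its command from TestArgs.
fn cargo_test_cmd(worktree: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("test")
        .current_dir(worktree)
        // A separate target dir per worktree keeps binaries built from one ref
        // from being reused when testing another ref.
        .env("CARGO_TARGET_DIR", worktree.join("target"))
        .env("RUSTFLAGS", "--cfg patchbay_test");
    cmd
}
```

Keeping target/ inside the worktree is also consistent with the cleanup step needing --force, since an untracked target/ directory would otherwise block worktree removal.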

- Move RunManifest to patchbay-utils with RunKind enum, chrono dates, Duration-as-ms serde, TestResult/TestStatus types, git_context(), resolve_ref(), find_run_for_commit(), parse_test_output()
- Rename runner's RunManifest → SimRunReport, dir prefix sim- → run-
- patchbay test: pipe output, write run.json to testdir, --persist flag
- Compare: check cached runs by commit SHA, --force-build/--no-ref-build
- Server: discover runs by run.json OR events.jsonl, batch→group rename, query params (project, kind, limit, offset), /compare/* SPA route
- UI: RunsIndex with filters/pagination/checkbox compare selection, CompareView at /compare/:left/:right with client-side diff, split-screen co-navigation, ComparePage route wrapper
- Backend::Auto panics if not resolved (was silent Qemu fallback)
- E2e tests updated for new UI layout and data model
- All 6 e2e tests pass, zero clippy warnings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…gthen e2e

Naming consistency:
- Rename batch→group in all internal code (variables, types, functions, comments)
- Add /group/* and /api/groups/ as primary routes, keep /batch/* as alias
- Update docs and CI templates (.invocation → .group)

UI refactor:
- Split App.tsx (mode prop) into RunPage.tsx and BatchPage.tsx
- Extract RunSelector component with Selection type helpers
- Extract groupByGroup/simLabel to shared utils.ts
- Fix any types in CompareView (proper LabState, LabEvent[], SimResults)
- Add useMemo for availableTabs in RunView, useCallback in LogsTab
- Add dead-flag cleanup in MetricsTab fetch effect

E2e test improvements:
- 3 compare tests (regression, fix scenario, checkbox selection flow)
- Stronger push test assertions (verify manifest data in UI)
- Multi-sim test clicks through to verify topology
- Perf tab asserts data row presence
- All 8 tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


testdir-current is a symlink to testdir-N. cp -r copies the symlink itself; cp -rL dereferences it and copies the actual directory contents. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- parse_test_output now handles nextest format (PASS/FAIL/TIMEOUT/IGNORE with duration extraction) in addition to cargo test format
- Deduplicate test results by name (nextest reprints failures in summary)
- Server scan_runs_recursive: directories with run.json but no events.jsonl are groups (recurse into them), not leaf runs
- Fix push e2e test to match by group instead of name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
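A rough sketch of dual-format parsing with deduplication is shown below. It is not the actual parse_test_output. The assumed line shapes are `test <name> ... ok|FAILED|ignored` for cargo test and `<STATUS> [ <duration>] <binary> <name>` for nextest. Mapping TIMEOUT to a failure, skipping duration extraction, and the names `parse_line` and `parse_output` are all assumptions made for brevity.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TestStatus {
    Pass,
    Fail,
    Ignored,
}

/// Parses one output line in either format; returns None for unrelated lines.
fn parse_line(line: &str) -> Option<(String, TestStatus)> {
    let line = line.trim();
    // cargo test shape: "test <name> ... ok"
    if let Some(rest) = line.strip_prefix("test ") {
        let (name, outcome) = rest.split_once(" ... ")?;
        let status = if outcome.starts_with("ok") {
            TestStatus::Pass
        } else if outcome.starts_with("FAILED") {
            TestStatus::Fail
        } else if outcome.starts_with("ignored") {
            TestStatus::Ignored
        } else {
            return None;
        };
        return Some((name.to_string(), status));
    }
    // nextest shape (assumed): "PASS [   0.004s] <binary> <name>"
    let (word, rest) = line.split_once(' ')?;
    let status = match word {
        "PASS" => TestStatus::Pass,
        "FAIL" | "TIMEOUT" => TestStatus::Fail,
        "IGNORE" => TestStatus::Ignored,
        _ => return None,
    };
    // Skip the bracketed duration when present, then take the last token as the name.
    let after = rest.split_once(']').map(|(_, a)| a).unwrap_or(rest);
    let name = after.split_whitespace().last()?;
    Some((name.to_string(), status))
}

/// Parses full output, keeping only the first result seen for each test name.
fn parse_output(text: &str) -> Vec<(String, TestStatus)> {
    let mut seen = HashSet::new();
    text.lines()
        .filter_map(parse_line)
        .filter(|(name, _)| seen.insert(name.clone()))
        .collect()
}
```

The `seen` set is what absorbs the failures that nextest reprints in its summary: the second occurrence of a name is dropped.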

- Groups in RunsIndex are collapsed by default with expand/collapse toggle
- Child runs show their test/sim name (from label or path), not inherited manifest info (branch@commit which was repeated for every child)
- Group header shows manifest summary inline (project, branch, commit, kind, outcome, pass/fail counts)
- Group header has checkbox for group-level compare selection
- E2e tests updated to expand groups before clicking child runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Compare view:
- Concise single-line header for individual runs (name + pass/fail icons)
- Concise header for groups (refs + pass/fail counts + regression/fix count)
- Clickable test names in per-test table navigate to individual compare

Metrics tab:
- Pivoted table: rows=metric keys, columns=devices
- Filter input to search by metric key name
- Accepts shared filter prop for compare mode

Shared controls architecture:
- Compare split-screen lifts log filter/levels and metrics filter state
- SharedControlsBar renders once above both panels
- LogsTab and MetricsTab use external state when provided, hide internal controls

Logs tab:
- Collapsible sidebar with toggle button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- CI template uses cargo binstall + patchbay test --persist + patchbay upload
- Fix binstall bin-dir to match release asset naming ({bin}-{target})
- Fix release workflow to build patchbay-cli (was patchbay-runner)
- Add patchbay-serve build to release workflow
- Update docs/guide/testing.md with simplified upload example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Individual lab runs don't have run.json so outcome showed as "unknown". Just show the test name — the side-by-side view below has all details. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Upload now prints the full view URL (e.g. https://pb.example.com/run/name) to stdout. The CI template captures it via GITHUB_OUTPUT and links directly to the run in the PR comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The pkg-url used { version } which resolves to 0.1.0, but releases use a force-moved rolling tag. Hardcode rolling in the download path. CI template installs cargo-binstall first, then uses --git-url since the crate is not published on crates.io. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- git_context() now checks both unstaged and staged changes for dirty flag
- dir_size() handles file_type() errors gracefully instead of panicking
- test.rs warns on run.json write failures instead of silently ignoring
- Remove duplicate TestResult interface in api.ts
- Remove unused statusIcon function in CompareView.tsx
- Fix stale comment in InvRedirect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- cargo fmt across all changed Rust files
- Remove /batch/ SPA and API routes, /group/ is now canonical
- Rename BatchPage → GroupPage
- Harden push tar extraction: manual entry iteration with path checks, reject absolute paths and .. components, disable permissions/xattrs
- Individual run compare shows both sides + back link to group compare

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
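The path check used to harden tar extraction can be sketched with std::path::Component. The function name `is_safe_entry_path` is hypothetical. The sketch covers only the path validation, not the entry iteration or the handling of permissions and xattrs.

```rust
use std::path::{Component, Path};

/// Returns true when an archive entry path is safe to join onto the
/// extraction root: non-empty, relative, and free of `..` components.
fn is_safe_entry_path(path: &Path) -> bool {
    !path.as_os_str().is_empty()
        && path
            .components()
            // RootDir and Prefix mean an absolute path; ParentDir is `..`.
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}
```

Checking components rather than searching the string for ".." avoids rejecting legitimate names such as `notes..txt` while still catching `a/../../b`.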

The topbar now shows context like "iroh · main@fc654dd vs main@abc1234" so you know which project and refs you're comparing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Frando and others added 30 commits March 25, 2026 23:16

fix: copy fixture Cargo.toml and replace path dep instead of rewriting

30dfa81

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use SideStats.fail in compare summary output

d85913d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: persist copies testdir contents, not symlink

c6feeb9


fix: clean compare header for individual runs

4814be9


Frando and others added 7 commits March 26, 2026 14:06

fix: show project and refs in compare page topbar

abe927f


fix: use is_some_and instead of map_or(false, ..) per clippy

9ab99b1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: fmt

a0d0d44

Frando wants to merge 37 commits into main from feat/compare
