
feat: unified CLI, compare mode, metrics collection #9

Open
Frando wants to merge 37 commits into main from feat/compare

Frando (Member) commented Mar 26, 2026

Unified CLI

We consolidated patchbay-runner and patchbay-vm into libraries behind a single patchbay-cli crate. The patchbay binary auto-detects native vs. VM backends, with VM commands nested under patchbay vm. All paths now live under .patchbay/, with a separate subdirectory each for work output (.patchbay/work), VM state (.patchbay/vm), and worktrees (.patchbay/tree).

Compare mode

patchbay compare test <ref> [ref2] builds and runs tests in git worktrees, then diffs the per-test pass/fail results and computes a regression score. Cached runs are matched by commit SHA, so repeated comparisons skip the build. The CLI parses both cargo test and cargo nextest output formats.

Per-device metrics

Device::record() and the builder form Device::metrics().record(k, v).emit() write timestamped JSONL lines through tracing, which routes them to a per-device device.<name>.metrics.jsonl file. An optional iroh-metrics feature enables emitting a MetricsGroup directly.
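The following is a minimal, std-only sketch of the builder shape and the JSONL line it produces. It assumes the `{"t":<epoch>,"m":{"key":value}}` format from the metrics commit, with the epoch in seconds. `MetricsBuilder` here is a hypothetical stand-in: it returns the line as a `String` rather than emitting it through a tracing dispatch, and it only accepts `f64` values.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for patchbay's MetricsBuilder: it collects key/value
/// pairs and renders them as one JSONL line instead of emitting via tracing.
pub struct MetricsBuilder {
    entries: Vec<(String, f64)>,
}

impl MetricsBuilder {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Queue one metric; chaining mirrors `device.metrics().record(k, v)`.
    pub fn record(mut self, key: &str, value: f64) -> Self {
        self.entries.push((key.to_string(), value));
        self
    }

    /// Render `{"t":<epoch>,"m":{...}}` for a given epoch (seconds, assumed).
    /// Keys are assumed to need no JSON escaping; NaN/inf are not handled.
    pub fn to_line_at(&self, epoch_secs: u64) -> String {
        let fields: Vec<String> = self
            .entries
            .iter()
            .map(|(k, v)| format!("\"{k}\":{v}"))
            .collect();
        format!("{{\"t\":{epoch_secs},\"m\":{{{}}}}}", fields.join(","))
    }

    /// Finish the chain, stamping the line with the current wall-clock time.
    pub fn emit(self) -> String {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock is before 1970")
            .as_secs();
        self.to_line_at(now)
    }
}
```

In this sketch, chaining two `record` calls renders both keys into one line under a single timestamp. That is the point of the builder form over separate single-key calls.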

Run manifests

A single RunManifest in patchbay-utils replaces three prior definitions. Every test and sim run writes run.json with git context, outcome, and per-test results. patchbay test --persist copies testdir output into .patchbay/work/ for later comparison or upload.
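The manifest's shape is easiest to see as types. This sketch is hypothetical. The field and variant names are illustrative guesses based on this description (git context, run kind, outcome, per-test results), not the actual definitions in patchbay-utils. Serde derives are omitted to keep it std-only.

```rust
use std::time::Duration;

/// Per-test status (illustrative; not the exact patchbay-utils enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestStatus {
    Pass,
    Fail,
    Ignored,
}

/// Distinguishes test runs from sim runs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunKind {
    Test,
    Sim,
}

#[derive(Debug, Clone)]
pub struct TestResult {
    pub name: String,
    pub status: TestStatus,
    pub duration: Option<Duration>,
}

/// Hypothetical manifest shape: git context plus per-test results.
#[derive(Debug, Clone)]
pub struct RunManifest {
    pub kind: RunKind,
    pub branch: Option<String>,
    pub commit: Option<String>,
    pub dirty: bool,
    pub results: Vec<TestResult>,
}

impl RunManifest {
    /// "fail" if any test failed, otherwise "pass".
    pub fn outcome(&self) -> &'static str {
        if self.results.iter().any(|r| r.status == TestStatus::Fail) {
            "fail"
        } else {
            "pass"
        }
    }

    /// (passed, failed, ignored), e.g. for a group header's pass/fail counts.
    pub fn counts(&self) -> (usize, usize, usize) {
        let count = |s: TestStatus| self.results.iter().filter(|r| r.status == s).count();
        (count(TestStatus::Pass), count(TestStatus::Fail), count(TestStatus::Ignored))
    }
}
```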

Server

The server treats a directory containing events.jsonl as a leaf run and a directory with run.json but no events.jsonl as a group, recursing into groups to find their children. GET /api/runs supports filtering by project and kind, plus pagination via limit and offset. The "batch" concept is replaced by "group" throughout, with serde aliases for backward compatibility.
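Here is a hypothetical, std-only sketch of that discovery rule. A directory containing events.jsonl is a leaf run. A directory with run.json but no events.jsonl is a group, and its subdirectories are scanned for children. The sketch also recurses into directories that have neither file, so the scan can start from any root. That part is an assumption, not confirmed server behavior.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What a scanned directory turned out to be.
#[derive(Debug, PartialEq)]
pub enum Found {
    Run(PathBuf),
    Group(PathBuf),
}

/// Hypothetical recursive scan:
/// - events.jsonl present: leaf run, stop descending;
/// - run.json present (no events.jsonl): group, then scan its children;
/// - neither: not reported, but still scanned (assumption).
pub fn scan_runs(dir: &Path, out: &mut Vec<Found>) -> io::Result<()> {
    if dir.join("events.jsonl").is_file() {
        out.push(Found::Run(dir.to_path_buf()));
        return Ok(());
    }
    if dir.join("run.json").is_file() {
        out.push(Found::Group(dir.to_path_buf()));
    }
    let mut children: Vec<PathBuf> = fs::read_dir(dir)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.is_dir())
        .collect();
    children.sort(); // deterministic order for listing and tests
    for child in children {
        scan_runs(&child, out)?;
    }
    Ok(())
}
```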

UI

The runs index shows collapsible groups with manifest metadata, and child runs display their test name rather than inherited info. Checkbox selection enables comparing any two runs. The compare view at /compare/:left/:right computes diffs client-side with clickable test names that drill into individual run comparison. Split-screen co-navigation shares tab state and filter controls between both panels. Metrics render as a pivoted table with per-device columns, sparklines, and a filter input. The logs sidebar is collapsible.

CI and tooling

patchbay upload replaces raw tar | curl, prints the direct run URL for PR comments, and creates run.json from CI environment variables when missing. The CI template installs via cargo binstall with --git-url and falls back to building from source.
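The sketch below covers two CI-facing pieces of `patchbay upload`, under stated assumptions. The `CiContext` type and the choice of GitHub Actions variables (`GITHUB_SHA`, `GITHUB_HEAD_REF`, `GITHUB_REF_NAME`, `GITHUB_REPOSITORY`) are hypothetical. The URL shape follows the `https://pb.example.com/run/name` example from the upload commit. The environment lookup is passed in as a closure, so the logic can be exercised without touching the real environment.

```rust
/// Hypothetical subset of fields upload might fill when run.json is missing.
#[derive(Debug, PartialEq)]
pub struct CiContext {
    pub commit: Option<String>,
    pub branch: Option<String>,
    pub repository: Option<String>,
}

/// Read CI context through `get`; real callers pass `|k| std::env::var(k).ok()`.
pub fn ci_context(get: impl Fn(&str) -> Option<String>) -> CiContext {
    CiContext {
        commit: get("GITHUB_SHA"),
        // GITHUB_HEAD_REF is only populated for pull_request events,
        // so fall back to GITHUB_REF_NAME when it is unset or empty.
        branch: get("GITHUB_HEAD_REF")
            .filter(|s| !s.is_empty())
            .or_else(|| get("GITHUB_REF_NAME")),
        repository: get("GITHUB_REPOSITORY"),
    }
}

/// Build the direct view URL printed after upload, e.g. <base>/run/<name>.
pub fn run_url(base: &str, run_name: &str) -> String {
    format!("{}/run/{}", base.trim_end_matches('/'), run_name)
}
```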

Frando and others added 30 commits March 25, 2026 23:16
Rename the grouping concept from "invocation" to "batch" in all Rust,
TypeScript, and test code. Switch UI from HashRouter to BrowserRouter
with server-side SPA fallback routes.

Backward compatibility:
- /api/invocations/{name}/combined-results kept as alias
- /inv/* routes redirect to /batch/*

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ate patchbay-cli

Move all CLI parsing and dispatch from patchbay-runner and patchbay-vm
into a new patchbay-cli crate. The runner and vm crates become pure
libraries. VM commands are nested under `patchbay vm <subcommand>`.

- patchbay-runner: remove binary, expose sim module as pub
- patchbay-vm: add lib.rs re-exporting Backend, RunVmArgs, TestVmArgs
- patchbay-cli: unified CLI with feature flags (native, vm, serve)

Zero behavior change — same subcommands, same defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add device.record(key, value), device.metrics() builder, and
device.enter_tracing() for lightweight per-device metric collection.

Metrics are written to device.<name>.metrics.jsonl as JSONL lines with
format {"t":<epoch>,"m":{"key":value}}. Uses tracing infrastructure
with a patchbay::_metrics target, routed to per-namespace metrics files.

- Device/Router hold a clone of their namespace's tracing::Dispatch
- MetricsBuilder allows batch emission of multiple metrics
- patchbay-server recognizes .metrics.jsonl as LogKind::Metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delegates to cargo nextest (preferred) or cargo test on native Linux,
and to patchbay-vm's test flow when --vm is specified. Sets
RUSTFLAGS="--cfg patchbay_test" on all cargo invocations. Supports
--ignored, --ignored-only, package/test selectors, and extra args.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compare test results between git refs using worktrees. Supports one ref
(compare against worktree) or two refs.

- Creates git worktrees in .patchbay/tree/
- Runs tests sequentially in each
- Parses pass/fail/ignored results from cargo test output
- Prints summary table with fixes, regressions, and score
- Writes compare manifest to .patchbay/work/compare-{timestamp}/
- Cleans up worktrees if unchanged

Scoring: +3 per fix, -5 per regression, +/-1 for time improvement.
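Here is a minimal sketch of this scoring rule. It assumes a fix means a test went from fail to pass between the left and right run, and a regression means pass to fail. It reads "+/-1 for time improvement" as +1 when the right side's total time is lower and -1 when it is higher. `Status`, `Diff`, and `compare` are hypothetical names, and tests present on only one side are skipped.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pass,
    Fail,
    Ignored,
}

#[derive(Debug, Default, PartialEq)]
pub struct Diff {
    pub fixes: Vec<String>,
    pub regressions: Vec<String>,
    pub score: i64,
}

/// Diff right against left and score it: +3 per fix, -5 per regression,
/// then +1 if the right run was faster overall, -1 if slower (assumed reading).
pub fn compare(
    left: &BTreeMap<String, Status>,
    right: &BTreeMap<String, Status>,
    left_time: Duration,
    right_time: Duration,
) -> Diff {
    let mut diff = Diff::default();
    for (name, after) in right {
        match (left.get(name), after) {
            (Some(Status::Fail), Status::Pass) => diff.fixes.push(name.clone()),
            (Some(Status::Pass), Status::Fail) => diff.regressions.push(name.clone()),
            _ => {} // unchanged, ignored, or only present on one side
        }
    }
    diff.score = 3 * diff.fixes.len() as i64 - 5 * diff.regressions.len() as i64;
    diff.score += match right_time.cmp(&left_time) {
        Ordering::Less => 1,
        Ordering::Greater => -1,
        Ordering::Equal => 0,
    };
    diff
}
```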

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract RunView component from App.tsx for reuse in compare mode
- MetricsTab: fetch/parse device.*.metrics.jsonl, show key/value table
  with inline SVG sparklines for multi-point metrics
- CompareView: load summary.json from compare batches, show summary bar
  (fixes, regressions, score) and per-test comparison table
- Add /compare/* route, auto-detect compare batches via summary.json
- Add metrics tab to run detail views

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fest

Add Upload subcommand that tars a directory and pushes to a
patchbay-server instance via the existing POST /api/push/{project}
endpoint. Uses PATCHBAY_URL, PATCHBAY_API_KEY, and PATCHBAY_PROJECT
env vars (also accepted as --url, --api-key, --project flags).

Also adds optional `project` field to CompareManifest, populated from
PATCHBAY_PROJECT env var when set.

CI usage:
  patchbay compare test --ref main
  patchbay upload .patchbay/work/compare-*/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code review fixes across all commits:

handles.rs:
- Extract record_metric() shared fn, eliminate Device/Router duplication

tracing.rs:
- Record visitor fields once at top of write_event_to_files

compare.rs + test.rs:
- Extract patchbay_rustflags() to util.rs, share between modules
- Extract test_index() and merged_names() helpers
- Remove redundant starts_with guard in parse_test_output

main.rs:
- Extract resolve_project_root() helper

Upload rewrite:
- Use reqwest + tar + flate2 instead of shelling out
- Auto-create run.json manifest from CI env vars (GITHUB_*)
- Move to dedicated upload.rs module

Other:
- Add cargo-binstall metadata to patchbay-cli and patchbay-server
- Fix CompareView sort to use localeCompare
- Fix MetricsTab React key to use device:key
- Fix App.tsx double runs.find()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract TestArgs as a clap::Args struct, flatten into both Command::Test
and CompareCommand::Test. Move VM dispatch logic to test::run_vm().
Update compare::run_tests_in_dir to accept &TestArgs directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…args

Path consolidation:
- .patchbay-work → .patchbay/work (all 7 CLI defaults + vm fallbacks)
- .qemu-vm → .patchbay/vm, .container-vm → .patchbay/vm

VmOps trait:
- Add VmOps trait in patchbay-vm with QemuBackend/ContainerBackend ZSTs
- Rewrite dispatch_vm to use trait methods instead of per-command match
- resolve_ops() factory returns Box<dyn VmOps>

DRY cargo args:
- Add TestArgs::into_vm_args() method for TestArgs → TestVmArgs conversion
- Use in both test::run_vm() and VmCommand::Test dispatch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixture crate (patchbay-cli/tests/fixtures/counter/):
- Two tests: udp_counter (sends/receives packets, records metric)
  and udp_threshold (asserts PACKET_COUNT >= THRESHOLD)

Integration test (compare_integration.rs, #[ignore]):
- Creates temp git repo with passing and regressing commits
- Runs patchbay compare test, asserts regressions detected

E2E test (ui/e2e/compare.spec.ts):
- Mock compare data, spawns patchbay serve, verifies CompareView

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r copy

CompareSummary:
- Split flat fields into left/right RunStats structs
- Use std::time::Duration with custom serde (serialize as ms)
- Update compare_results, print_summary, UI, and test mocks

VmOps → Backend methods:
- Remove VmOps trait, QemuBackend/ContainerBackend ZSTs, Box<dyn>
- Implement dispatch methods directly on Backend enum
- backend.resolve().up(recreate) instead of resolve_ops(b).up(recreate)

testdir:
- Copy target/testdir-current to .patchbay/work/testdir after native tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy fixture source + write Cargo.toml with absolute patchbay path
instead of trying to reconstruct the crate from scratch. Remove -p flag
since compare runs all tests in the workspace.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VmOps trait:
- Trait with ZST impls (Qemu, Container) that delegate to module fns
- Backend enum implements VmOps by dispatching to resolved ZST
- Backend::resolve() replaces free resolve_backend function
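A std-only sketch of the ZST-plus-enum dispatch pattern described above, with several simplifications:

- Only two trait methods are shown.
- The bodies return strings instead of driving a VM.
- The `resolve` argument, which prefers a container runtime when one is available, is a hypothetical detection rule.

The panic on an unresolved `Auto` mirrors the later change in which `Backend::Auto` panics if not resolved.

```rust
/// Operations every VM backend supports (two methods here, for brevity).
pub trait VmOps {
    fn name(&self) -> &'static str;
    fn up(&self, recreate: bool) -> String;
}

/// Zero-sized backend types; the real ones delegate to module functions.
pub struct Qemu;
pub struct Container;

impl VmOps for Qemu {
    fn name(&self) -> &'static str {
        "qemu"
    }
    fn up(&self, recreate: bool) -> String {
        format!("qemu up (recreate={recreate})")
    }
}

impl VmOps for Container {
    fn name(&self) -> &'static str {
        "container"
    }
    fn up(&self, recreate: bool) -> String {
        format!("container up (recreate={recreate})")
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Backend {
    Auto,
    Qemu,
    Container,
}

impl Backend {
    /// Replace `Auto` with a concrete backend (detection rule is assumed).
    pub fn resolve(self, container_available: bool) -> Backend {
        match self {
            Backend::Auto if container_available => Backend::Container,
            Backend::Auto => Backend::Qemu,
            concrete => concrete,
        }
    }

    /// Map the enum onto its ZST; unit structs promote to 'static refs.
    fn ops(self) -> &'static dyn VmOps {
        match self {
            Backend::Qemu => &Qemu,
            Backend::Container => &Container,
            Backend::Auto => panic!("Backend::Auto must be resolved before use"),
        }
    }
}

/// The enum implements the trait by forwarding to the resolved ZST.
impl VmOps for Backend {
    fn name(&self) -> &'static str {
        self.ops().name()
    }
    fn up(&self, recreate: bool) -> String {
        self.ops().up(recreate)
    }
}
```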

iroh-metrics:
- Optional feature iroh-metrics in patchbay crate
- device.record_iroh_metrics(&dyn MetricsGroup) iterates counters/gauges
  and emits via MetricsBuilder

inspect/run-in:
- Feature-gated on cfg(target_os = "linux") since they require namespaces

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tions

Use workspace.dependencies in root Cargo.toml for all internal crates.
Member crates now use { workspace = true } instead of relative paths.

Enhanced compare integration test with assertions for:
- Summary output in stdout
- Left/right RunStats (pass/fail/total counts)
- Per-test results (udp_threshold fails on right, passes on left)
- Worktree cleanup verification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set PATCHBAY_OUTDIR so the fixture's Lab writes device metrics.
After compare, find *.metrics.jsonl files and assert they contain
packet_count values matching the fixture's PACKET_COUNT constants.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…metadata

Fixture now uses testdir!() + Lab::with_opts(outdir) matching iroh
test patterns. Integration test uses cargo metadata to find
target_dir/testdir-current and validates device.sender.metrics.jsonl
contains packet_count values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stream stdout/stderr from cargo test in real time when -v is passed.
Uses piped stdio with reader threads so output is both printed live
and captured for result parsing.
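Below is a sketch of the reader-thread approach. It is generic over `Read`, so it works with a child's piped stdout or stderr. One thread per stream matters. Reading stdout to completion before touching stderr can deadlock if the child fills the stderr pipe buffer first. `tee` is a hypothetical name, and this sketch echoes both streams to stdout for simplicity.

```rust
use std::io::{BufRead, BufReader, Read};
use std::thread::{self, JoinHandle};

/// Read `src` line by line on a background thread, echo each line to stdout
/// when `verbose` is set, and return everything captured when joined.
pub fn tee<R: Read + Send + 'static>(src: R, verbose: bool) -> JoinHandle<String> {
    thread::spawn(move || {
        let mut captured = String::new();
        // Stops at EOF or at the first read error (e.g. invalid UTF-8).
        for line in BufReader::new(src).lines().map_while(Result::ok) {
            if verbose {
                println!("{line}");
            }
            captured.push_str(&line);
            captured.push('\n');
        }
        captured
    })
}
```

Spawn the child with `Stdio::piped()` for both streams. Then call `tee(child.stdout.take().unwrap(), verbose)` and do the same for stderr. Join both handles before parsing the captured text.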

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…st flags

- TestArgs.cargo_test_cmd_in() builds the full cargo test Command
- compare::run_tests_in_dir uses it instead of duplicating arg expansion
- Rename --ignored-only → --ignored, --ignored → --include-ignored
  to match cargo test flag names exactly
- Remove filter as positional; goes through -- like cargo test
- -v/--verbose is now global on Cli, works for all subcommands
- CompareCommand refs are positional: `patchbay compare test main [ref2]`
- Fixture uses ctor + init_userns() (safe version) for namespace bootstrap
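Here is a hypothetical sketch of a command builder along these lines, using only `std::process::Command`. It builds the command but does not spawn it. It appends `--cfg patchbay_test` to any existing RUSTFLAGS rather than replacing them, which is an assumption about `patchbay_rustflags()`. It maps the ignored-test modes onto libtest's own `--ignored` and `--include-ignored` flags, which go after `--`.

```rust
use std::process::Command;

/// Append `--cfg patchbay_test` to any RUSTFLAGS the caller already set,
/// instead of overwriting them (hypothetical helper).
pub fn patchbay_rustflags(existing: Option<&str>) -> String {
    match existing.map(str::trim) {
        Some(flags) if !flags.is_empty() => format!("{flags} --cfg patchbay_test"),
        _ => "--cfg patchbay_test".to_string(),
    }
}

/// Which ignored tests to run, mapped onto libtest's flag names.
pub enum Ignored {
    Exclude, // default: skip #[ignore] tests
    Only,    // --ignored
    Include, // --include-ignored
}

/// Build (but do not spawn) `cargo test [-p ..] -- [ignored flag] [filter]`.
pub fn cargo_test_cmd(packages: &[&str], ignored: Ignored, filter: Option<&str>) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("test");
    for pkg in packages {
        cmd.args(["-p", *pkg]);
    }
    let existing = std::env::var("RUSTFLAGS").ok();
    cmd.env("RUSTFLAGS", patchbay_rustflags(existing.as_deref()));
    // Everything after `--` goes to the libtest harness, as with cargo test.
    cmd.arg("--");
    match ignored {
        Ignored::Exclude => {}
        Ignored::Only => {
            cmd.arg("--ignored");
        }
        Ignored::Include => {
            cmd.arg("--include-ignored");
        }
    }
    if let Some(f) = filter {
        cmd.arg(f);
    }
    cmd
}
```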

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set CARGO_TARGET_DIR per worktree in compare to prevent shared binary
  cache from masking source changes between refs
- Use --force in worktree cleanup to handle untracked target/ dirs
- Fixture uses ctor + init_userns() for namespace bootstrap
- Align --ignored/--include-ignored with cargo test flag names
- Refs are positional: `patchbay compare test main [ref2] [-- filter]`
- -v/--verbose is global on Cli struct
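The worktree lifecycle maps onto plain git commands. This sketch builds them with `std::process::Command` but does not run them. `git worktree add --detach`, `git worktree remove --force`, and `CARGO_TARGET_DIR` are real git and cargo interfaces. The function names and the `<worktree>/target` layout are hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// `git worktree add --detach <dir> <ref>`: check out `git_ref` into `dir`
/// without creating a branch.
pub fn worktree_add(repo: &Path, dir: &Path, git_ref: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.current_dir(repo)
        .args(["worktree", "add", "--detach"])
        .arg(dir)
        .arg(git_ref);
    cmd
}

/// `git worktree remove --force <dir>`; without --force, git refuses to
/// remove a worktree that contains untracked files such as target/.
pub fn worktree_remove(repo: &Path, dir: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.current_dir(repo)
        .args(["worktree", "remove", "--force"])
        .arg(dir);
    cmd
}

/// Hypothetical layout: each worktree builds into its own target/ dir, so a
/// binary built for one ref is never reused when testing the other.
pub fn cargo_in_worktree(worktree: &Path) -> Command {
    let target: PathBuf = worktree.join("target");
    let mut cmd = Command::new("cargo");
    cmd.current_dir(worktree)
        .env("CARGO_TARGET_DIR", target)
        .arg("test");
    cmd
}
```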

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move RunManifest to patchbay-utils with RunKind enum, chrono dates,
  Duration-as-ms serde, TestResult/TestStatus types, git_context(),
  resolve_ref(), find_run_for_commit(), parse_test_output()
- Rename runner's RunManifest → SimRunReport, dir prefix sim- → run-
- patchbay test: pipe output, write run.json to testdir, --persist flag
- Compare: check cached runs by commit SHA, --force-build/--no-ref-build
- Server: discover runs by run.json OR events.jsonl, batch→group rename,
  query params (project, kind, limit, offset), /compare/* SPA route
- UI: RunsIndex with filters/pagination/checkbox compare selection,
  CompareView at /compare/:left/:right with client-side diff,
  split-screen co-navigation, ComparePage route wrapper
- Backend::Auto panics if not resolved (was silent Qemu fallback)
- E2e tests updated for new UI layout and data model
- All 6 e2e tests pass, zero clippy warnings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gthen e2e

Naming consistency:
- Rename batch→group in all internal code (variables, types, functions, comments)
- Add /group/* and /api/groups/ as primary routes, keep /batch/* as alias
- Update docs and CI templates (.invocation → .group)

UI refactor:
- Split App.tsx (mode prop) into RunPage.tsx and BatchPage.tsx
- Extract RunSelector component with Selection type helpers
- Extract groupByGroup/simLabel to shared utils.ts
- Fix any types in CompareView (proper LabState, LabEvent[], SimResults)
- Add useMemo for availableTabs in RunView, useCallback in LogsTab
- Add dead-flag cleanup in MetricsTab fetch effect

E2e test improvements:
- 3 compare tests (regression, fix scenario, checkbox selection flow)
- Stronger push test assertions (verify manifest data in UI)
- Multi-sim test clicks through to verify topology
- Perf tab asserts data row presence
- All 8 tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
testdir-current is a symlink to testdir-N. cp -r copies the symlink
itself; cp -rL dereferences it and copies the actual directory contents.
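In Rust terms, the difference is `fs::metadata` (follows symlinks) versus `fs::symlink_metadata` (does not). Here is a minimal std-only sketch of a `cp -rL`-style copy. `copy_dir_deref` is a hypothetical helper, and it does not guard against symlink cycles.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy `src` into `dst`, following symlinks like `cp -rL`.
/// `fs::metadata` reports on the link target, so a symlinked directory is
/// copied as a real directory; `fs::copy` likewise copies a link's target.
pub fn copy_dir_deref(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let from = entry.path();
        let to = dst.join(entry.file_name());
        if fs::metadata(&from)?.is_dir() {
            copy_dir_deref(&from, &to)?;
        } else {
            fs::copy(&from, &to)?;
        }
    }
    Ok(())
}
```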

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- parse_test_output now handles nextest format (PASS/FAIL/TIMEOUT/IGNORE
  with duration extraction) in addition to cargo test format
- Deduplicate test results by name (nextest reprints failures in summary)
- Server scan_runs_recursive: directories with run.json but no events.jsonl
  are groups (recurse into them), not leaf runs
- Fix push e2e test to match by group instead of name
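Here is a simplified sketch of dual-format parsing with deduplication. The line shapes in the doc comment are assumptions about what the two runners print. Durations are skipped. For nextest, the test name is taken as the last token after the bracketed duration. Collecting into a `BTreeMap` keyed by name collapses nextest's repeated failure lines.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Pass,
    Fail,
    Ignored,
}

/// Parse one output line. Assumed line shapes:
///   cargo test: `test path::name ... ok` | `... FAILED` | `... ignored[, reason]`
///   nextest:    `PASS [   0.012s] binary-id path::name` (also FAIL, TIMEOUT, IGNORE)
fn parse_line(line: &str) -> Option<(String, Status)> {
    let line = line.trim();
    if let Some(rest) = line.strip_prefix("test ") {
        // `test result: ...` summary lines have no " ... " and are skipped.
        let (name, result) = rest.split_once(" ... ")?;
        let status = match result.trim() {
            "ok" => Status::Pass,
            "FAILED" => Status::Fail,
            r if r.starts_with("ignored") => Status::Ignored,
            _ => return None,
        };
        return Some((name.to_string(), status));
    }
    let (word, rest) = line.split_once(' ')?;
    let status = match word {
        "PASS" => Status::Pass,
        "FAIL" | "TIMEOUT" => Status::Fail,
        "IGNORE" => Status::Ignored,
        _ => return None,
    };
    // Drop the bracketed duration; the test name is the last token.
    let name = rest.split_once(']')?.1.split_whitespace().last()?;
    Some((name.to_string(), status))
}

/// Collect results keyed by test name; a repeated name overwrites the
/// earlier entry, so nextest's summary reprints do not double-count.
pub fn parse_test_output(output: &str) -> BTreeMap<String, Status> {
    output.lines().filter_map(parse_line).collect()
}
```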

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Groups in RunsIndex are collapsed by default with expand/collapse toggle
- Child runs show their test/sim name (from label or path), not inherited
  manifest info (branch@commit which was repeated for every child)
- Group header shows manifest summary inline (project, branch, commit,
  kind, outcome, pass/fail counts)
- Group header has checkbox for group-level compare selection
- E2e tests updated to expand groups before clicking child runs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compare view:
- Concise single-line header for individual runs (name + pass/fail icons)
- Concise header for groups (refs + pass/fail counts + regression/fix count)
- Clickable test names in per-test table navigate to individual compare

Metrics tab:
- Pivoted table: rows=metric keys, columns=devices
- Filter input to search by metric key name
- Accepts shared filter prop for compare mode

Shared controls architecture:
- Compare split-screen lifts log filter/levels and metrics filter state
- SharedControlsBar renders once above both panels
- LogsTab and MetricsTab use external state when provided, hide internal controls

Logs tab:
- Collapsible sidebar with toggle button

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CI template uses cargo binstall + patchbay test --persist + patchbay upload
- Fix binstall bin-dir to match release asset naming ({bin}-{target})
- Fix release workflow to build patchbay-cli (was patchbay-runner)
- Add patchbay-serve build to release workflow
- Update docs/guide/testing.md with simplified upload example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Individual lab runs don't have run.json so outcome showed as "unknown".
Just show the test name — the side-by-side view below has all details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Frando and others added 7 commits March 26, 2026 14:06
Upload now prints the full view URL (e.g. https://pb.example.com/run/name)
to stdout. The CI template captures it via GITHUB_OUTPUT and links
directly to the run in the PR comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pkg-url used { version }, which resolves to 0.1.0, but releases use
a force-moved rolling tag. Hardcode rolling in the download path.

CI template installs cargo-binstall first, then uses --git-url since the
crate is not published on crates.io.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- git_context() now checks both unstaged and staged changes for dirty flag
- dir_size() handles file_type() errors gracefully instead of panicking
- test.rs warns on run.json write failures instead of silently ignoring
- Remove duplicate TestResult interface in api.ts
- Remove unused statusIcon function in CompareView.tsx
- Fix stale comment in InvRedirect

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cargo fmt across all changed Rust files
- Remove /batch/ SPA and API routes, /group/ is now canonical
- Rename BatchPage → GroupPage
- Harden push tar extraction: manual entry iteration with path checks,
  reject absolute paths and .. components, disable permissions/xattrs
- Individual run compare shows both sides + back link to group compare
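Here is a minimal std-only sketch of the path check, using `std::path::Component`. Absolute paths (`RootDir`/`Prefix`) and `..` (`ParentDir`) are rejected before anything is written. `safe_join` is a hypothetical name. Symlink and hardlink entries need a separate check, since an entry with a safe path can still link outside the destination.

```rust
use std::path::{Component, Path, PathBuf};

/// Return where an archive entry should be written under `dest`, or None if
/// the entry path is absolute or contains `..`.
pub fn safe_join(dest: &Path, entry: &Path) -> Option<PathBuf> {
    let mut out = dest.to_path_buf();
    for comp in entry.components() {
        match comp {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {} // `./` segments are harmless
            // RootDir/Prefix mean an absolute path, ParentDir is `..`.
            Component::RootDir | Component::Prefix(_) | Component::ParentDir => return None,
        }
    }
    Some(out)
}
```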

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The topbar now shows context like "iroh · main@fc654dd vs main@abc1234"
so you know which project and refs you're comparing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>