feat: unified CLI, compare mode, metrics collection #9
Rename the grouping concept from "invocation" to "batch" in all Rust,
TypeScript, and test code. Switch UI from HashRouter to BrowserRouter
with server-side SPA fallback routes.
Backward compatibility:
- /api/invocations/{name}/combined-results kept as alias
- /inv/* routes redirect to /batch/*
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ate patchbay-cli
Move all CLI parsing and dispatch from patchbay-runner and patchbay-vm
into a new patchbay-cli crate. The runner and vm crates become pure
libraries. VM commands are nested under `patchbay vm <subcommand>`.
- patchbay-runner: remove binary, expose sim module as pub
- patchbay-vm: add lib.rs re-exporting Backend, RunVmArgs, TestVmArgs
- patchbay-cli: unified CLI with feature flags (native, vm, serve)
Zero behavior change: same subcommands, same defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add device.record(key, value), device.metrics() builder, and
device.enter_tracing() for lightweight per-device metric collection.
Metrics are written to device.<name>.metrics.jsonl as JSONL lines with
format {"t":<epoch>,"m":{"key":value}}. Uses tracing infrastructure
with a patchbay::_metrics target, routed to per-namespace metrics files.
- Device/Router hold a clone of their namespace's tracing::Dispatch
- MetricsBuilder allows batch emission of multiple metrics
- patchbay-server recognizes .metrics.jsonl as LogKind::Metrics
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
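For reviewers who want to see the record shape concretely, here is a minimal sketch of one line in the `{"t":<epoch>,"m":{"key":value}}` format described above. `format_metric_line` is a hypothetical helper and not part of patchbay. The commit only says `<epoch>`, so integer seconds are an assumption, as are the example keys `packet_count` and `rtt_ms`.

```rust
/// Formats one metrics record as {"t":<epoch>,"m":{"key":value,...}}.
/// Hypothetical helper for illustration only.
/// Assumptions: the epoch is integer seconds, keys need no JSON escaping,
/// and values are finite (NaN or infinity would not be valid JSON).
fn format_metric_line(epoch_secs: u64, metrics: &[(&str, f64)]) -> String {
    let body = metrics
        .iter()
        .map(|(key, value)| format!("\"{}\":{}", key, value))
        .collect::<Vec<_>>()
        .join(",");
    format!("{{\"t\":{},\"m\":{{{}}}}}", epoch_secs, body)
}

fn main() {
    // One JSONL line carrying two metrics recorded at the same instant.
    let line = format_metric_line(1_700_000_000, &[("packet_count", 42.0), ("rtt_ms", 1.5)]);
    println!("{line}");
}
```

Each call yields exactly one line, so appending lines to `device.<name>.metrics.jsonl` keeps the file valid JSONL.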
Delegates to cargo nextest (preferred) or cargo test on native Linux, and
to patchbay-vm's test flow when --vm is specified. Sets
RUSTFLAGS="--cfg patchbay_test" on all cargo invocations. Supports
--ignored, --ignored-only, package/test selectors, and extra args.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compare test results between git refs using worktrees. Supports one ref
(compare against worktree) or two refs.
- Creates git worktrees in .patchbay/tree/
- Runs tests sequentially in each
- Parses pass/fail/ignored results from cargo test output
- Prints summary table with fixes, regressions, and score
- Writes compare manifest to .patchbay/work/compare-{timestamp}/
- Cleans up worktrees if unchanged
Scoring: +3 per fix, -5 per regression, +/-1 for time improvement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
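The scoring rule above can be sketched as a small function. This is an illustration under stated assumptions, not the code in compare.rs: the `CompareCounts` struct and its field names are hypothetical, and the time term is assumed to be +1 when the right-hand run is faster, -1 when it is slower, and 0 when the durations are equal.

```rust
use std::cmp::Ordering;

/// Hypothetical inputs for the scoring rule; field names are illustrative.
struct CompareCounts {
    fixes: i64,
    regressions: i64,
    left_ms: u64,
    right_ms: u64,
}

/// +3 per fix, -5 per regression, +/-1 for the time difference.
/// Assumption: "time improvement" means the right side ran faster than the left.
fn score(c: &CompareCounts) -> i64 {
    let time_term = match c.right_ms.cmp(&c.left_ms) {
        Ordering::Less => 1,
        Ordering::Greater => -1,
        Ordering::Equal => 0,
    };
    3 * c.fixes - 5 * c.regressions + time_term
}

fn main() {
    let counts = CompareCounts { fixes: 2, regressions: 1, left_ms: 1000, right_ms: 900 };
    // 3*2 - 5*1 + 1
    println!("score = {}", score(&counts));
}
```

Weighting a regression more heavily than a fix means a change that fixes one test and breaks another nets a negative score.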
- Extract RunView component from App.tsx for reuse in compare mode
- MetricsTab: fetch/parse device.*.metrics.jsonl, show key/value table
  with inline SVG sparklines for multi-point metrics
- CompareView: load summary.json from compare batches, show summary bar
  (fixes, regressions, score) and per-test comparison table
- Add /compare/* route, auto-detect compare batches via summary.json
- Add metrics tab to run detail views
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fest
Add Upload subcommand that tars a directory and pushes to a
patchbay-server instance via the existing POST /api/push/{project}
endpoint. Uses PATCHBAY_URL, PATCHBAY_API_KEY, and PATCHBAY_PROJECT
env vars (also accepted as --url, --api-key, --project flags).
Also adds optional `project` field to CompareManifest, populated from
PATCHBAY_PROJECT env var when set.
CI usage:
patchbay compare test --ref main
patchbay upload .patchbay/work/compare-*/
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code review fixes across all commits:
handles.rs:
- Extract record_metric() shared fn, eliminate Device/Router duplication
tracing.rs:
- Record visitor fields once at top of write_event_to_files
compare.rs + test.rs:
- Extract patchbay_rustflags() to util.rs, share between modules
- Extract test_index() and merged_names() helpers
- Remove redundant starts_with guard in parse_test_output
main.rs:
- Extract resolve_project_root() helper
Upload rewrite:
- Use reqwest + tar + flate2 instead of shelling out
- Auto-create run.json manifest from CI env vars (GITHUB_*)
- Move to dedicated upload.rs module
Other:
- Add cargo-binstall metadata to patchbay-cli and patchbay-server
- Fix CompareView sort to use localeCompare
- Fix MetricsTab React key to use device:key
- Fix App.tsx double runs.find()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract TestArgs as a clap::Args struct, flatten into both Command::Test
and CompareCommand::Test. Move VM dispatch logic to test::run_vm().
Update compare::run_tests_in_dir to accept &TestArgs directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…args
Path consolidation:
- .patchbay-work → .patchbay/work (all 7 CLI defaults + vm fallbacks)
- .qemu-vm → .patchbay/vm, .container-vm → .patchbay/vm
VmOps trait:
- Add VmOps trait in patchbay-vm with QemuBackend/ContainerBackend ZSTs
- Rewrite dispatch_vm to use trait methods instead of per-command match
- resolve_ops() factory returns Box<dyn VmOps>
DRY cargo args:
- Add TestArgs::into_vm_args() method for TestArgs → TestVmArgs conversion
- Use in both test::run_vm() and VmCommand::Test dispatch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixture crate (patchbay-cli/tests/fixtures/counter/):
- Two tests: udp_counter (sends/receives packets, records metric) and
  udp_threshold (asserts PACKET_COUNT >= THRESHOLD)
Integration test (compare_integration.rs, #[ignore]):
- Creates temp git repo with passing and regressing commits
- Runs patchbay compare test, asserts regressions detected
E2E test (ui/e2e/compare.spec.ts):
- Mock compare data, spawns patchbay serve, verifies CompareView
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r copy
CompareSummary:
- Split flat fields into left/right RunStats structs
- Use std::time::Duration with custom serde (serialize as ms)
- Update compare_results, print_summary, UI, and test mocks
VmOps → Backend methods:
- Remove VmOps trait, QemuBackend/ContainerBackend ZSTs, Box<dyn>
- Implement dispatch methods directly on Backend enum
- backend.resolve().up(recreate) instead of resolve_ops(b).up(recreate)
testdir:
- Copy target/testdir-current to .patchbay/work/testdir after native tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy fixture source + write Cargo.toml with absolute patchbay path
instead of trying to reconstruct the crate from scratch. Remove -p flag
since compare runs all tests in the workspace.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VmOps trait:
- Trait with ZST impls (Qemu, Container) that delegate to module fns
- Backend enum implements VmOps by dispatching to resolved ZST
- Backend::resolve() replaces free resolve_backend function
iroh-metrics:
- Optional feature iroh-metrics in patchbay crate
- device.record_iroh_metrics(&dyn MetricsGroup) iterates counters/gauges
  and emits via MetricsBuilder
inspect/run-in:
- Feature-gated on cfg(target_os = "linux") since they require namespaces
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tions
Use workspace.dependencies in root Cargo.toml for all internal crates.
Member crates now use { workspace = true } instead of relative paths.
Enhanced compare integration test with assertions for:
- Summary output in stdout
- Left/right RunStats (pass/fail/total counts)
- Per-test results (udp_threshold fails on right, passes on left)
- Worktree cleanup verification
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set PATCHBAY_OUTDIR so the fixture's Lab writes device metrics. After
compare, find *.metrics.jsonl files and assert they contain packet_count
values matching the fixture's PACKET_COUNT constants.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…metadata
Fixture now uses testdir!() + Lab::with_opts(outdir) matching iroh test
patterns. Integration test uses cargo metadata to find
target_dir/testdir-current and validates device.sender.metrics.jsonl
contains packet_count values.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stream stdout/stderr from cargo test in real time when -v is passed.
Uses piped stdio with reader threads so output is both printed live and
captured for result parsing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…st flags
- TestArgs.cargo_test_cmd_in() builds the full cargo test Command
- compare::run_tests_in_dir uses it instead of duplicating arg expansion
- Rename --ignored-only → --ignored, --ignored → --include-ignored to
  match cargo test flag names exactly
- Remove filter as positional; goes through -- like cargo test
- -v/--verbose is now global on Cli, works for all subcommands
- CompareCommand refs are positional: `patchbay compare test main [ref2]`
- Fixture uses ctor + init_userns() (safe version) for namespace bootstrap
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set CARGO_TARGET_DIR per worktree in compare to prevent shared binary
  cache from masking source changes between refs
- Use --force in worktree cleanup to handle untracked target/ dirs
- Fixture uses ctor + init_userns() for namespace bootstrap
- Align --ignored/--include-ignored with cargo test flag names
- Refs are positional: `patchbay compare test main [ref2] [-- filter]`
- -v/--verbose is global on Cli struct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
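The per-worktree target directory can be sketched with `std::process::Command`. This is an illustration and not the code in compare.rs: `cargo_test_in_worktree` is a hypothetical name, and placing the target directory at `<worktree>/target` is an assumption suggested by the "untracked target/ dirs" note above. The `RUSTFLAGS` value is the one named in the earlier test commit.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a `cargo test` command scoped to one worktree.
/// Hypothetical helper; the target dir location is an assumption.
fn cargo_test_in_worktree(worktree: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("test")
        .current_dir(worktree)
        // A separate target dir per worktree keeps one ref's binaries from
        // being reused for another ref.
        .env("CARGO_TARGET_DIR", worktree.join("target"))
        .env("RUSTFLAGS", "--cfg patchbay_test");
    cmd
}

fn main() {
    // `.patchbay/tree/main` is an example worktree path, not a fixed layout.
    let cmd = cargo_test_in_worktree(Path::new(".patchbay/tree/main"));
    println!("{cmd:?}");
}
```

The trade-off is a full rebuild per ref, which is presumably why the later commits add cached-run lookup by commit SHA.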
- Move RunManifest to patchbay-utils with RunKind enum, chrono dates,
  Duration-as-ms serde, TestResult/TestStatus types, git_context(),
  resolve_ref(), find_run_for_commit(), parse_test_output()
- Rename runner's RunManifest → SimRunReport, dir prefix sim- → run-
- patchbay test: pipe output, write run.json to testdir, --persist flag
- Compare: check cached runs by commit SHA, --force-build/--no-ref-build
- Server: discover runs by run.json OR events.jsonl, batch→group rename,
  query params (project, kind, limit, offset), /compare/* SPA route
- UI: RunsIndex with filters/pagination/checkbox compare selection,
  CompareView at /compare/:left/:right with client-side diff,
  split-screen co-navigation, ComparePage route wrapper
- Backend::Auto panics if not resolved (was silent Qemu fallback)
- E2e tests updated for new UI layout and data model
- All 6 e2e tests pass, zero clippy warnings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…gthen e2e
Naming consistency:
- Rename batch→group in all internal code (variables, types, functions,
  comments)
- Add /group/* and /api/groups/ as primary routes, keep /batch/* as alias
- Update docs and CI templates (.invocation → .group)
UI refactor:
- Split App.tsx (mode prop) into RunPage.tsx and BatchPage.tsx
- Extract RunSelector component with Selection type helpers
- Extract groupByGroup/simLabel to shared utils.ts
- Fix `any` types in CompareView (proper LabState, LabEvent[], SimResults)
- Add useMemo for availableTabs in RunView, useCallback in LogsTab
- Add dead-flag cleanup in MetricsTab fetch effect
E2e test improvements:
- 3 compare tests (regression, fix scenario, checkbox selection flow)
- Stronger push test assertions (verify manifest data in UI)
- Multi-sim test clicks through to verify topology
- Perf tab asserts data row presence
- All 8 tests pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
testdir-current is a symlink to testdir-N. cp -r copies the symlink
itself; cp -rL dereferences it and copies the actual directory contents.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
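The commit itself concerns `cp` flags. As a std-only analog of what `-L` buys, here is a sketch of a recursive copy that dereferences symlinks. `copy_dir_deref` is a hypothetical helper, not code from this PR, and it does not guard against symlink cycles.

```rust
use std::{fs, io, path::Path};

/// Recursively copies `src` into `dst`, following symlinks like `cp -rL`.
/// Hypothetical helper; assumes the tree contains no symlink cycles.
fn copy_dir_deref(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    // read_dir follows `src` if it is itself a symlink to a directory.
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let from = entry.path();
        let to = dst.join(entry.file_name());
        // fs::metadata follows symlinks (unlike fs::symlink_metadata), so a
        // link to a directory is copied as a real directory.
        if fs::metadata(&from)?.is_dir() {
            copy_dir_deref(&from, &to)?;
        } else {
            // fs::copy copies the target's contents, not the link.
            fs::copy(&from, &to)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Usage: <program> <src> <dst>; does nothing without both arguments.
    let mut args = std::env::args().skip(1);
    if let (Some(src), Some(dst)) = (args.next(), args.next()) {
        copy_dir_deref(Path::new(&src), Path::new(&dst))?;
    }
    Ok(())
}
```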
- parse_test_output now handles nextest format (PASS/FAIL/TIMEOUT/IGNORE
  with duration extraction) in addition to cargo test format
- Deduplicate test results by name (nextest reprints failures in summary)
- Server scan_runs_recursive: directories with run.json but no
  events.jsonl are groups (recurse into them), not leaf runs
- Fix push e2e test to match by group instead of name
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
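A rough sketch of dual-format parsing with name-based deduplication follows. It is not the actual `parse_test_output`: the line shapes (`test <name> ... ok` for cargo test, `STATUS [ <secs>s] <binary-id> <name>` for nextest) are assumptions about the two tools' default output, the status keywords are taken from the commit message, mapping TIMEOUT to a failure is an assumption, and duration extraction is omitted for brevity.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Pass,
    Fail,
    Ignored,
}

/// Returns (test name, status) if `line` looks like a result line.
fn parse_line(line: &str) -> Option<(String, Status)> {
    let line = line.trim();
    // Assumed cargo test shape: "test <name> ... ok|FAILED|ignored"
    if let Some(rest) = line.strip_prefix("test ") {
        let (name, outcome) = rest.split_once(" ... ")?;
        let word = outcome.split_whitespace().next()?.trim_end_matches(',');
        let status = match word {
            "ok" => Status::Pass,
            "FAILED" => Status::Fail,
            "ignored" => Status::Ignored,
            _ => return None,
        };
        return Some((name.to_string(), status));
    }
    // Assumed nextest shape: "PASS [   0.012s] <binary-id> <name>"
    let (word, rest) = line.split_once(char::is_whitespace)?;
    let status = match word {
        "PASS" => Status::Pass,
        "FAIL" | "TIMEOUT" => Status::Fail, // assumption: timeout counts as fail
        "IGNORE" => Status::Ignored,
        _ => return None,
    };
    let rest = rest.trim_start().strip_prefix('[')?;
    let (_duration, rest) = rest.split_once(']')?;
    // The last token is the test name; the binary id precedes it.
    let name = rest.split_whitespace().next_back()?;
    Some((name.to_string(), status))
}

/// Parses a whole output, keeping the first result seen for each test name.
fn parse_output(output: &str) -> BTreeMap<String, Status> {
    let mut results = BTreeMap::new();
    for line in output.lines() {
        if let Some((name, status)) = parse_line(line) {
            results.entry(name).or_insert(status);
        }
    }
    results
}

fn main() {
    let output = "test a::one ... ok\n        FAIL [   0.100s] my-crate b::slow\n        FAIL [   0.100s] my-crate b::slow\n";
    for (name, status) in parse_output(output) {
        println!("{name}: {status:?}");
    }
}
```

Keying the map by test name is what collapses the failure lines that nextest repeats in its summary.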
- Groups in RunsIndex are collapsed by default with expand/collapse toggle
- Child runs show their test/sim name (from label or path), not inherited
  manifest info (branch@commit which was repeated for every child)
- Group header shows manifest summary inline (project, branch, commit,
  kind, outcome, pass/fail counts)
- Group header has checkbox for group-level compare selection
- E2e tests updated to expand groups before clicking child runs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compare view:
- Concise single-line header for individual runs (name + pass/fail icons)
- Concise header for groups (refs + pass/fail counts + regression/fix count)
- Clickable test names in per-test table navigate to individual compare
Metrics tab:
- Pivoted table: rows=metric keys, columns=devices
- Filter input to search by metric key name
- Accepts shared filter prop for compare mode
Shared controls architecture:
- Compare split-screen lifts log filter/levels and metrics filter state
- SharedControlsBar renders once above both panels
- LogsTab and MetricsTab use external state when provided, hide internal
  controls
Logs tab:
- Collapsible sidebar with toggle button
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CI template uses cargo binstall + patchbay test --persist + patchbay upload
- Fix binstall bin-dir to match release asset naming ({bin}-{target})
- Fix release workflow to build patchbay-cli (was patchbay-runner)
- Add patchbay-serve build to release workflow
- Update docs/guide/testing.md with simplified upload example
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Individual lab runs don't have run.json so outcome showed as "unknown".
Just show the test name; the side-by-side view below has all details.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Upload now prints the full view URL (e.g. https://pb.example.com/run/name)
to stdout. The CI template captures it via GITHUB_OUTPUT and links
directly to the run in the PR comment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pkg-url used { version } which resolves to 0.1.0, but releases use
a force-moved rolling tag. Hardcode rolling in the download path.
CI template installs cargo-binstall first, then uses --git-url since the
crate is not published on crates.io.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- git_context() now checks both unstaged and staged changes for dirty flag
- dir_size() handles file_type() errors gracefully instead of panicking
- test.rs warns on run.json write failures instead of silently ignoring
- Remove duplicate TestResult interface in api.ts
- Remove unused statusIcon function in CompareView.tsx
- Fix stale comment in InvRedirect
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cargo fmt across all changed Rust files
- Remove /batch/ SPA and API routes, /group/ is now canonical
- Rename BatchPage → GroupPage
- Harden push tar extraction: manual entry iteration with path checks,
  reject absolute paths and .. components, disable permissions/xattrs
- Individual run compare shows both sides + back link to group compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
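The path check in the tar hardening bullet can be sketched with `std::path::Component`. This is a minimal illustration, not the server's extraction code: `is_safe_entry_path` is a hypothetical name, and rejecting empty paths is an extra assumption.

```rust
use std::path::{Component, Path};

/// Returns true if an archive entry path stays inside the extraction root.
/// Hypothetical helper: rejects absolute paths, Windows prefixes, and any
/// `..` component; also rejects empty paths (an added assumption).
fn is_safe_entry_path(path: &Path) -> bool {
    !path.as_os_str().is_empty()
        && path
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

fn main() {
    for candidate in ["run.json", "sim/events.jsonl", "/etc/passwd", "a/../../x"] {
        println!("{candidate}: {}", is_safe_entry_path(Path::new(candidate)));
    }
}
```

Allowing only `Normal` and `CurDir` components is stricter than resolving `..` lexically, but it avoids having to reason about where a partially traversed path ends up.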
The topbar now shows context like "iroh · main@fc654dd vs main@abc1234"
so you know which project and refs you're comparing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Unified CLI
We consolidated `patchbay-runner` and `patchbay-vm` into libraries behind a single `patchbay-cli` crate. The `patchbay` binary auto-detects native vs VM backends, with VM commands nested under `patchbay vm`. All paths consolidate under `.patchbay/`: work output, VM state, and worktrees each get their own subdirectory.
Compare mode
`patchbay compare test <ref> [ref2]` builds and runs tests in git worktrees, then diffs per-test pass/fail results with a regression score. Cached runs are matched by commit SHA so repeated comparisons skip the build. The CLI parses both `cargo test` and `cargo nextest` output formats.
Per-device metrics
`Device::record()` and the builder pattern `Device::metrics().record(k, v).emit()` write timestamped JSONL via tracing, routed to per-device `metrics.jsonl` files. An optional `iroh-metrics` feature enables direct `MetricsGroup` emission.
Run manifests
A single `RunManifest` in `patchbay-utils` replaces three prior definitions. Every test and sim run writes `run.json` with git context, outcome, and per-test results. `patchbay test --persist` copies testdir output into `.patchbay/work/` for later comparison or upload.
Server
The server discovers leaf runs by `events.jsonl` and groups by `run.json`, recursing into group directories to find children. `GET /api/runs` supports project, kind, limit, and offset filtering. The "batch" concept is replaced by "group" throughout, with serde aliases for backward compatibility.
UI
The runs index shows collapsible groups with manifest metadata, and child runs display their test name rather than inherited info. Checkbox selection enables comparing any two runs. The compare view at `/compare/:left/:right` computes diffs client-side with clickable test names that drill into individual run comparison. Split-screen co-navigation shares tab state and filter controls between both panels. Metrics render as a pivoted table with per-device columns, sparklines, and a filter input. The logs sidebar is collapsible.
CI and tooling
`patchbay upload` replaces raw `tar | curl`, prints the direct run URL for PR comments, and creates `run.json` from CI environment variables when missing. The CI template installs via `cargo binstall` with `--git-url` and falls back to building from source.