Skip to content

Commit 47cede6

Browse files
noahgiftclaude
andcommitted
docs: add Rule 8 (CI Gate Completeness) + update binary exceptions (pv, ttop)
Rule 8 addresses the five-whys root cause of PR #726 breaking main: the spec said "CI must pass" but ci/gate silently skipped security. Now all 4 quality dimensions (test, lint, coverage, security) block merge. Also: workspace-test added to branch protection required checks. Binary audit: pv and ttop are permanent standalone tool exceptions, not legacy-to-migrate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 678272f commit 47cede6

3 files changed

Lines changed: 94 additions & 15 deletions

File tree

.github/workflows/ci.yml

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ jobs:
2828
repo: ${{ github.event.repository.name }}
2929
secrets: inherit
3030

31-
# APR-MONO: Workspace-wide test (all 70 crates)
31+
# APR-MONO: Workspace-wide test (all 75 crates)
3232
workspace-test:
3333
runs-on: self-hosted
3434
container:
@@ -48,6 +48,14 @@ jobs:
4848
cargo test -p aprender-core --test monorepo_invariants
4949
cargo test -p aprender-core --test readme_contract
5050
cargo test -p apr-cli --test cli_commands
51+
- name: Fix file ownership (container runs as root, runner as noah:1000)
52+
if: always()
53+
run: |
54+
# Five-whys: Docker container creates files as root on bind-mounted
55+
# workspace. Runner (noah:1000) can't git-clean them on next run
56+
# → checkout fails → CI breaks. This runs inside the container
57+
# (as root) restoring host ownership for subsequent bare-metal jobs.
58+
chown -R 1000:1000 "$GITHUB_WORKSPACE" || true
5159
5260
mutants:
5361
runs-on: self-hosted

contracts/apr-mono-binary-rule-v1.yaml

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -17,37 +17,38 @@ equations:
1717
For all crates C in workspace:
1818
C.has_user_facing_binary => C == apr-cli
1919
"User-facing" = installed by `cargo install aprender`, has --help
20-
"Build-tool" = standalone dev tooling (pv)
20+
"Standalone" = independent tool with its own install/use case (pv, ttop)
2121
"Internal" = called by apr-cli as subprocess, or dev-only
2222
"Legacy" = pre-merge binary, functionality subsumed by `apr` subcommand
2323
domain: all workspace Cargo.toml files
2424
codomain: bool
2525
invariants:
26-
- "apr (from apr-cli) is the ONLY user-facing binary"
27-
- "pv (from aprender-contracts-cli) is the only build-tool exception"
26+
- "apr (from apr-cli) is the ONLY user-facing ML binary"
27+
- "pv (aprender-contracts-cli) is a standalone build tool — not an apr subcommand"
28+
- "ttop (aprender-viz-ttop) is a standalone system monitor — not an apr subcommand"
2829
- "Internal binaries must document their justification"
2930
- "Legacy binaries must have a migration plan to apr subcommand"
3031

3132
binary_audit_2026_04_10:
3233
formula: |
33-
22 crates with [[bin]], 24 total binary targets.
34+
20 crates with [[bin]], 21 total binary targets.
3435
Classification:
3536
3637
USER-FACING (1 crate, 1 binary):
37-
apr-cli -> apr # THE user-facing binary
38+
apr-cli -> apr # THE user-facing ML binary
3839
39-
BUILD-TOOL (1 crate, 1 binary):
40-
aprender-contracts-cli -> pv # Contract validation (Rule 2 exception)
40+
STANDALONE TOOLS (2 crates, 2 binaries — permanent exceptions):
41+
aprender-contracts-cli -> pv # Build tool: contract validation, pre-commit, CI
42+
aprender-viz-ttop -> aprender-viz-ttop # System monitor: GPU/CPU, independent of ML workflow
4143
42-
INTERNAL-HELPER (8 crates, 9 binaries):
44+
INTERNAL-HELPER (7 crates, 8 binaries):
4345
aprender-cbtop -> aprender-cbtop # GPU monitor, called by `apr monitor`
4446
aprender-present-cli -> presentar # TUI server, called by `apr present`
4547
aprender-present-terminal -> score, ptop # TUI widgets, used by presentar
4648
aprender-ptx-debug -> aprender-ptx-debug # CUDA PTX debugger (dev-only)
4749
aprender-test-cli -> aprender-test-cli # WASM test runner (dev-only)
4850
aprender-train-bench -> aprender-train-bench # Training benchmark harness
4951
aprender-train-shell -> aprender-train-shell # Interactive training REPL
50-
aprender-viz-ttop -> aprender-viz-ttop # Standalone ttop (excluded from workspace)
5152
5253
QA-TOOL (2 crates, 2 binaries — from Phase 2g port):
5354
aprender-qa-cli -> apr-qa # QA playbook runner (to wire into `apr qa`)

docs/specifications/aprender-monorepo-consolidation.md

Lines changed: 75 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -84,7 +84,7 @@
8484
| ~~`#[contract]` on CLI commands~~ | 172 annotations workspace-wide (70 in apr-cli). PMAT-543 closed. |
8585
| ~~Phase 2g: QA playbook~~ | 5 crates ported, 2,792 tests, 256 playbooks. PMAT-532 closed. |
8686
| ~~Model type taxonomy~~ | `is_llm()` + 3 new Architecture variants + import guards. PMAT-526 closed. |
87-
| ~~24 unauthorized binaries~~ | 20 crates, 21 binaries. 2 migrated to `[[example]]` (serve, train). 8 legacy remain. PMAT-545. |
87+
| ~~24 unauthorized binaries~~ | 20 crates, 21 binaries: 1 user-facing (`apr`), 2 standalone (`pv`, `ttop`), 8 legacy. PMAT-545. |
8888
| ~~ratatui migration~~ | 0 deps remain. PMAT-539 closed. |
8989

9090
**Open gaps (2 of 9):**
@@ -262,12 +262,19 @@ Every `aprender-*` crate is a **library**. They expose `pub fn` APIs consumed
262262
by `apr-cli` or by external Rust code via `use aprender_compute::*;`.
263263
They do NOT produce binaries, CLIs, or executables.
264264

265-
Exception: `aprender-contracts-cli` may produce a `pv` binary for standalone
266-
contract validation (build tooling, not user-facing ML tooling).
265+
**Exceptions** (2 standalone binaries that MUST remain separate):
267266

268-
**Status (2026-04-10)**: AUDITED — 22 crates have `[[bin]]` targets (24 total).
267+
1. **`pv`** (`aprender-contracts-cli`) — standalone contract validation tool.
268+
Used in build.rs, pre-commit hooks, and CI pipelines independently of `apr`.
269+
Cannot be an `apr` subcommand because it's a build dependency, not a runtime tool.
270+
271+
2. **`ttop`** (`aprender-viz-ttop`) — standalone terminal system monitor.
272+
Used independently for GPU/CPU monitoring outside the ML workflow.
273+
Separate binary provides zero-dep install for sysadmin use cases.
274+
275+
**Status (2026-04-12)**: AUDITED — 20 crates have `[[bin]]` targets (21 total).
269276
Classified in `apr-mono-binary-rule-v1.yaml` v2.0: 1 user-facing (`apr`),
270-
1 build-tool (`pv`), 9 internal-helpers, 2 QA-tools, 11 legacy-to-migrate.
277+
2 standalone tools (`pv`, `ttop`), 9 internal-helpers, 2 QA-tools, 8 legacy-to-migrate.
271278
Legacy binaries have `apr` subcommand migration paths documented. PMAT-545 closed.
272279

273280
### Rule 3: CLI Contract Coverage
@@ -384,6 +391,69 @@ The CUDA executor tries to initialize; on failure, falls through to CPU.
384391

385392
**Contract**: `contracts/apr-cli-command-safety-v1.yaml``long_running_graceful` equation.
386393

394+
### Rule 8: CI Gate Completeness
395+
396+
**The `ci / gate` check MUST verify ALL quality dimensions. No silent skips.**
397+
398+
This rule was added after PR #726 exposed a spec-implementation gap: the spec
399+
said "CI must pass before merge" but `ci / gate` only checked 3 of 6 CI jobs,
400+
allowing security failures to merge silently.
401+
402+
#### Five-Whys (2026-04-12)
403+
404+
| # | Question | Answer |
405+
|---|----------|--------|
406+
| 1 | Why is main broken after PR merge? | `ci / security` fails on main |
407+
| 2 | Why did security-failing PR merge? | `ci / security` is not a required check |
408+
| 3 | Why isn't security required? | `ci / gate` doesn't check `needs.security.result` |
409+
| 4 | Why doesn't gate check security? | Security was `continue-on-error: true` ("informational") |
410+
| 5 | Why was this accepted? | Spec says "CI must pass" but doesn't define which checks |
411+
412+
**Root cause**: Spec-implementation gap. The spec promises "CI must pass before
413+
merge" but doesn't enumerate which jobs constitute "CI". Branch protection
414+
enforces `ci / gate` only, and the gate silently skips security.
415+
416+
#### Required CI jobs (MUST block merge)
417+
418+
| Job | What it validates | Runner | Required? |
419+
|-----|-------------------|--------|-----------|
420+
| `ci / test` | `cargo test --workspace --lib` | `[self-hosted, clean-room]` | **YES** |
421+
| `ci / lint` | `cargo clippy` + `cargo fmt` | `[self-hosted, clean-room]` | **YES** |
422+
| `ci / coverage` | Coverage report generation | `[self-hosted, clean-room]` | **YES** |
423+
| `ci / security` | `cargo audit` (exemptions via `.cargo/audit.toml`) | `[self-hosted, clean-room]` | **YES** — was ubuntu-latest + informational, now self-hosted + mandatory |
424+
| `ci / gate` | Aggregates test + lint + coverage + security | `[self-hosted, clean-room]` | **YES** — branch protection required check |
425+
| `workspace-test` | Full workspace test + integration tests | `self-hosted` + container | **YES** — branch protection required check |
426+
| `ci / provenance` | SLSA attestation | `ubuntu-latest` | NO — informational (no code execution) |
427+
| `ci / bench` | Performance benchmarks | `[self-hosted, clean-room]` | NO — main-only |
428+
429+
**Sovereign runner policy**: ALL jobs that execute repo code MUST run on
430+
`[self-hosted, clean-room]` or `self-hosted` with the sovereign container.
431+
Only metadata jobs (provenance attestation) may use `ubuntu-latest`.
432+
This ensures consistent hardware (Intel + GPU), local container registry
433+
(`localhost:5000/sovereign-ci:stable`), and NVMe RAID storage.
434+
435+
#### Implementation
436+
437+
The `ci / gate` job in `paiml/.github/sovereign-ci.yml` must check:
438+
439+
```yaml
440+
# BEFORE (broken — silently skips security):
441+
if needs.test.result == "failure" → fail
442+
if needs.lint.result == "failure" → fail
443+
if needs.coverage.result == "failure" → fail
444+
445+
# AFTER (complete — all quality dimensions):
446+
if needs.test.result == "failure" → fail
447+
if needs.lint.result == "failure" → fail
448+
if needs.coverage.result == "failure" → fail
449+
if needs.security.result == "failure" → fail # ADDED
450+
```
451+
452+
Branch protection must also add `workspace-test` as a required check
453+
(it's a local job in `ci.yml`, not part of the reusable workflow).
454+
455+
**Contract**: `contracts/ci-security-audit-v1.yaml` (FALSIFY-AUDIT-001, FALSIFY-AUDIT-002).
456+
387457
---
388458

389459
### Citations

0 commit comments

Comments
 (0)