pb-spec is a CLI tool that installs AI coding assistant skills into your project. It provides a structured workflow — init → plan → build — that turns natural-language requirements into implemented, tested code through AI agent prompts.
pb-spec follows a harness-first philosophy: reliability comes from process design, explicit checks, and recoverability, not from assuming one-shot model correctness.
| Source | Core Idea | How pb-spec Applies It |
|---|---|---|
| RPI Strategy | Separate research, planning, and implementation | /pb-init + /pb-plan precede /pb-build |
| Plan-and-Solve Prompting | Plan first to reduce missing-step errors | design.md + tasks.md are mandatory artifacts |
| ReAct | Interleave reasoning and actions with environment feedback | /pb-build executes task-by-task with test/tool feedback loops |
| Reflexion | Learn from failure signals via iterative retries | Retry/skip/abort and DCR flow in pb-build |
| Harness Engineering (OpenAI, 2026-02-11) | Treat runtime signals and checklists as first-class harness inputs | pb-plan requires runtime verification hooks; pb-build validates logs/health evidence before task closure |
| openai/symphony | Long-running agents need explicit observability and deterministic escalation | pb-build enforces bounded retries and emits standardized DCR packets for pb-refine |
| Effective Harnesses for Long-Running Agents | Grounding, context hygiene, recovery, observability | State checks, minimal context handoff, task-local rollback guidance |
| Building Effective Agents | Prefer simple composable workflows over framework complexity | Small adapter-based CLI + explicit workflow prompts |
| Stop Using /init for AGENTS.md | Keep AGENTS.md focused and maintainable | /pb-init updates a managed snapshot block in AGENTS.md while preserving all user-authored constraints outside that block |
| Ensuring Correctness Through the Type System | Use the type system to encode invariants and catch errors early | Encode contracts as type-level assertions in design.md and add type checks to verification; pb-plan adds type guidance and pb-build runs the type checker when applicable. |
- Context Before Code: /pb-init and /pb-plan establish project and requirement context before implementation starts.
- Type System: pb-spec recommends explicit type annotations and type-level contracts in Architecture Decisions and verification. For strongly-typed projects, pb-plan suggests type contracts and adds the project's type-check command to verification so pb-build runs it in Phase 0 and at task closure.
- Behavior Before Code: /pb-plan turns user-visible requirements into Gherkin .feature scenarios before implementation begins.
- Verification by Design: Planning requires explicit verification commands so completion is measurable.
- Observability as Context: Service-facing tasks must capture runtime evidence (log tails and/or health probes), not only test output.
- Architecture Before Implementation: /pb-init snapshots established architecture decisions, /pb-plan records explicit Architecture Decisions, and /pb-build executes against that contract instead of inventing a new one.
- Double-Loop Execution: /pb-build enforces a BDD outer loop plus a TDD inner loop with per-task status tracking.
- Escalation Over Thrashing: Three consecutive failures suspend the current task and route a standardized DCR packet to /pb-refine.
- Safe Failure Recovery: Failed attempts use scoped recovery guidance to avoid polluting unrelated workspace state.
- Composable Architecture: Platform differences stay in adapters; workflow semantics stay in shared templates.
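
The "Escalation Over Thrashing" principle above can be pictured as a bounded retry loop. The following is a minimal sketch, not pb-spec's actual implementation: `TaskOutcome` and `run_with_escalation` are hypothetical names, and the only detail taken from the text is the limit of three consecutive failures.

```python
from dataclasses import dataclass, field
from typing import Callable

# From the text: three consecutive failures suspend the current task.
MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class TaskOutcome:
    """Hypothetical result record; not part of pb-spec."""
    status: str  # "DONE" or "SUSPENDED"
    errors: list[str] = field(default_factory=list)


def run_with_escalation(attempt: Callable[[], None]) -> TaskOutcome:
    """Run `attempt` until it succeeds or has failed three times in a row."""
    errors: list[str] = []
    for _ in range(MAX_CONSECUTIVE_FAILURES):
        try:
            attempt()
        except Exception as exc:
            # Record the concrete error text so a later step can quote it.
            errors.append(f"{type(exc).__name__}: {exc}")
            continue
        return TaskOutcome("DONE", errors)
    # Escalate instead of retrying forever; the collected errors would
    # become the failure evidence handed to a refine step.
    return TaskOutcome("SUSPENDED", errors)
```

A caller would treat a `SUSPENDED` outcome as the trigger for a Design Change Request rather than attempting the task again.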
- 4 agent skills: pb-init, pb-plan, pb-refine, pb-build, covering project analysis, Gherkin-first design planning, iterative refinement, and BDD+TDD implementation
- 5 platforms: Claude Code, VS Code Copilot, OpenCode, Gemini CLI, Codex
- Zero config: run pb-spec init and start using AI prompts immediately
- Idempotent: safe to re-run; use --force to overwrite existing files
- Built with: Python 3.12+, click, uv
# Recommended
uv tool install pb-spec
# Alternative
pipx install pb-spec

# 1. Install skills/prompts for your AI tool
cd my-project
pb-spec init --ai claude # or: copilot, opencode, gemini, codex, all
pb-spec init --ai all -g # install globally to each agent's home/config dir
# 2. Open the project in your AI coding assistant and use the installed commands/prompts:
# /pb-init → Audit repo, append/update a managed AGENTS.md snapshot block (non-destructive)
# /pb-plan Add WebSocket auth → Generate design/tasks/features spec artifacts
# /pb-refine add-websocket-auth → (Optional) Refine design based on feedback
# /pb-build add-websocket-auth → Implement tasks via BDD outer loop + TDD inner loop
#
# Note for Codex: prompts are loaded from .codex/prompts and typically run via /prompts:<name>.

| AI Tool | Target Directory | File Format |
|---|---|---|
| Claude Code | .claude/skills/pb-<name>/SKILL.md | YAML frontmatter + Markdown |
| VS Code Copilot | .github/prompts/pb-<name>.prompt.md | Markdown (no frontmatter) |
| OpenCode | .opencode/skills/pb-<name>/SKILL.md | YAML frontmatter + Markdown |
| Gemini CLI | .gemini/commands/pb-<name>.toml | TOML (description + prompt) |
| Codex | .codex/prompts/pb-<name>.md | YAML frontmatter + Markdown |
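
To make the path layout in the table concrete, here is a small sketch that expands the templates for a given skill. The path templates are copied from the table and the platform keys from the `--ai` values; the function name `target_path` is hypothetical, and treating `<name>` as the skill name without its `pb-` prefix (for example `plan`) is an assumption.

```python
from pathlib import PurePosixPath

# Templates copied from the table above. "{name}" is assumed to be the
# skill name without the "pb-" prefix, e.g. "init", "plan", "refine", "build".
TARGET_TEMPLATES = {
    "claude": ".claude/skills/pb-{name}/SKILL.md",
    "copilot": ".github/prompts/pb-{name}.prompt.md",
    "opencode": ".opencode/skills/pb-{name}/SKILL.md",
    "gemini": ".gemini/commands/pb-{name}.toml",
    "codex": ".codex/prompts/pb-{name}.md",
}


def target_path(platform: str, name: str) -> PurePosixPath:
    """Return the project-relative path for one skill on one platform."""
    try:
        template = TARGET_TEMPLATES[platform]
    except KeyError:
        raise ValueError(f"unknown platform: {platform!r}") from None
    return PurePosixPath(template.format(name=name))
```

For example, `target_path("gemini", "plan")` yields `.gemini/commands/pb-plan.toml`.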
pb-spec init --ai <platform> [-g, --global] [--force]
Install skill files into the current project, or into global agent config directories with -g.
- --ai: target platform (claude, copilot, opencode, gemini, codex, or all)
- -g, --global: install into each AI tool's home/config directory instead of the current project
- --force: overwrite existing files
pb-spec version
Print the installed pb-spec version.
pb-spec update
Update pb-spec to the latest version (requires uv).
pb-spec provides four agent skills that chain together:
/pb-init → /pb-plan → [/pb-refine] → /pb-build
/pb-init audits your project and writes a pb-init snapshot into AGENTS.md using managed markers:
- <!-- BEGIN PB-INIT MANAGED BLOCK -->
- <!-- END PB-INIT MANAGED BLOCK -->
Merge behavior is non-destructive:
- If markers exist, only that managed block is replaced.
- If markers do not exist, the managed block is appended.
- All existing content outside the managed block is preserved verbatim.
This design avoids relying on any fixed AGENTS.md section layout and protects user-maintained constraints across re-runs.
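
The merge rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the logic the /pb-init prompt actually runs: the marker strings come from the text, while `merge_managed_block` and its handling of whitespace are hypothetical.

```python
BEGIN = "<!-- BEGIN PB-INIT MANAGED BLOCK -->"
END = "<!-- END PB-INIT MANAGED BLOCK -->"


def merge_managed_block(existing: str, snapshot: str) -> str:
    """Replace the managed block if present, otherwise append it."""
    block = f"{BEGIN}\n{snapshot.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Markers exist: swap only the managed block, keep the rest verbatim.
        return existing[:start] + block + existing[end + len(END):]
    # Markers missing (a half-open marker pair is treated as missing in
    # this sketch): append the block after the existing content.
    if not existing:
        return block + "\n"
    separator = "" if existing.endswith("\n") else "\n"
    return f"{existing}{separator}\n{block}\n"
```

Running the function twice with the same snapshot returns the same text, which matches the idempotent re-run behavior described for pb-spec.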
The managed snapshot now also includes an Architecture Decision Snapshot so later agents inherit repo-level conventions instead of re-deciding them every run. Typical entries include established patterns, dependency-injection boundaries, error-handling conventions, and workflow/state-modeling rules.
This stronger contract does not add a new command or side-channel validator. The existing markdown workflow remains the source of truth, with AGENTS.md carrying repo-level constraints forward into planning, refinement, and execution.
/pb-plan takes a natural-language requirement and produces a complete feature spec:
specs/<YYYY-MM-DD-NO-feature-name>/
├── design.md # Architecture, API contracts, data models
├── tasks.md # Ordered implementation tasks (logical units of work)
└── features/ # Gherkin acceptance artifacts
The spec directory follows the naming format YYYY-MM-DD-NO-feature-name (e.g., 2026-02-15-01-add-websocket-auth). The feature-name part must be unique across all specs. During planning, AGENTS.md is treated as read-only policy context (free-form, no fixed layout assumptions). pb-plan also maps the primary repo language to a BDD runner:
- TypeScript/JavaScript → @cucumber/cucumber
- Python → behave
- Rust → cucumber
It also performs two additional planning audits before implementation starts:
- Template identity alignment: if the repo still contains generic crate/package/module names from a scaffold, pb-plan must front-load renaming those identifiers to project-matching names.
- Risk-based advanced testing: property testing is planned by default for broad input-domain logic, while fuzzing and benchmarks are added only when the feature profile justifies them. Tool selection follows repo language conventions: Hypothesis/fast-check/proptest, Atheris/jazzer.js/cargo-fuzz, and pytest-benchmark/Vitest Bench/criterion.
It also adds an explicit Architecture Decisions section to design.md. For work that introduces a new boundary or is likely to exceed 200 lines, planning must evaluate SRP, DIP, and the classic patterns Factory, Strategy, Observer, Adapter, and Decorator. The chosen pattern must be justified against alternatives and checked against the code-simplification lens so the design stays simpler, not just more abstract.
The resulting design.md, tasks.md, and features/*.feature files are also the workflow's type carrier in plain markdown. pb-plan keeps the current artifact family and command surface, but those artifacts now need explicit contract fields so downstream stages can validate readiness without inventing a separate YAML or JSON schema.
/pb-refine reads user feedback or Design Change Requests (from failed builds, including standardized 3-failure build-block packets) and updates design.md and tasks.md accordingly. It maintains a revision history and cascades design changes to the task list without overwriting completed work. AGENTS.md remains read-only in this phase.
/pb-refine stays on the same workflow and packet family. It now validates 🛑 Build Blocked and 🔄 Design Change Request markdown packets for required sections such as failure evidence and impact before it updates affected spec artifacts.
/pb-build reads specs/<YYYY-MM-DD-NO-feature-name>/tasks.md and implements each task sequentially. Every BDD+TDD task is executed by a fresh subagent following an outside-in double loop: run the Gherkin scenario first so the BDD outer loop is red, drive the implementation with TDD (Red → Green → Refactor) in the inner loop, then re-run the scenario until it passes. Runtime verification (log/health evidence when applicable) still applies. It supports Design Change Requests if the planned design proves infeasible during implementation, and auto-escalates to DCR after three consecutive task failures. Only the <feature-name> part is needed when invoking; the agent resolves the full directory automatically. AGENTS.md is read-only unless the user explicitly requests an AGENTS.md change.
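
Resolving a bare feature name to its spec directory could look roughly like the sketch below. In pb-spec the agent performs this step from its prompt, so the code is only an illustration: `resolve_spec_dir` is a hypothetical helper, and the regular expression assumes the NO part is one or more digits, as in the example 2026-02-15-01-add-websocket-auth.

```python
import re
from pathlib import Path

# YYYY-MM-DD-NO-feature-name; the digit count of NO is an assumption.
SPEC_DIR = re.compile(r"^(\d{4}-\d{2}-\d{2})-(\d+)-(.+)$")


def resolve_spec_dir(specs_root: Path, feature_name: str) -> Path:
    """Find the single spec directory whose feature-name part matches."""
    matches = []
    for child in sorted(specs_root.iterdir()):
        parsed = SPEC_DIR.match(child.name)
        if child.is_dir() and parsed and parsed.group(3) == feature_name:
            matches.append(child)
    # Feature names are required to be unique across all specs, so anything
    # other than exactly one match is reported as an error.
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one spec for {feature_name!r}, found {len(matches)}"
        )
    return matches[0]
```

The uniqueness rule for feature names is what makes this lookup unambiguous without the date and sequence number.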
/pb-build is now explicitly architecture-bound: it reads the repo's Architecture Decision Snapshot, follows the feature's Architecture Decisions, re-checks SRP and DIP during execution, and keeps external dependencies behind interfaces or abstract classes when the design requires that seam. It should not improvise a different Factory, Strategy, Observer, Adapter, or Decorator choice mid-build.
Before parsing tasks or spawning subagents, /pb-build now runs a mandatory Phase 0 validation gate against the existing markdown contract: required design sections, required Task X.Y fields, and at least one feature scenario. If any item is missing, the build stops before implementation work starts. Task closure also follows explicit state transitions, so DONE is only reachable after scenario, test, and verification evidence are satisfied.
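
A rough picture of the Phase 0 gate, written as a plain function over the three artifact types. The text does not list the required section or field names, so apart from "Architecture Decisions" the names below are placeholders, and `phase0_problems` itself is hypothetical; the real gate lives in the pb-build prompt.

```python
import re

# Only "Architecture Decisions" is named in the text; the remaining
# section and field names are placeholders for illustration.
REQUIRED_DESIGN_SECTIONS = ("Architecture Decisions", "Verification")
REQUIRED_TASK_FIELDS = ("Status", "Verification")

TASK_HEADING = re.compile(r"^#+\s*Task\s+(\d+\.\d+)\b.*$", re.MULTILINE)


def phase0_problems(design_md: str, tasks_md: str, feature_texts: list[str]) -> list[str]:
    """Return every contract violation; an empty list means the gate passes."""
    problems: list[str] = []
    for section in REQUIRED_DESIGN_SECTIONS:
        pattern = rf"^#+\s*{re.escape(section)}\s*$"
        if not re.search(pattern, design_md, re.MULTILINE):
            problems.append(f"design.md: missing section {section!r}")

    headings = list(TASK_HEADING.finditer(tasks_md))
    if not headings:
        problems.append("tasks.md: no 'Task X.Y' entries found")
    for i, heading in enumerate(headings):
        # A task body runs from its heading to the next task heading.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(tasks_md)
        body = tasks_md[heading.end():end]
        for field_name in REQUIRED_TASK_FIELDS:
            if f"{field_name}:" not in body:
                problems.append(
                    f"tasks.md: Task {heading.group(1)} missing field {field_name!r}"
                )

    has_scenario = any(
        re.search(r"^\s*Scenario( Outline)?:", text, re.MULTILINE)
        for text in feature_texts
    )
    if not has_scenario:
        problems.append("features/: no Gherkin scenario found")
    return problems
```

In this sketch, a non-empty result would stop the build before any task is parsed or any subagent is spawned.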
| Skill | Trigger | Output | Description |
|---|---|---|---|
| pb-init | /pb-init | AGENTS.md | Audit repo and safely update/append a managed snapshot block without rewriting user-authored constraints |
| pb-plan | /pb-plan <requirement> | specs/<YYYY-MM-DD-NO-feature-name>/design.md + tasks.md + features/*.feature | Design proposal + Gherkin scenarios + ordered task breakdown |
| pb-refine | /pb-refine <feature> | Revised spec files | Apply feedback or Design Change Requests |
| pb-build | /pb-build <feature-name> | Code + tests | BDD outer loop + TDD inner loop via subagents |
pb-spec's prompt design is inspired by Anthropic's research on Effective Harnesses for Long-Running Agents. The core idea: place AI agents inside a strict, observable, recoverable execution environment — a "harness" — rather than relying on the agent's autonomous judgment alone.
| Principle | How pb-spec Implements It |
|---|---|
| State Grounding | Subagents must verify workspace state (ls, find, read_file) before writing any code — preventing path hallucination |
| Architecture Continuity | pb-init records an Architecture Decision Snapshot, pb-plan makes Architecture Decisions explicit, and pb-build verifies implementation still conforms to that contract |
| Error Quoting | Subagents must quote specific error messages before attempting fixes — preventing blind debugging |
| Context Hygiene | Orchestrator passes only minimal, relevant context to each subagent — preventing context window pollution |
| Recovery Loop | Failed tasks use pre-task snapshots + file-scoped recovery (git restore + task-local cleanup), and avoid workspace-wide restore in dirty trees |
| Verification Harness | Design docs define explicit verification commands at planning time — subagents execute, not invent, verification |
| Observability as Context | Task verification includes runtime signals (logs/health) for service-facing work, and build closure requires command-backed evidence |
| Escalation Loop | Three consecutive failures trigger task suspension + standardized DCR handoff to pb-refine |
| Agent Rules | AGENTS.md is treated as free-form policy context: pb-init manages only its marker block; pb-plan/pb-refine/pb-build read it without rewriting |
- Worker (Implementer): implementer_prompt.md enforces grounding-first workflow and error quoting
- Architect (Planner): design_template.md + tasks_template.md enforce verification criteria, including runtime signals when applicable
- Orchestrator (Builder): pb-build SKILL enforces context hygiene, runtime verification gates, bounded retries, and DCR escalation
- Foundation (Init): pb-init updates only the managed marker block in AGENTS.md, preserving all external user-authored constraints

# Clone
git clone https://github.com/longcipher/pb-spec.git
cd pb-spec
# Install dependencies
uv sync --group dev
# Run tests
uv run pytest -v
# Install locally for testing
uv pip install -e .

Apache-2.0 © 2025 Bob Liu