Write PRDs, run discovery cycles, plan launches, and facilitate strategy sessions — from your terminal.
Shipwright gives PMs a real operating system for product work: framework-backed skills, orchestrated workflows, and quality gates that produce artifacts teams can execute.
Under the hood, Shipwright includes 46 skills, 7 specialist agents, and 18 chained workflows. The counts matter less than the contract: evidence-first outputs, explicit decisions, pass/fail gating, deterministic recovery, and optional adversarial review for high-stakes artifacts.
The skills are plain markdown files, so they're compatible with any AI coding tool that reads skill files (Cursor, Codex, Gemini CLI, and others). Agents, commands, and the original /start orchestrator are Claude Code-specific. This repo also includes a Codex-native bridge via AGENTS.md so plain-language prompts in Codex can still route through Shipwright's bounded research and framework selection.
Shipwright is not "better prompting." It is a quality system around prompting.
| Dimension | Raw AI prompting | Shipwright |
|---|---|---|
| Consistency | Format shifts each run | Stable output signature via output standard |
| Decision quality | Often descriptive, not decisive | Required Decision Frame with recommendation + trade-off + owner/date |
| Evidence discipline | Easy to mix assumptions and facts | Sourced claims + explicit unknowns |
| Readiness gating | "Looks good" is subjective | Binary pass/fail gates before scoring |
| Adversarial pressure | Critique depends on the same prompt that produced the work | Optional /challenge workflow and red-team review for pressure-testing finished artifacts |
| Recovery path | Ad hoc rewrites | Deterministic recovery playbooks |
| Handoff quality | Varies by prompt quality | Repeatable workflows with role constraints and checks |
A PM wrote a PRD recommending enterprise expansion as the top priority. Before sending it to engineering, they ran /challenge to pressure-test it.
Input:
/challenge Review this PRD at Standard depth before I send it to the eng lead.
What the red-team agent found:
| Claim Challenged | Attack Vector | Severity | Why This Is Vulnerable | What Would Resolve It |
|---|---|---|---|---|
| "Enterprise is our highest-growth segment" | Evidence Integrity | Moderate | Cited market report covers the category, not this product. No enterprise-specific pipeline or win-rate data. | Cite enterprise pipeline metrics or downgrade to hypothesis. |
| "Minimal incremental engineering cost" | Structural Honesty | Critical | SSO, audit logging, and SLA requirements are listed in the appendix but not reflected in the cost estimate or timeline. | Reconcile appendix requirements with the effort estimate or scope them out explicitly. |
| "Self-serve onboarding will scale to enterprise" | Decision Courage | Moderate | The PRD hedges with "may require some customization" but doesn't commit to whether enterprise onboarding is self-serve or high-touch. | Make the call: self-serve with guardrails, or dedicated onboarding. State the trade-off. |
Verdict: DEFEND. The enterprise thesis may still be right, but the cost estimate contradicts the appendix and the growth claim lacks product-specific evidence. The PM should route findings back before treating the PRD as settled.
The PM sent findings back to the producing agent, which revised the cost section and downgraded the growth claim to a hypothesis. A second /challenge pass returned CLEAR.
Most PM work falls into one of three patterns. If you're unsure where to begin, pick the path that matches this week's job.
/discover → /write-prd → /tech-handoff
Start with customer evidence, convert it into a structured PRD, then generate the engineering handoff package. You end with: discovery report, Working Backwards PRD, tech spec, design review, epics, and stories. Typical effort: a few focused sessions.
/customer-review → /strategy → /okrs
Synthesize customer signals, set strategic bets and boundaries, then draft and audit OKRs against those bets. You end with: customer intelligence report, strategy doc with kill criteria, and audited OKRs. Typical effort: a few focused sessions.
/strategy → /plan-launch → /sprint
Lock positioning, build the GTM launch plan, then turn it into execution-ready sprint scope. You end with: strategy doc, GTM launch plan, and sprint plan with stories. Typical effort: a few focused sessions.
Each path chains 3 workflows; run them in separate sessions or back-to-back. For full path details, see the workflows guide.
Option A: Plugin install (recommended)
claude plugin marketplace add EdgeCaser/shipwright
claude plugin install shipwright@shipwright

Option B: Script install (recommended for manual installs)
git clone https://github.com/EdgeCaser/shipwright.git
bash shipwright/scripts/sync.sh --install your-project/

This copies all skills, agents, commands, docs, and evals into your-project/.claude/, and drops a shipwright-sync.sh script you can re-run later to pull updates.
Option C: Manual install
git clone https://github.com/EdgeCaser/shipwright.git
cp -r shipwright/skills/ your-project/.claude/skills/
cp -r shipwright/agents/ your-project/.claude/agents/
cp -r shipwright/commands/ your-project/.claude/commands/
mkdir -p your-project/.claude/scripts/
cp -r shipwright/scripts/ your-project/.claude/scripts/

Using a different tool? See the cross-tool install guide.
If you are running directly from this repo in Codex, you do not need slash commands. The project-level AGENTS.md tells Codex to treat plain-language PM prompts as Shipwright work and to use the local research collector before broad interactive browsing when it is available.
cp shipwright/examples/CLAUDE.md.example your-project/CLAUDE.md
# Fill in your product name, personas, metrics, and priorities — even rough answers help

Then open Claude Code in your project and run:
/start I'm a PM at [company] working on [brief context]
That's it. The orchestrator reads your CLAUDE.md, picks up your context, and routes you to the right workflow. If you skip the CLAUDE.md, Shipwright still works — but outputs will be generic instead of tailored to your product.
You can also run workflows directly:
/discover /write-prd /plan-launch /strategy /sprint /okrs /challenge /status /quality-check
For the full workflow list and behavior, see using workflows.
When you already know the job to be done, run the workflow directly instead of routing through /start. For example, use /competitive for competitive analysis or /pricing for pricing work.
When a task needs fresh public research, keep the first pass narrow:
- do market sizing first, then positioning
- do competitive landscape first, then battlecards
- ask for findings inline before asking for a polished memo or saved file
This keeps web-heavy work bounded and reduces timeout risk on broad requests.
If you want to reduce search latency further without changing the conversational UX, Shipwright also includes scripts/collect-research.mjs, which can build a compact evidence pack from programmatic web search before the model synthesizes it. The helper now escalates automatically from a small first pass to broader subqueries, caches fresh evidence packs under .shipwright/cache/research/v1/ for 24 hours by default, and only asks the model to browse interactively for the remaining gaps. It also emits a facts.json sidecar with deterministic pricing, review, product, date, and package-registry facts, including adapter-backed metadata from npm, PyPI, and crates.io when available. If no Brave or Tavily key is configured, it still degrades gracefully by writing a needs-interactive-followup pack instead of failing hard, and those no-provider fallback packs are not cached. To clear the local cache manually, run node scripts/collect-research.mjs --clear-cache.
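The 24-hour cache behavior described above can be pictured as a simple freshness check. This is an illustrative sketch only, not the collector's actual implementation; the cache directory matches the README, but keying packs by a query hash is an assumption.

```shell
# Illustrative sketch of the 24-hour cache check described above.
# The cache directory is from the README; the pack file naming is
# a hypothetical stand-in -- the real collect-research.mjs may differ.
CACHE_DIR=".shipwright/cache/research/v1"
PACK="$CACHE_DIR/example-query-hash.json"   # hypothetical pack file name
MAX_AGE_MIN=$((24 * 60))

mkdir -p "$CACHE_DIR"
if [ -f "$PACK" ] && [ -n "$(find "$PACK" -mmin "-$MAX_AGE_MIN")" ]; then
  echo "cache hit: reuse $PACK"
else
  echo "cache miss: collect a fresh evidence pack"
  # node scripts/collect-research.mjs "<query>"
fi
```

On a cache miss the real helper collects a fresh pack; remember that no-provider fallback packs are deliberately excluded from this cache.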
You can use any skill directly without workflows, agents, or the orchestrator:
Read skills/execution/prd-development/SKILL.md and write a PRD for [feature].
Use standalone mode for one framework and one question. Move up to workflows when you need repeatable, multi-step output quality.
Want proof before adoption? Start here:
- Case studies for real-world proof points from production use
- Golden outputs for side-by-side baseline vs Shipwright comparisons
- Pass/fail gates for binary readiness checks
- Eval rubrics for scored quality dimensions
- Adversarial review rubric for calibrating Challenge Reports
- Failure modes and recovery playbooks for deterministic fixes
Every Shipwright artifact closes with the same three blocks. Here's a real example from a competitive brief:
## Decision Frame
Recommendation: Lead the first discovery call with revenue cycle friction (documentation
accuracy, prior auth denial rate) before surfacing automation capabilities. Do not open with
technology.
Trade-off: A slower first meeting vs. a pitch that lands before the client has confirmed the pain.
Confidence: High — revenue impact is quantifiable from published industry benchmarks, and
competitor capability gap is sourced from press releases and analyst reports.
Decision owner/date: PM (2026-03-15). Revisit after first discovery call.
## Unknowns and Evidence Gaps
- EHR platform(s) in use — determines integration path
- Payer mix breakdown — affects whether the documented revenue gap is material at this client's scale
- Whether any value-based contracts are already in place — changes the urgency framing
## Pass/Fail Readiness
PASS — competitive claims are sourced, revenue impact is quantified, discovery entry points are
ranked by evidence quality, and unknowns are listed with resolution path (first call).
FAIL condition: if competitive capability claims are taken from positioning pages only with no
outcome data, or if revenue impact has no source.

If you installed with scripts/sync.sh --install, your project has a shipwright-sync.sh script. After pulling new changes in the Shipwright repo, run it from your project directory:
bash shipwright-sync.sh # interactive — shows what changed, asks before updating
bash shipwright-sync.sh --yes # auto-update everything without prompting

The sync script compares every file against the Shipwright source and reports what's changed, what's new, and what's been removed. You can update all at once or file-by-file with diffs.
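For intuition, the per-file comparison can be sketched like this. It is a sketch under assumptions, not the actual shipwright-sync.sh logic; the fixture files stand in for the Shipwright source and your project copy.

```shell
# Sketch of a per-file "new vs changed" comparison, as shipwright-sync.sh
# is described doing. Fixture dirs stand in for the real source and
# project trees; the real script's logic may differ.
SRC=$(mktemp -d); DST=$(mktemp -d)
printf 'v2\n' > "$SRC/prd.md";  printf 'v1\n' > "$DST/prd.md"   # changed file
printf 'new\n' > "$SRC/okrs.md"                                  # new file

for f in $(cd "$SRC" && find . -type f); do
  if [ ! -f "$DST/$f" ]; then
    echo "new: $f"
  elif ! cmp -s "$SRC/$f" "$DST/$f"; then
    echo "changed: $f"
  fi
done
```

Files present in both trees with identical contents produce no output, which is why an up-to-date project syncs quietly.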
Shipwright includes an optional local Slack integration in slack-agent/. It lets you @mention a bot in Slack, route the message into Claude Code running on your machine, and post the reply back into the same thread.
Current behavior:
- runs locally through Slack Socket Mode, so no public webhook is required
- keeps Claude session continuity per Slack thread
- supports strict commands like `question:` and `status:`
- supports thread-scoped listening mode with `listen on`/`listen off`
- uses the project directory you configure via `PROJECT_CWD`
Important warning:
- this Slack agent is for personal use only
- it is not intended to be a shared Claude gateway for teammates
- it should be limited to allowlisted users and channels
- `READ_ONLY_MODE=true` is the recommended default
If you want a team-facing Slack product, use a proper API-backed architecture instead of routing requests through a local authenticated Claude Code session.
Setup, configuration, safety guidance, and supported commands are documented in slack-agent/README.md.
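Configuration happens through environment variables. A minimal sketch might look like the following; `PROJECT_CWD` and `READ_ONLY_MODE` are named in this README, while the Slack token variable names are assumptions to verify against slack-agent/README.md.

```shell
# Minimal slack-agent configuration sketch.
# PROJECT_CWD and READ_ONLY_MODE are documented above; the token
# variable names below are assumed -- confirm in slack-agent/README.md.
export SLACK_BOT_TOKEN="xoxb-..."   # assumed name: bot token
export SLACK_APP_TOKEN="xapp-..."   # assumed name: app-level Socket Mode token
export PROJECT_CWD="/path/to/your-project"
export READ_ONLY_MODE=true          # recommended safe default
```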
- Workflows guide: all commands, orchestration model, common paths
- Output standard: required sections, signature rules, decision framing
- Composition model: how skills, workflows, and agents compose
- AI vs non-AI guide: what to automate deterministically, what to keep agentic, and where to invest next
- Cross-tool install: Cursor, Codex, Gemini CLI, others
- Tool connections: MCP setup and integration patterns
- Skills catalog and Agents: source of truth for all components
PRs are welcome. Before opening one, run:
./scripts/validate.sh

See CONTRIBUTING.md for skill/workflow/rubric requirements and the submission checklist.
Built on ideas from PM practitioners and the AI coding agent community:
- Pawel Huryn's PM Skills Marketplace
- Dean Peters' Product-Manager-Skills
- Sachin Rekhi's Claude Code for PMs
- prodmgmt.world
- ccforpms.com
- VoltAgent's awesome-claude-code-subagents
MIT