From d5f0d016aff065ead0401b10d35b29cd29378168 Mon Sep 17 00:00:00 2001 From: Jack Felke Date: Wed, 18 Mar 2026 13:41:13 -0700 Subject: [PATCH 1/5] docs: add example CLAUDE.md for automatic preflight integration Adds a ready-to-use CLAUDE.md template that makes Claude Code automatically run preflight_check on prompts. Users can copy it into their project to get preflight working without manual tool calls. Referenced from Quick Start in README and examples/README. --- README.md | 10 ++++++++++ examples/CLAUDE.md | 30 ++++++++++++++++++++++++++++++ examples/README.md | 16 ++++++++++++++++ 3 files changed, 56 insertions(+) create mode 100644 examples/CLAUDE.md diff --git a/README.md b/README.md index 6d03f5d..7969d58 100644 --- a/README.md +++ b/README.md @@ -124,6 +124,16 @@ claude mcp add preflight -- preflight-dev-serve > **Note:** `preflight-dev` runs the interactive setup wizard. `preflight-dev-serve` starts the MCP server — that's what you want in your Claude Code config. +### Make Claude use preflight automatically + +Add preflight rules to your project's `CLAUDE.md` so Claude runs `preflight_check` on every prompt without you asking: + +```bash +cp /path/to/preflight/examples/CLAUDE.md your-project/CLAUDE.md +``` + +See [`examples/CLAUDE.md`](examples/CLAUDE.md) for a ready-to-use template with recommended rules for when to preflight, session hygiene, and skip-lists. + --- ## How It Works diff --git a/examples/CLAUDE.md b/examples/CLAUDE.md new file mode 100644 index 0000000..da55f40 --- /dev/null +++ b/examples/CLAUDE.md @@ -0,0 +1,30 @@ +# CLAUDE.md — Preflight Integration Example +# +# Drop this into your project's CLAUDE.md (or .claude/CLAUDE.md) to make +# Claude Code automatically use preflight tools during your sessions. +# Customize the rules below to match your workflow. + +## Preflight Rules + +Before starting any non-trivial task, run `preflight_check` with my prompt. This catches vague instructions before they waste tokens on wrong→fix cycles. + +### When to use preflight tools: + +- **Every prompt**: `preflight_check` triages automatically — let it decide what's needed +- **Before multi-file changes**: Run `scope_work` to get a phased plan +- **Before sub-agent tasks**: Use `enrich_agent_task` to add context +- **After making a mistake**: Use `log_correction` so preflight learns the pattern +- **Before ending a session**: Run `checkpoint` to save state for next time +- **When I say "fix it" or "do the others"**: Use `sharpen_followup` to resolve what I actually mean + +### Session hygiene: + +- Run `check_session_health` if we've been going for a while without committing +- If I ask about something we did before, use `search_history` to find it +- Before declaring a task done, run `verify_completion` (type check + tests) + +### Don't preflight these: + +- Simple git commands (commit, push, status) +- Formatting / linting +- Reading files I explicitly named diff --git a/examples/README.md b/examples/README.md index 778f15d..f2fafc1 100644 --- a/examples/README.md +++ b/examples/README.md @@ -12,6 +12,22 @@ The `.preflight/` directory contains example configuration files you can copy in └── api.yml # Manual contract definitions for cross-service types ``` +## `CLAUDE.md` Integration + +The `CLAUDE.md` file tells Claude Code how to behave in your project. Adding preflight rules here makes Claude automatically use preflight tools without you having to ask. + +```bash +# Copy the example into your project: +cp /path/to/preflight/examples/CLAUDE.md my-project/CLAUDE.md + +# Or append to your existing CLAUDE.md: +cat /path/to/preflight/examples/CLAUDE.md >> my-project/CLAUDE.md +``` + +This is the **recommended way** to integrate preflight — once it's in your `CLAUDE.md`, every session automatically runs `preflight_check` on your prompts. + +--- + ### Quick setup ```bash From c17f46344bf005464e2bd65b1f1631852a51e069 Mon Sep 17 00:00:00 2001 From: Jack Felke Date: Thu, 19 Mar 2026 08:20:24 -0700 Subject: [PATCH 2/5] feat(cli): add --help and --version flags, fix Node badge to 20+ - CLI now responds to --help/-h with usage info, profiles, and links - CLI now responds to --version/-v with package version - Previously, any flag just launched the interactive wizard - Fixed README badge from Node 18+ to Node 20+ (matches engines field) --- README.md | 2 +- src/cli/init.ts | 40 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 7969d58..e7a385f 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ A 24-tool MCP server for Claude Code that catches ambiguous instructions before [![MCP](https://img.shields.io/badge/MCP-Compatible-blueviolet)](https://modelcontextprotocol.io/) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE) [![npm](https://img.shields.io/npm/v/preflight-dev)](https://www.npmjs.com/package/preflight-dev) -[![Node 18+](https://img.shields.io/badge/node-18%2B-brightgreen?logo=node.js&logoColor=white)](https://nodejs.org/) +[![Node 20+](https://img.shields.io/badge/node-20%2B-brightgreen?logo=node.js&logoColor=white)](https://nodejs.org/) [Quick Start](#quick-start) · [How It Works](#how-it-works) · [Tool Reference](#tool-reference) · [Configuration](#configuration) · [Scoring](#the-12-category-scorecard) diff --git a/src/cli/init.ts b/src/cli/init.ts index dfaaa25..d1b0021 100644 --- a/src/cli/init.ts +++ b/src/cli/init.ts @@ -9,6 +9,46 @@ import { join, dirname } from "node:path"; import { existsSync } from "node:fs"; import { fileURLToPath } from "node:url"; +// Handle --help and --version before launching interactive wizard +const args = process.argv.slice(2); + +if (args.includes("--help") || args.includes("-h")) { + console.log(` +✈️ preflight-dev — MCP server for Claude Code prompt discipline + +Usage: + preflight-dev Interactive setup wizard (creates .mcp.json) + preflight-dev --help Show this help message + preflight-dev --version Show version + +The wizard will: + 1. Ask you to choose a profile (minimal / standard / full) + 2. Optionally create a .preflight/ config directory + 3. Write an .mcp.json so Claude Code auto-connects to preflight + +After setup, restart Claude Code and preflight tools will appear. + +Profiles: + minimal 4 tools — clarify_intent, check_session_health, session_stats, prompt_score + standard 16 tools — all prompt discipline + session_stats + prompt_score + full 20 tools — everything + timeline/vector search (needs LanceDB) + +More info: https://github.com/TerminalGravity/preflight +`); + process.exit(0); +} + +if (args.includes("--version") || args.includes("-v")) { + const pkgPath = join(dirname(fileURLToPath(import.meta.url)), "../../package.json"); + try { + const pkg = JSON.parse(await readFile(pkgPath, "utf-8")); + console.log(`preflight-dev v${pkg.version}`); + } catch { + console.log("preflight-dev (version unknown)"); + } + process.exit(0); +} + const rl = createInterface({ input: process.stdin, output: process.stdout }); function ask(question: string): Promise { From 6cc729459e78dc6c30c084d83895fe4f35c42cb6 Mon Sep 17 00:00:00 2001 From: Jack Felke Date: Thu, 19 Mar 2026 11:17:06 -0700 Subject: [PATCH 3/5] feat: add export_report tool for markdown session reports (#5) Adds a new export_report MCP tool that generates markdown reports from timeline data. Includes: - Daily activity breakdown with sparkline bars - Prompt quality trend analysis (correction rate over time) - Commit summary list - Configurable period (day/week/month) and scope - Recommendations when correction rate is high Also adds 4 tests for the new tool. Closes #5 --- src/index.ts | 2 + src/tools/export-report.ts | 330 ++++++++++++++++++++++++++++++ tests/tools/export-report.test.ts | 89 ++++++++ 3 files changed, 421 insertions(+) create mode 100644 src/tools/export-report.ts create mode 100644 tests/tools/export-report.test.ts diff --git a/src/index.ts b/src/index.ts index e7e9d00..1528c44 100644 --- a/src/index.ts +++ b/src/index.ts @@ -49,6 +49,7 @@ import { registerScanSessions } from "./tools/scan-sessions.js"; import { registerGenerateScorecard } from "./tools/generate-scorecard.js"; import { registerSearchContracts } from "./tools/search-contracts.js"; import { registerEstimateCost } from "./tools/estimate-cost.js"; +import { registerExportReport } from "./tools/export-report.js"; // Validate related projects from config function validateRelatedProjects(): void { @@ -110,6 +111,7 @@ const toolRegistry: Array<[string, RegisterFn]> = [ ["generate_scorecard", registerGenerateScorecard], ["estimate_cost", registerEstimateCost], ["search_contracts", registerSearchContracts], + ["export_report", registerExportReport], ]; let registered = 0; diff --git a/src/tools/export-report.ts b/src/tools/export-report.ts new file mode 100644 index 0000000..dd2c121 --- /dev/null +++ b/src/tools/export-report.ts @@ -0,0 +1,330 @@ +// ============================================================================= +// export_report — Generate markdown session reports from timeline data +// Closes #5: Export timeline to markdown/PDF reports +// ============================================================================= + +import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; +import { z } from "zod"; +import { getTimeline, listIndexedProjects } from "../lib/timeline-db.js"; +import { getRelatedProjects } from "../lib/config.js"; +import type { SearchScope } from "../types.js"; + +// ── Helpers ───────────────────────────────────────────────────────────────── + +function parseRelativeDate(input: string): string { + const match = input.match(/^(\d+)(days?|weeks?|months?)$/); + if (!match) return input; + const [, numStr, unit] = match; + const num = parseInt(numStr, 10); + const d = new Date(); + if (unit.startsWith("day")) d.setDate(d.getDate() - num); + else if (unit.startsWith("week")) d.setDate(d.getDate() - num * 7); + else if (unit.startsWith("month")) d.setMonth(d.getMonth() - num); + return d.toISOString(); +} + +async function getSearchProjects(scope: SearchScope): Promise { + const currentProject = process.env.CLAUDE_PROJECT_DIR; + switch (scope) { + case "current": + return currentProject ? [currentProject] : []; + case "related": { + const related = getRelatedProjects(); + return currentProject ? [currentProject, ...related] : related; + } + case "all": { + const projects = await listIndexedProjects(); + return projects.map((p) => p.project); + } + default: + return currentProject ? [currentProject] : []; + } +} + +interface DayStats { + prompts: number; + assistantMessages: number; + toolCalls: number; + corrections: number; + commits: number; + errors: number; + compactions: number; + subAgentSpawns: number; +} + +function emptyDayStats(): DayStats { + return { + prompts: 0, + assistantMessages: 0, + toolCalls: 0, + corrections: 0, + commits: 0, + errors: 0, + compactions: 0, + subAgentSpawns: 0, + }; +} + +function eventTypeToDayStat(type: string): keyof DayStats | null { + switch (type) { + case "prompt": return "prompts"; + case "assistant": return "assistantMessages"; + case "tool_call": return "toolCalls"; + case "correction": return "corrections"; + case "commit": return "commits"; + case "error": return "errors"; + case "compaction": return "compactions"; + case "sub_agent_spawn": return "subAgentSpawns"; + default: return null; + } +} + +function correctionRate(stats: DayStats): number { + return stats.prompts > 0 ? (stats.corrections / stats.prompts) * 100 : 0; +} + +function sparkbar(value: number, max: number, width = 20): string { + if (max === 0) return "░".repeat(width); + const filled = Math.round((value / max) * width); + return "█".repeat(Math.min(filled, width)) + "░".repeat(Math.max(width - filled, 0)); +} + +// ── Registration ──────────────────────────────────────────────────────────── + +export function registerExportReport(server: McpServer): void { + server.tool( + "export_report", + "Generate a markdown session report from timeline data. Includes daily activity breakdown, prompt quality trends, correction rates, and commit summaries. Great for weekly standups and retrospectives.", + { + scope: z + .enum(["current", "related", "all"]) + .default("current") + .describe("Search scope"), + period: z + .enum(["day", "week", "month"]) + .default("week") + .describe("Report period"), + since: z + .string() + .optional() + .describe("Start date (ISO or relative like '7days', '2weeks')"), + until: z + .string() + .optional() + .describe("End date (ISO or relative)"), + project: z + .string() + .optional() + .describe("Filter to specific project (overrides scope)"), + }, + async (params) => { + // Determine date range + let sinceDate: string; + let untilDate: string | undefined; + + if (params.since) { + sinceDate = parseRelativeDate(params.since); + } else { + // Default based on period + const d = new Date(); + switch (params.period) { + case "day": + d.setDate(d.getDate() - 1); + break; + case "week": + d.setDate(d.getDate() - 7); + break; + case "month": + d.setMonth(d.getMonth() - 1); + break; + } + sinceDate = d.toISOString(); + } + if (params.until) { + untilDate = parseRelativeDate(params.until); + } + + // Get projects + let projectDirs: string[]; + if (params.project) { + projectDirs = [params.project]; + } else { + projectDirs = await getSearchProjects(params.scope); + } + + if (projectDirs.length === 0) { + return { + content: [ + { + type: "text" as const, + text: `No projects found for scope "${params.scope}". Set CLAUDE_PROJECT_DIR or onboard a project first.`, + }, + ], + }; + } + + // Fetch all events + const events = await getTimeline({ + project_dirs: projectDirs, + since: sinceDate, + until: untilDate, + limit: 5000, + offset: 0, + }); + + if (events.length === 0) { + return { + content: [ + { + type: "text" as const, + text: `# Session Report\n\n_No events found for the given period._`, + }, + ], + }; + } + + // Aggregate by day + const dayMap = new Map(); + const commitMessages: { day: string; hash: string; message: string }[] = []; + + for (const event of events) { + const day = event.timestamp + ? new Date(event.timestamp).toISOString().slice(0, 10) + : "unknown"; + if (!dayMap.has(day)) dayMap.set(day, emptyDayStats()); + const stats = dayMap.get(day)!; + const key = eventTypeToDayStat(event.type); + if (key) stats[key]++; + + if (event.type === "commit") { + const ev = event as any; + const hash = ev.commit_hash + ? String(ev.commit_hash).slice(0, 7) + : ""; + const message = ( + ev.content || + ev.summary || + "" + ) + .slice(0, 80) + .replace(/\n/g, " "); + commitMessages.push({ day, hash, message }); + } + } + + const sortedDays = [...dayMap.keys()].sort(); + const totalStats = emptyDayStats(); + for (const stats of dayMap.values()) { + for (const key of Object.keys(totalStats) as (keyof DayStats)[]) { + totalStats[key] += stats[key]; + } + } + + // Build report + const lines: string[] = []; + const startDay = sortedDays[0]; + const endDay = sortedDays[sortedDays.length - 1]; + const projLabel = params.project || `${projectDirs.length} project(s)`; + + lines.push(`# Session Report: ${startDay} → ${endDay}`); + lines.push(`_${projLabel} | ${events.length} events | ${sortedDays.length} active days_`); + lines.push(""); + + // Summary + lines.push("## Summary"); + lines.push(""); + lines.push(`| Metric | Count |`); + lines.push(`|--------|-------|`); + lines.push(`| Prompts | ${totalStats.prompts} |`); + lines.push(`| Tool calls | ${totalStats.toolCalls} |`); + lines.push(`| Commits | ${totalStats.commits} |`); + lines.push(`| Corrections | ${totalStats.corrections} |`); + lines.push(`| Errors | ${totalStats.errors} |`); + lines.push(`| Compactions | ${totalStats.compactions} |`); + lines.push(`| Sub-agent spawns | ${totalStats.subAgentSpawns} |`); + const overallCorrRate = correctionRate(totalStats); + lines.push( + `| Correction rate | ${overallCorrRate.toFixed(1)}% ${overallCorrRate > 20 ? "⚠️" : overallCorrRate > 10 ? "🟡" : "🟢"} |`, + ); + lines.push(""); + + // Daily breakdown + lines.push("## Daily Activity"); + lines.push(""); + const maxPrompts = Math.max(...[...dayMap.values()].map((s) => s.prompts)); + for (const day of sortedDays) { + const s = dayMap.get(day)!; + const bar = sparkbar(s.prompts, maxPrompts, 15); + const cr = correctionRate(s); + const crFlag = cr > 20 ? " ⚠️" : ""; + lines.push( + `- **${day}** ${bar} ${s.prompts}p / ${s.toolCalls}t / ${s.commits}c / ${s.corrections}err${crFlag}`, + ); + } + lines.push(""); + + // Prompt quality trend + if (sortedDays.length >= 2) { + lines.push("## Prompt Quality Trend"); + lines.push(""); + const firstHalf = sortedDays.slice(0, Math.floor(sortedDays.length / 2)); + const secondHalf = sortedDays.slice(Math.floor(sortedDays.length / 2)); + + const halfStats = (days: string[]) => { + const s = emptyDayStats(); + for (const d of days) { + const ds = dayMap.get(d)!; + for (const k of Object.keys(s) as (keyof DayStats)[]) s[k] += ds[k]; + } + return s; + }; + + const first = halfStats(firstHalf); + const second = halfStats(secondHalf); + const cr1 = correctionRate(first); + const cr2 = correctionRate(second); + const trend = cr2 < cr1 ? "📈 Improving" : cr2 > cr1 ? "📉 Declining" : "➡️ Stable"; + + lines.push( + `First half correction rate: ${cr1.toFixed(1)}% → Second half: ${cr2.toFixed(1)}% — ${trend}`, + ); + lines.push(""); + } + + // Recent commits + if (commitMessages.length > 0) { + lines.push("## Commits"); + lines.push(""); + const recentCommits = commitMessages.slice(-20); + for (const c of recentCommits) { + lines.push(`- \`${c.hash}\` ${c.message} _(${c.day})_`); + } + lines.push(""); + } + + // Tips + if (overallCorrRate > 15) { + lines.push("## 💡 Recommendations"); + lines.push(""); + lines.push( + "- High correction rate detected. Consider using `preflight_check` before complex prompts.", + ); + lines.push( + "- Use `clarify_intent` when requirements are ambiguous.", + ); + lines.push( + "- Break large tasks into smaller, more specific prompts.", + ); + lines.push(""); + } + + lines.push( + "_Report generated by preflight `export_report` tool._", + ); + + return { + content: [{ type: "text" as const, text: lines.join("\n") }], + }; + }, + ); +} diff --git a/tests/tools/export-report.test.ts b/tests/tools/export-report.test.ts new file mode 100644 index 0000000..ac8c28e --- /dev/null +++ b/tests/tools/export-report.test.ts @@ -0,0 +1,89 @@ +import { describe, it, expect, vi, beforeEach } from "vitest"; + +// Mock timeline-db before importing the module +vi.mock("../../src/lib/timeline-db.js", () => ({ + getTimeline: vi.fn(), + listIndexedProjects: vi.fn().mockResolvedValue([]), +})); + +vi.mock("../../src/lib/config.js", () => ({ + getRelatedProjects: vi.fn().mockReturnValue([]), +})); + +import { getTimeline } from "../../src/lib/timeline-db.js"; + +// Helper: create a minimal MCP server mock that captures tool registrations +function createMockServer() { + const tools: Record = {}; + return { + tool(name: string, description: string, _schema: any, handler: Function) { + tools[name] = { handler, description }; + }, + tools, + }; +} + +describe("export_report", () => { + let server: ReturnType; + + beforeEach(async () => { + vi.clearAllMocks(); + server = createMockServer(); + // Dynamic import to get the register function + const mod = await import("../../src/tools/export-report.js"); + mod.registerExportReport(server as any); + }); + + it("registers the export_report tool", () => { + expect(server.tools["export_report"]).toBeDefined(); + expect(server.tools["export_report"].description).toContain("markdown"); + }); + + it("returns empty message when no events", async () => { + vi.mocked(getTimeline).mockResolvedValue([]); + process.env.CLAUDE_PROJECT_DIR = "/test/project"; + + const result = await server.tools["export_report"].handler({ + scope: "current", + period: "week", + }); + + expect(result.content[0].text).toContain("No events found"); + }); + + it("generates report with daily breakdown", async () => { + vi.mocked(getTimeline).mockResolvedValue([ + { timestamp: "2026-03-15T10:00:00Z", type: "prompt", content: "fix bug", project: "/test", branch: "main", session_id: "s1", source_file: "f1", source_line: 1, metadata: "{}" }, + { timestamp: "2026-03-15T10:05:00Z", type: "assistant", content: "done", project: "/test", branch: "main", session_id: "s1", source_file: "f1", source_line: 2, metadata: "{}" }, + { timestamp: "2026-03-15T10:10:00Z", type: "commit", content: "fix: resolve bug", project: "/test", branch: "main", session_id: "s1", source_file: "f1", source_line: 3, metadata: "{}" }, + { timestamp: "2026-03-16T09:00:00Z", type: "prompt", content: "add test", project: "/test", branch: "main", session_id: "s1", source_file: "f1", source_line: 4, metadata: "{}" }, + { timestamp: "2026-03-16T09:05:00Z", type: "correction", content: "wrong approach", project: "/test", branch: "main", session_id: "s1", source_file: "f1", source_line: 5, metadata: "{}" }, + ] as any); + process.env.CLAUDE_PROJECT_DIR = "/test"; + + const result = await server.tools["export_report"].handler({ + scope: "current", + period: "week", + }); + + const text = result.content[0].text; + expect(text).toContain("# Session Report"); + expect(text).toContain("2026-03-15"); + expect(text).toContain("2026-03-16"); + expect(text).toContain("## Summary"); + expect(text).toContain("## Daily Activity"); + expect(text).toContain("## Commits"); + expect(text).toContain("fix: resolve bug"); + }); + + it("shows no-project message when scope has no projects", async () => { + delete process.env.CLAUDE_PROJECT_DIR; + + const result = await server.tools["export_report"].handler({ + scope: "current", + period: "week", + }); + + expect(result.content[0].text).toContain("No projects found"); + }); +}); From 486b958c57cbcb35cb3b9b60799f12ec85e678e9 Mon Sep 17 00:00:00 2001 From: Jack Felke Date: Thu, 19 Mar 2026 11:21:26 -0700 Subject: [PATCH 4/5] Add examples/.preflight/ starter config directory MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The README references examples/.preflight/ but it didn't exist. Adds well-commented starter configs: - config.yml — profile, related projects, thresholds, embeddings - triage.yml — triage rules with domain-specific keyword examples - contracts/api.yml — sample manual contract definitions - README.md — quick setup guide --- examples/.preflight/README.md | 31 ++++++++++++ examples/.preflight/config.yml | 64 ++++++++++++++---------- examples/.preflight/contracts/api.yml | 70 +++++++++++++-------------- examples/.preflight/triage.yml | 43 ++++++++-------- 4 files changed, 129 insertions(+), 79 deletions(-) create mode 100644 examples/.preflight/README.md diff --git a/examples/.preflight/README.md b/examples/.preflight/README.md new file mode 100644 index 0000000..c1e5400 --- /dev/null +++ b/examples/.preflight/README.md @@ -0,0 +1,31 @@ +# `.preflight/` Config Directory + +Drop this directory into your project root to configure preflight for your team. + +## Quick Setup + +```bash +# From the preflight repo: +cp -r examples/.preflight /path/to/your/project/ + +# Or from your project: +cp -r /path/to/preflight/examples/.preflight . +``` + +## Files + +| File | Purpose | +|------|---------| +| `config.yml` | Main config — profile, related projects, thresholds, embeddings | +| `triage.yml` | Triage rules — which prompts to check, skip, or flag as cross-service | +| `contracts/*.yml` | Manual contract definitions — types, interfaces, routes | + +## What to customize first + +1. **`config.yml`** — Add your `related_projects` if you work across multiple repos +2. **`triage.yml`** — Add domain-specific keywords to `always_check` (your project's tricky terms) +3. **`contracts/`** — Define shared types that preflight should know about + +## Commit it + +These files are meant to be committed to your repo so the whole team shares the same preflight config. No secrets in here — API keys go in env vars. diff --git a/examples/.preflight/config.yml b/examples/.preflight/config.yml index f59170f..2cc5d96 100644 --- a/examples/.preflight/config.yml +++ b/examples/.preflight/config.yml @@ -1,35 +1,49 @@ # .preflight/config.yml — Drop this in your project root +# Every field is optional. Defaults are sensible — only override what you need. # -# This is an example config for a typical Next.js + microservices setup. -# Every field is optional — preflight works with sensible defaults out of the box. -# Commit this to your repo so the whole team gets the same preflight behavior. +# Copy this entire .preflight/ directory into your project: +# cp -r examples/.preflight /path/to/your/project/ -# Profile controls how much detail preflight returns. -# "minimal" — only flags ambiguous+ prompts, skips clarification detail -# "standard" — balanced (default) -# "full" — maximum detail on every non-trivial prompt +# ────────────────────────────────────────────────────────── +# Profile: controls how verbose preflight's feedback is +# ────────────────────────────────────────────────────────── +# "minimal" — only flags ambiguous+ prompts, skips detail +# "standard" — balanced (default) +# "full" — maximum detail on every non-trivial prompt profile: standard -# Related projects for cross-service awareness. -# Preflight will search these for shared types, routes, and contracts -# so it can warn you when a change might break a consumer. -related_projects: - - path: /Users/you/code/auth-service - alias: auth - - path: /Users/you/code/billing-api - alias: billing - - path: /Users/you/code/shared-types - alias: types +# ────────────────────────────────────────────────────────── +# Related projects for cross-service awareness +# ────────────────────────────────────────────────────────── +# When you reference code that lives in another repo (e.g. a shared auth +# service), preflight can search that project's types, routes, and schemas +# to give you relevant context automatically. +# +# related_projects: +# - path: /Users/you/code/auth-service +# alias: auth +# - path: /Users/you/code/shared-types +# alias: types -# Behavioral thresholds — tune these to your workflow +# ────────────────────────────────────────────────────────── +# Behavioral thresholds +# ────────────────────────────────────────────────────────── thresholds: - session_stale_minutes: 30 # Warn if no activity for this long - max_tool_calls_before_checkpoint: 100 # Suggest a checkpoint after N tool calls - correction_pattern_threshold: 3 # Min corrections before flagging a pattern + # Warn if no session activity for this many minutes + session_stale_minutes: 30 + + # Suggest a checkpoint after this many tool calls + max_tool_calls_before_checkpoint: 100 + + # Minimum correction count before preflight detects a pattern + # (e.g. "you've corrected JWT handling 3 times — adding a reminder") + correction_pattern_threshold: 3 -# Embedding provider for semantic search over session history. -# "local" uses Xenova transformers (no API key needed, runs on CPU). -# "openai" uses text-embedding-3-small (faster, needs OPENAI_API_KEY). +# ────────────────────────────────────────────────────────── +# Embedding configuration (for semantic search over session history) +# ────────────────────────────────────────────────────────── +# "local" uses Xenova transformers — no API key needed, runs on-device. +# "openai" uses OpenAI embeddings — better quality, requires API key. embeddings: provider: local - # openai_api_key: sk-... # Uncomment if using openai provider + # openai_api_key: sk-... # Uncomment if using openai provider diff --git a/examples/.preflight/contracts/api.yml b/examples/.preflight/contracts/api.yml index 512543f..53e1f42 100644 --- a/examples/.preflight/contracts/api.yml +++ b/examples/.preflight/contracts/api.yml @@ -1,17 +1,16 @@ # .preflight/contracts/api.yml — Manual contract definitions # -# Define shared types and interfaces that preflight should know about. # These supplement auto-extracted contracts from your codebase. -# Manual definitions win on name conflicts with auto-extracted ones. +# Use them for: +# - API contracts that aren't easily auto-detected +# - Shared types across services that need explicit documentation +# - Domain models that drive business logic # -# Why manual contracts? -# - Document cross-service interfaces that live in docs, not code -# - Define contracts for external APIs your services consume -# - Pin down types that are implicit (e.g., event payloads) +# Manual definitions win on name conflicts with auto-extracted ones. - name: User kind: interface - description: Core user model shared across all services + description: Core user model — referenced across auth, billing, and API layers fields: - name: id type: string @@ -19,40 +18,41 @@ - name: email type: string required: true - - name: tier - type: "'free' | 'pro' | 'enterprise'" + - name: role + type: "'admin' | 'member' | 'viewer'" required: true - name: createdAt type: Date required: true -- name: AuthToken - kind: interface - description: JWT payload structure from auth-service - fields: - - name: userId - type: string - required: true - - name: permissions - type: string[] - required: true - - name: expiresAt - type: number - required: true - -- name: WebhookPayload +- name: ApiResponse kind: interface - description: Standard webhook envelope for inter-service events + description: Standard API response wrapper used by all endpoints fields: - - name: event - type: string - required: true - - name: timestamp - type: string + - name: success + type: boolean required: true - name: data - type: Record - required: true - - name: source - type: string - required: true + type: T + required: false + - name: error + type: "{ code: string; message: string }" + required: false + +# Add your own contracts below. Examples: +# +# - name: OrderStatus +# kind: enum +# description: Order lifecycle states +# values: [pending, confirmed, shipped, delivered, cancelled] +# +# - name: POST /api/webhooks +# kind: route +# description: Incoming webhook handler +# fields: +# - name: event +# type: string +# required: true +# - name: payload +# type: Record +# required: true diff --git a/examples/.preflight/triage.yml b/examples/.preflight/triage.yml index b3d394e..1300182 100644 --- a/examples/.preflight/triage.yml +++ b/examples/.preflight/triage.yml @@ -1,45 +1,50 @@ -# .preflight/triage.yml — Controls how preflight classifies your prompts +# .preflight/triage.yml — Controls the prompt triage engine # -# The triage engine routes prompts into categories: +# Triage classifies every prompt into one of four categories: # TRIVIAL → pass through (commit, format, lint) # CLEAR → well-specified, no intervention needed # AMBIGUOUS → needs clarification before proceeding -# MULTI-STEP → complex task, preflight suggests a plan -# CROSS-SERVICE → touches multiple projects, pulls in contracts +# CROSS-SERVICE → touches multiple projects, needs contract context # -# Customize the keywords below to match your domain. +# Customize the keywords below to match YOUR project's domain. rules: - # Prompts containing these words are always flagged as AMBIGUOUS. - # Add domain-specific terms that tend to produce vague prompts. + # ── Always check ────────────────────────────────────── + # Prompts containing these words are always flagged as AMBIGUOUS + # (at minimum) so preflight asks for clarification. + # Add your project's tricky/overloaded terms here. always_check: - rewards - permissions - migration - schema - - pricing # example: your billing domain - - onboarding # example: multi-step user flows + # - billing # Uncomment for fintech projects + # - deployment # Uncomment if deploys are complex - # Prompts containing these words skip checks entirely (TRIVIAL). - # These are safe, mechanical tasks that don't need guardrails. + # ── Skip (trivial) ─────────────────────────────────── + # Prompts matching these pass through without intervention. + # These are low-risk, well-understood commands. skip: - commit - format - lint - - prettier - - "git push" + - "git status" + - "run tests" - # Prompts containing these words trigger CROSS-SERVICE classification. - # Preflight will search related_projects for relevant types and routes. + # ── Cross-service keywords ─────────────────────────── + # Prompts containing these trigger cross-project search, + # pulling in types/routes/schemas from related_projects. cross_service_keywords: - auth - notification - event - webhook - - billing # matches the related_project alias + # - payment # Uncomment if payments are a separate service + # - queue # Uncomment for message queue integrations -# How aggressively to classify prompts. -# "relaxed" — more prompts pass as clear (experienced users) +# ── Strictness ──────────────────────────────────────── +# How aggressively to classify prompts: +# "relaxed" — more prompts pass as clear (fewer interruptions) # "standard" — balanced (default) -# "strict" — more prompts flagged as ambiguous (new teams, complex codebases) +# "strict" — more prompts flagged as ambiguous (safer, more verbose) strictness: standard From c32561904e397c5a62952cc449d9481a13adb111 Mon Sep 17 00:00:00 2001 From: Jack Felke Date: Thu, 19 Mar 2026 11:45:20 -0700 Subject: [PATCH 5/5] test: add tests for estimate_cost and prompt_score tools - estimate_cost: 6 tests covering file analysis, correction detection, preflight tool counting, model pricing, and error handling - prompt_score: 8 tests covering scoring dimensions (specificity, scope, actionability, done condition), grade assignment, and session tracking Increases tool test coverage from 1/25 to 3/25. --- tests/tools/estimate-cost.test.ts | 137 ++++++++++++++++++++++++++++++ tests/tools/prompt-score.test.ts | 98 +++++++++++++++++++++ 2 files changed, 235 insertions(+) create mode 100644 tests/tools/estimate-cost.test.ts create mode 100644 tests/tools/prompt-score.test.ts diff --git a/tests/tools/estimate-cost.test.ts b/tests/tools/estimate-cost.test.ts new file mode 100644 index 0000000..d189c3c --- /dev/null +++ b/tests/tools/estimate-cost.test.ts @@ -0,0 +1,137 @@ +import { describe, it, expect, vi, beforeEach } from "vitest"; +import { writeFileSync, mkdirSync, rmSync } from "node:fs"; +import { join } from "node:path"; +import { tmpdir } from "node:os"; + +// Mock session-parser +vi.mock("../../src/lib/session-parser.js", () => ({ + findSessionDirs: vi.fn().mockReturnValue([]), + findSessionFiles: vi.fn().mockReturnValue([]), +})); + +function createMockServer() { + const tools: Record = {}; + return { + tool(name: string, description: string, _schema: any, handler: Function) { + tools[name] = { handler, description }; + }, + tools, + }; +} + +describe("estimate_cost", () => { + let server: ReturnType; + let tmpDir: string; + + beforeEach(async () => { + vi.clearAllMocks(); + server = createMockServer(); + tmpDir = join(tmpdir(), `preflight-test-${Date.now()}`); + mkdirSync(tmpDir, { recursive: true }); + const mod = await import("../../src/tools/estimate-cost.js"); + mod.registerEstimateCost(server as any); + }); + + afterEach(() => { + try { rmSync(tmpDir, { recursive: true }); } catch {} + }); + + it("registers the estimate_cost tool", () => { + expect(server.tools["estimate_cost"]).toBeDefined(); + expect(server.tools["estimate_cost"].description).toContain("cost"); + }); + + it("reports file not found for bad path", async () => { + const result = await server.tools["estimate_cost"].handler({ + session_dir: "/nonexistent/path.jsonl", + }); + expect(result.content[0].text).toContain("not found"); + }); + + it("analyzes a simple session file", async () => { + const sessionFile = join(tmpDir, "session.jsonl"); + const lines = [ + JSON.stringify({ type: "user", timestamp: "2026-03-15T10:00:00Z", message: { content: "fix the bug in auth.ts" } }), + JSON.stringify({ type: "assistant", timestamp: "2026-03-15T10:01:00Z", message: { content: [{ type: "text", text: "I'll fix that bug now." }] } }), + JSON.stringify({ type: "user", timestamp: "2026-03-15T10:02:00Z", message: { content: "no, wrong file" } }), + JSON.stringify({ type: "assistant", timestamp: "2026-03-15T10:03:00Z", message: { content: [{ type: "text", text: "Sorry, let me check the right file." }] } }), + ]; + writeFileSync(sessionFile, lines.join("\n")); + + const result = await server.tools["estimate_cost"].handler({ + session_dir: sessionFile, + model: "claude-sonnet-4", + }); + + const text = result.content[0].text; + expect(text).toContain("Session Cost Estimate"); + expect(text).toContain("2 prompts"); + expect(text).toContain("Corrections detected: 1"); + }); + + it("detects zero corrections in clean session", async () => { + const sessionFile = join(tmpDir, "clean.jsonl"); + const lines = [ + JSON.stringify({ type: "user", timestamp: "2026-03-15T10:00:00Z", message: { content: "add a test for utils.ts" } }), + JSON.stringify({ type: "assistant", timestamp: "2026-03-15T10:01:00Z", message: { content: [{ type: "text", text: "Done, test added." }] } }), + ]; + writeFileSync(sessionFile, lines.join("\n")); + + const result = await server.tools["estimate_cost"].handler({ + session_dir: sessionFile, + }); + + expect(result.content[0].text).toContain("No corrections detected"); + }); + + it("counts preflight tool calls", async () => { + const sessionFile = join(tmpDir, "preflight.jsonl"); + const lines = [ + JSON.stringify({ type: "user", timestamp: "2026-03-15T10:00:00Z", message: { content: "check my prompt" } }), + JSON.stringify({ + type: "assistant", + timestamp: "2026-03-15T10:01:00Z", + message: { + content: [ + { type: "tool_use", name: "preflight_check", id: "t1", input: { prompt: "test" } }, + ], + }, + }), + JSON.stringify({ type: "tool_result", tool_use_id: "t1", content: "looks good" }), + ]; + writeFileSync(sessionFile, lines.join("\n")); + + const result = await server.tools["estimate_cost"].handler({ + session_dir: sessionFile, + }); + + const text = result.content[0].text; + expect(text).toContain("Preflight checks: 1"); + }); + + it("uses correct pricing for different models", async () => { + const sessionFile = join(tmpDir, "model.jsonl"); + const bigText = "x".repeat(4000); // ~1000 tokens + const lines = [ + JSON.stringify({ type: "user", timestamp: "2026-03-15T10:00:00Z", message: { content: bigText } }), + JSON.stringify({ type: "assistant", timestamp: "2026-03-15T10:01:00Z", message: { content: [{ type: "text", text: bigText }] } }), + ]; + writeFileSync(sessionFile, lines.join("\n")); + + const sonnetResult = await server.tools["estimate_cost"].handler({ + session_dir: sessionFile, + model: "claude-sonnet-4", + }); + const opusResult = await server.tools["estimate_cost"].handler({ + session_dir: sessionFile, + model: "claude-opus-4", + }); + + // Opus should show higher cost + expect(opusResult.content[0].text).toContain("Claude Opus 4"); + expect(sonnetResult.content[0].text).toContain("Claude Sonnet 4"); + }); +}); + +// Need afterEach import +import { afterEach } from "vitest"; diff --git a/tests/tools/prompt-score.test.ts b/tests/tools/prompt-score.test.ts new file mode 100644 index 0000000..7ab3320 --- /dev/null +++ b/tests/tools/prompt-score.test.ts @@ -0,0 +1,98 @@ +import { describe, it, expect, vi, beforeEach } from "vitest"; + +// Mock fs to avoid writing state files during tests +vi.mock("node:fs/promises", () => ({ + readFile: vi.fn().mockRejectedValue(new Error("no file")), + writeFile: vi.fn().mockResolvedValue(undefined), + mkdir: vi.fn().mockResolvedValue(undefined), +})); + +function createMockServer() { + const tools: Record = {}; + return { + tool(name: string, description: string, _schema: any, handler: Function) { + tools[name] = { handler, description }; + }, + tools, + }; +} + +describe("prompt_score", () => { + let server: ReturnType; + + beforeEach(async () => { + vi.clearAllMocks(); + server = createMockServer(); + const mod = await import("../../src/tools/prompt-score.js"); + mod.registerPromptScore(server as any); + }); + + it("registers the prompt_score tool", () => { + expect(server.tools["prompt_score"]).toBeDefined(); + expect(server.tools["prompt_score"].description).toContain("Score"); + }); + + it("gives high score to a specific, actionable prompt", async () => { + const result = await server.tools["prompt_score"].handler({ + prompt: "Rename the `fetchUser` function in `src/api/users.ts` to `getUserById`. Only this one function should change. The existing tests must still pass.", + }); + + const text = result.content[0].text; + // Should score well on all dimensions + expect(text).toMatch(/Specificity:\s+25\/25/); + expect(text).toMatch(/Actionability:\s+25\/25/); + expect(text).toMatch(/Done condition:\s+25\/25/); + // Grade should be good + expect(text).toMatch(/[AB][+-]?\s/); + }); + + it("gives low score to a vague prompt", async () => { + const result = await server.tools["prompt_score"].handler({ + prompt: "make it better", + }); + + const text = result.content[0].text; + expect(text).toMatch(/[DF]/); // Low grade + expect(text).toContain("specific"); // Should have feedback about specificity + }); + + it("recognizes action verbs", async () => { + const result = await server.tools["prompt_score"].handler({ + prompt: "refactor the auth module", + }); + const text = result.content[0].text; + expect(text).toMatch(/Actionability:\s+25\/25/); + }); + + it("penalizes vague verbs", async () => { + const result = await server.tools["prompt_score"].handler({ + prompt: "clean up the code", + }); + const text = result.content[0].text; + expect(text).toMatch(/Actionability:\s+15\/25/); + expect(text).toContain("Vague verb"); + }); + + it("rewards file paths for specificity", async () => { + const result = await server.tools["prompt_score"].handler({ + prompt: "do something to src/index.ts", + }); + const text = result.content[0].text; + expect(text).toMatch(/Specificity:\s+25\/25/); + }); + + it("recognizes questions as having done conditions", async () => { + const result = await server.tools["prompt_score"].handler({ + prompt: "What does the auth middleware do?", + }); + const text = result.content[0].text; + expect(text).toMatch(/Done condition:\s+20\/25/); + }); + + it("tracks session average", async () => { + await server.tools["prompt_score"].handler({ prompt: "fix bug" }); + const result = await server.tools["prompt_score"].handler({ prompt: "add test" }); + const text = result.content[0].text; + expect(text).toContain("prompts scored"); + }); +});