diff --git a/AUDIT.md b/AUDIT.md index 0d959bf..9931f29 100644 --- a/AUDIT.md +++ b/AUDIT.md @@ -14,6 +14,28 @@ Severity legend: **HIGH** = security bypass / RCE primitive / crash / silent dat --- +## Remediation status (verified against source) + +> Updated 2026-06-10. Each finding below was re-checked against the current tree; fixes carry an +> inline comment citing their finding ID, and the full suite is green (`tsc --noEmit` clean, 561 +> tests passing). **35 of 39 findings are fixed.** The remaining 4 are accepted decisions, not open +> defects. + +- **Fixed (35):** H1, H3, H4, H5, H6, H7, H8, H9, H10 · M1–M14 (all) · L1, L2, L3, L4, L5, L7, L8, + L11, L12, L13, L14, L15. +- **Accepted — won't change (3):** + - **H2** (DNS-rebinding SSRF) — keep permissive; reaching internal/metadata IPs is often the + engagement goal. See the 🟡 row in the triage table. + - **L9** (debug log writes unredacted tool I/O) — opt-in, `0o600`, local-only; the operator's + explicit choice. See the ⛔ row. + - **L6** (no generic high-entropy redaction fallback) — inherent to label-anchored redaction. +- **Hardened (1):** **L10** — self-update now pins the installer to the requested release tag + (immutable ref) instead of mutable `main` for versioned updates, and asserts the script URL is + https on `raw.githubusercontent.com` before fetch. The downloaded binary was already SHA-256 + verified fail-closed by `install.sh`. (`src/update/selfUpdate.ts`) + +--- + ## Capability-impact triage — "fix without limiting the operator" PentesterFlow's mission is to help authorized pentesters/bug-hunters/security-engineers work diff --git a/CHANGELOG.md b/CHANGELOG.md index 5577e15..2f8cbda 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,42 @@ All notable changes to this project are documented here. The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/). +## [Unreleased] + +### Added + +- **Saved memory (`#` quick-add)** — a curated, human-readable memory layer + modeled on Claude Code. `#` saves a durable fact (one Markdown file per + fact with frontmatter, under `.pentesterflow/memory/`); `#!` saves it to + the personal scope. The fact catalog is pinned into the system prompt every + turn (survives compaction) and the most relevant facts are recalled in full + per turn, surfaced as a `recalled memory: …` line. Manage with + `/memory add|list|forget`. Secrets are redacted before write. +- **Parallel tool dispatch** — independent tool calls in one step now run + concurrently (bounded fan-out) with results recorded in call order; recon + fan-outs finish in ~max(latency) instead of the sum. Single-call and + `load_skill` steps stay sequential. The permission prompter serializes its + modal so approvals still appear one at a time. +- **LLM retry/backoff** — transient backend failures (HTTP 429 / 502 / 503 / + 504 and connection drops) are retried with exponential backoff, honoring a + `Retry-After` header. Applied to the OpenAI-compatible client (Kimi, Groq, + OpenRouter, DeepSeek, LM Studio). + +### Changed + +- **Self-update hardening** — a pinned `pentesterflow update ` now + fetches the installer from that release tag (immutable) instead of `main`, and + the installer URL is asserted to be https on `raw.githubusercontent.com` + before fetch. + +### Fixed + +- **Redaction gaps** — connection-string query-param credentials + (`?password=` / `&auth=` / `&access_token=`), HTTP Digest `response=` hashes, + and GCP service-account `private_key_id` are now masked. +- Closed out the internal code audit: 35 of 39 findings fixed, 3 accepted as + intentional, 1 hardened (see `AUDIT.md`). + ## [0.2.0] - 2026-06-06 Hardening, model tuning, and a transcript/status overhaul, plus Claude diff --git a/README.md b/README.md index d6eb35a..c1e836f 100644 --- a/README.md +++ b/README.md @@ -273,10 +273,30 @@ Useful commands: | Command | Purpose | |---|---| | `/compact` | Summarize the current session into persistent memory. | -| `/memory` | Show current session memory. | +| `/memory` | Show saved facts + the session checkpoint. | +| `/memory add ` | Save a durable fact (same as `#`). | +| `/memory list` | List saved facts. | +| `/memory forget ` | Drop saved facts and checkpoint items matching the text. | | `/snapshot` | Write a redacted context snapshot immediately. | | `/next [objective]` | Ask for coverage-driven next steps. | +### Saved memory (`#` quick-add) + +Type `#` followed by anything you want the agent to remember for the rest of +this session and beyond — for example `#orders API is IDOR-prone on +/api/orders/{id}`. Use `#!` to save it to your **personal** scope instead +of the project. + +- Saved facts are durable, human-readable Markdown — one file per fact with + frontmatter — under `./.pentesterflow/memory/` (project) and + `~/.pentesterflow/memory/` (personal), with a generated `MEMORY.md` index. +- The fact catalog is pinned into the system prompt on **every** turn, so it + survives compaction; the facts most relevant to the current turn are recalled + in full automatically (you'll see a `recalled memory: …` line). +- Secrets are redacted before a fact is written to disk. +- Manage them with `#` / `/memory add`, `/memory list`, and + `/memory forget `. + ## Burp Integration Use the companion diff --git a/src/agent/agent.test.ts b/src/agent/agent.test.ts index 2ee97c0..c368b56 100644 --- a/src/agent/agent.test.ts +++ b/src/agent/agent.test.ts @@ -10,6 +10,7 @@ import { describe, expect, it } from 'vitest'; import { IntelligenceStore } from '../intelligence/store.js'; import type { Client } from '../llm/client.js'; import type { ChatRequest, ChatResponse } from '../llm/types.js'; +import { MemoryStore } from '../memory/store.js'; import { AlwaysAllow } from '../permission/permission.js'; import { Store as SessionStore, newID as newSessionID } from '../session/store.js'; import { Registry as SkillRegistry } from '../skills/registry.js'; @@ -1217,3 +1218,230 @@ describe('reconcileToolCalls (H6)', () => { expect(reconcileToolCalls(input)).toEqual(input); }); }); + +// ---------- E1: parallel tool dispatch ---------- + +/** A tool that blocks until `n` instances are running at once, proving the + * agent dispatched them concurrently. If they ran sequentially the first + * never sees a second arrive and falls through the timeout, returning + * 'serial' — a fast, deterministic failure rather than a hang. */ +class BarrierTool implements Tool { + private count = 0; + private waiters: Array<(ok: boolean) => void> = []; + constructor(private readonly need: number) {} + name(): string { + return 'barrier'; + } + description(): string { + return 'barrier'; + } + schema(): Record { + return { type: 'object', properties: { id: { type: 'string' } } }; + } + requiresPermission(): boolean { + return false; + } + async run(args: Record): Promise { + this.count += 1; + if (this.count >= this.need) { + for (const w of this.waiters) w(true); + this.waiters = []; + return `parallel-ok:${String(args.id ?? '')}`; + } + const ok = await new Promise((resolve) => { + this.waiters.push(resolve); + setTimeout(() => resolve(false), 300); + }); + return `${ok ? 'parallel-ok' : 'serial'}:${String(args.id ?? '')}`; + } +} + +/** A tool that sleeps for `delay_ms` then echoes `id` — used to verify result + * order is the call order, not the completion order. */ +class DelayTool implements Tool { + name(): string { + return 'delay'; + } + description(): string { + return 'delay'; + } + schema(): Record { + return { type: 'object', properties: { id: { type: 'string' }, delay_ms: { type: 'number' } } }; + } + requiresPermission(): boolean { + return false; + } + async run(args: Record): Promise { + await new Promise((r) => setTimeout(r, Number(args.delay_ms ?? 0))); + return `done:${String(args.id ?? '')}`; + } +} + +function toolCall(id: string, name: string, args: Record) { + return { id, type: 'function' as const, function: { name, arguments: JSON.stringify(args) } }; +} + +function agentWithTools(scripted: ChatResponse[], tools: Tool[]): Agent { + const reg = new ToolRegistry(); + for (const t of tools) reg.register(t); + return new Agent({ + client: new FakeClient(scripted), + tools: reg, + skills: new SkillRegistry(), + prompter: new AlwaysAllow(), + store: null, + target: new Target(), + }); +} + +describe('Agent parallel tool dispatch (E1)', () => { + it('runs independent tool calls in the same step concurrently', async () => { + const agent = agentWithTools( + [ + { + message: { + role: 'assistant', + content: '', + toolCalls: [ + toolCall('c1', 'barrier', { id: 'a' }), + toolCall('c2', 'barrier', { id: 'b' }), + ], + }, + finishReason: 'tool_calls', + }, + { message: { role: 'assistant', content: 'done' }, finishReason: 'stop' }, + ], + [new BarrierTool(2)], + ); + const { events, sink } = collect(); + await agent.run('go', new AbortController().signal, sink); + const results = events + .filter((e) => e.type === 'tool-result') + .map((e) => (e.type === 'tool-result' ? e.result : '')); + // Both only resolve to parallel-ok if they were in flight simultaneously. + expect(results).toEqual(['parallel-ok:a', 'parallel-ok:b']); + }); + + it('emits tool results in call order even when later calls finish first', async () => { + const agent = agentWithTools( + [ + { + message: { + role: 'assistant', + content: '', + toolCalls: [ + toolCall('c1', 'delay', { id: 'first', delay_ms: 60 }), + toolCall('c2', 'delay', { id: 'second', delay_ms: 0 }), + ], + }, + finishReason: 'tool_calls', + }, + { message: { role: 'assistant', content: 'done' }, finishReason: 'stop' }, + ], + [new DelayTool()], + ); + const { events, sink } = collect(); + await agent.run('go', new AbortController().signal, sink); + const order = events + .filter((e) => e.type === 'tool-result') + .map((e) => (e.type === 'tool-result' ? e.result : '')); + // 'second' finishes first, but results are recorded in call order. + expect(order).toEqual(['done:first', 'done:second']); + // History keeps the same order: assistant, tool(first), tool(second). + const toolMsgs = agent.getHistory().filter((m) => m.role === 'tool'); + expect(toolMsgs.map((m) => m.toolCallID)).toEqual(['c1', 'c2']); + }); +}); + +// ---------- Curated memory: pinned catalog + recall + survives compaction ---------- + +describe('Agent curated memory', () => { + function makeMemoryAgent(scripted: ChatResponse[]) { + const cwd = mkdtempSync(join(tmpdir(), 'pf-agent-mem-cwd-')); + const home = mkdtempSync(join(tmpdir(), 'pf-agent-mem-home-')); + const memoryStore = new MemoryStore({ cwd, home }); + const tools = new ToolRegistry(); + tools.register(new EchoTool()); + const agent = new Agent({ + client: new FakeClient(scripted), + tools, + skills: new SkillRegistry(), + prompter: new AlwaysAllow(), + store: null, + target: new Target(), + memoryStore, + }); + return { + agent, + memoryStore, + cleanup: () => { + rmSync(cwd, { recursive: true, force: true }); + rmSync(home, { recursive: true, force: true }); + }, + }; + } + + it('pins a saved fact into the system prompt immediately (next turn)', async () => { + const { agent, cleanup } = makeMemoryAgent([]); + try { + const fact = await agent.addMemory({ text: 'orders API IDOR on /api/orders/{id}' }); + expect(fact).not.toBeNull(); + const sys = agent.getHistory()[0]?.content ?? ''; + expect(sys).toContain('Saved memory'); + expect(sys).toContain(fact?.name ?? 'NOPE'); + } finally { + cleanup(); + } + }); + + it('recalls a relevant fact into the turn and emits a memory-recall event', async () => { + const { agent, cleanup } = makeMemoryAgent([ + { message: { role: 'assistant', content: 'ok' }, finishReason: 'stop' }, + ]); + try { + await agent.addMemory({ text: 'orders API IDOR via sequential id on /api/orders/{id}' }); + const { events, sink } = collect(); + await agent.run('test the orders endpoint for idor', new AbortController().signal, sink); + const recall = events.find((e) => e.type === 'memory-recall'); + expect(recall && recall.type === 'memory-recall' ? recall.names.length : 0).toBeGreaterThan( + 0, + ); + } finally { + cleanup(); + } + }); + + it('keeps the saved-memory catalog in the prompt after a compaction', async () => { + // compact() resets history to [system, summary]; the catalog must still be + // present in the rebuilt system prompt so the model never "forgets" it. + const { agent, cleanup } = makeMemoryAgent([ + { message: { role: 'assistant', content: 'COMPACTED SUMMARY' }, finishReason: 'stop' }, + ]); + try { + const fact = await agent.addMemory({ text: 'login OAuth redirect_uri bypass works' }); + // Seed a couple of turns of history so there is something to compact. + agent.getHistory(); // no-op accessor + await agent.compact(new AbortController().signal, () => {}); + const sys = agent.getHistory()[0]?.content ?? ''; + expect(sys).toContain('Saved memory'); + expect(sys).toContain(fact?.name ?? 'NOPE'); + } finally { + cleanup(); + } + }); + + it('forgetMemory removes a curated fact and drops it from the prompt', async () => { + const { agent, cleanup } = makeMemoryAgent([]); + try { + const fact = await agent.addMemory({ text: 'orders API IDOR on /api/orders/{id}' }); + const name = fact?.name ?? 'NOPE'; + expect(agent.listCuratedMemory()).toHaveLength(1); + const removed = await agent.forgetMemory('orders'); + expect(removed).toContain(name); + expect(agent.listCuratedMemory()).toHaveLength(0); + expect(agent.getHistory()[0]?.content ?? '').not.toContain(name); + } finally { + cleanup(); + } + }); +}); diff --git a/src/agent/agent.ts b/src/agent/agent.ts index 409316a..a0360b7 100644 --- a/src/agent/agent.ts +++ b/src/agent/agent.ts @@ -6,9 +6,15 @@ import { type IntelligenceStore, formatIntelligenceContext } from '../intelligence/store.js'; import type { Client, StreamingClient } from '../llm/client.js'; import { isStreaming } from '../llm/client.js'; -import type { ChatRequest, Message } from '../llm/types.js'; +import type { ChatRequest, Message, ToolCall } from '../llm/types.js'; import { parsedArgs } from '../llm/types.js'; import { error as logError } from '../logger/logger.js'; +import { + type AddMemoryInput, + type MemoryFact, + type MemoryStore, + formatMemoryRecall, +} from '../memory/store.js'; import type { Prompter } from '../permission/permission.js'; import { redact } from '../redact/index.js'; import type { SessionMemory, Store } from '../session/store.js'; @@ -54,6 +60,9 @@ export interface AgentOptions { streamingEnabled?: boolean; /** Local scan-intelligence dataset used to improve coverage across sessions. */ intelligence?: IntelligenceStore | null; + /** Curated, human-editable memory (Claude-Code-style facts). Its catalog is + * pinned into the system prompt and matching facts are recalled each turn. */ + memoryStore?: MemoryStore | null; /** Operator-authored engagement notes (from .pentesterflow/engagement.md), * always injected into the system prompt. Loaded once at startup. */ engagement?: string; @@ -64,6 +73,56 @@ export interface AgentOptions { * want to retry it on every turn. */ const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3; const COMPACTION_INPUT_CHAR_LIMIT = 22_000; +// When the model emits several independent tool calls in one step, run them +// concurrently up to this fan-out instead of strictly one-at-a-time — recon +// fan-outs (multiple curl/grep probes) finish in ~max(latency) rather than the +// sum (E1). The permission prompter serializes its modal internally, so +// approvals still appear one at a time. +const MAX_PARALLEL_TOOL_CALLS = 4; +// Tools whose execution mutates agent state that a later call in the SAME step +// can observe (load_skill changes the active-skill allowlist used by the +// allowed-tools gate). A step containing one of these falls back to sequential +// execution so ordering stays deterministic. +const STATEFUL_TOOLS = new Set(['load_skill']); + +interface ParsedToolCall { + args: Record; + argsJSON: string; + parseErr?: Error; +} + +interface ToolCallResult { + result: string; + errStr: string; + durationMs: number; +} + +/** + * Map `items` through `fn` with at most `limit` running at once, returning + * results in input order. `fn` is expected not to reject (callers fold errors + * into their result value); an unexpected rejection still propagates. + */ +async function mapWithConcurrency( + items: T[], + limit: number, + fn: (item: T, index: number) => Promise, +): Promise { + const results = new Array(items.length); + let next = 0; + const worker = async (): Promise => { + while (true) { + const i = next; + next += 1; + if (i >= items.length) return; + const item = items[i]; + if (item === undefined) continue; + results[i] = await fn(item, i); + } + }; + const pool = Array.from({ length: Math.min(limit, items.length) }, worker); + await Promise.all(pool); + return results; +} // Upper bound on retained items per memory list, so a long engagement can't // grow the persisted checkpoint (and its disk footprint) without limit. The // most recent items win; dedup happens in mergeList. Lists default to a much @@ -81,6 +140,7 @@ export class Agent { readonly store: Store | null; readonly target: Target; readonly intelligence: IntelligenceStore | null; + readonly memoryStore: MemoryStore | null; private thinking: boolean; private maxSteps: number; @@ -116,6 +176,7 @@ export class Agent { this.prompter = opts.prompter; this.store = opts.store ?? null; this.intelligence = opts.intelligence ?? null; + this.memoryStore = opts.memoryStore ?? null; this.target = opts.target; this.thinking = opts.thinkingEnabled ?? false; this.maxSteps = opts.maxSteps && opts.maxSteps > 0 ? opts.maxSteps : 20; @@ -132,6 +193,7 @@ export class Agent { promptProfile: this.promptProfile, memory: this.memory, engagement: this.engagement, + curatedMemory: this.memoryStore?.index() ?? '', }); this.history = [{ role: 'system', content: this.sysPrompt }]; } @@ -197,34 +259,39 @@ export class Agent { } /** - * Remove individual memory items whose text contains `query` - * (case-insensitive) across every list, so a single wrong line can be - * dropped without nuking the whole checkpoint. Returns the removed items. + * Remove memory whose text contains `query` (case-insensitive) — both durable + * curated facts and individual session-checkpoint items — so a single wrong + * line can be dropped without nuking everything. Returns the removed entries. */ async forgetMemory(query: string): Promise { const needle = query.trim().toLowerCase(); - if (!needle || !this.memory) return []; + if (!needle) return []; const removed: string[] = []; - const prune = (items: string[]): string[] => - items.filter((item) => { - if (item.toLowerCase().includes(needle)) { - removed.push(item); - return false; - } - return true; - }); - this.memory = { - ...this.memory, - objectives: prune(this.memory.objectives), - plan: prune(this.memory.plan), - completed: prune(this.memory.completed), - findings: prune(this.memory.findings), - tested: prune(this.memory.tested), - files: prune(this.memory.files), - commands: prune(this.memory.commands), - credentials: prune(this.memory.credentials), - todos: prune(this.memory.todos), - }; + // Durable curated facts (deletes the backing files + rebuilds the index). + if (this.memoryStore) removed.push(...this.memoryStore.forget(query)); + // Session checkpoint items. + if (this.memory) { + const prune = (items: string[]): string[] => + items.filter((item) => { + if (item.toLowerCase().includes(needle)) { + removed.push(item); + return false; + } + return true; + }); + this.memory = { + ...this.memory, + objectives: prune(this.memory.objectives), + plan: prune(this.memory.plan), + completed: prune(this.memory.completed), + findings: prune(this.memory.findings), + tested: prune(this.memory.tested), + files: prune(this.memory.files), + commands: prune(this.memory.commands), + credentials: prune(this.memory.credentials), + todos: prune(this.memory.todos), + }; + } if (removed.length === 0) return []; this.rebuildSystemPrompt(); this.history = ensureSystemPrompt(this.history, this.sysPrompt); @@ -652,6 +719,14 @@ export class Agent { if (intelligenceContext && last) { working.splice(working.length - 1, 0, { role: 'system', content: intelligenceContext }); } + // Recall the durable curated facts most relevant to this turn and inject + // their full text just before the user message, so they stay in context + // even after a compaction has scrubbed the transcript (the catalog of names + // is already pinned in the system prompt; this brings in the bodies). + const recall = this.recallCuratedMemory(userMsg, emit); + if (recall && last) { + working.splice(working.length - 1, 0, { role: 'system', content: recall }); + } if (last) last.content = expandedUserMsg; const maxSteps = this.maxSteps; @@ -690,90 +765,173 @@ export class Agent { return; } + await this.executeToolCalls(toolCalls, signal, emit, working); + } + emit({ type: 'error', err: new MaxStepsError(maxSteps) }); + } + + /** + * Run the step's tool calls. A single call, or any step containing a + * state-mutating tool (load_skill), runs sequentially with the original + * interleaved tool-call/tool-result emit order and per-call save. Multiple + * independent calls run with bounded concurrency (E1): all tool-call events + * are emitted in order, the calls execute concurrently, then results are + * emitted and tool messages recorded in the original order so the + * transcript and the provider's tool_call→result pairing stay deterministic + * regardless of completion order. + */ + private async executeToolCalls( + toolCalls: ToolCall[], + signal: AbortSignal, + emit: EventSink, + working: Message[], + ): Promise { + const sequential = + toolCalls.length <= 1 || toolCalls.some((tc) => STATEFUL_TOOLS.has(tc.function.name)); + + if (sequential) { for (const tc of toolCalls) { if (signal.aborted) throw new Error('aborted'); - let args: Record = {}; - let parseErr: Error | undefined; - try { - args = parsedArgs(tc.function); - } catch (err) { - parseErr = err instanceof Error ? err : new Error(String(err)); - } - const argsJSON = tc.function.arguments; + const parsed = this.parseToolCall(tc); emit({ type: 'tool-call', id: tc.id, name: tc.function.name, - args, - argsJSON, + args: parsed.args, + argsJSON: parsed.argsJSON, }); + const res = await this.runParsedToolCall(tc, parsed, signal); + this.recordToolResult(tc, parsed, res, emit, working); + await this.save().catch((err) => + emit({ type: 'error', err: new Error(`save session: ${errMessage(err)}`) }), + ); + } + return; + } - const start = Date.now(); - let result = ''; - let runErr: Error | undefined; - if (parseErr) { - runErr = new Error(`could not parse arguments: ${parseErr.message} (raw: ${argsJSON})`); - } else { - // Enforce the active skills' allowed-tools union before - // dispatch. Soft-fail (set runErr) instead of throwing so the - // model sees the error as a tool result and can self-correct - // — usually by loading a different skill or by giving up on - // the disallowed action. Workflow tools (load_skill, - // coverage, read_*, ask, finding, browser_capture_*) are - // always allowed regardless of which skill is active. - const allowed = this.isToolAllowed(tc.function.name); - if (!allowed.ok) { - runErr = new Error(allowed.reason ?? 'tool blocked by active skills'); - } else { - try { - result = await this.tools.execute(tc.function.name, args, signal, this.prompter); - } catch (err) { - runErr = err instanceof Error ? err : new Error(String(err)); - } - } - } - const durationMs = Date.now() - start; - let errStr = ''; - if (runErr) { - errStr = runErr.message; - result = `ERROR: ${errStr}`; - logError('agent: tool failed', { - tool: tc.function.name, - duration_ms: durationMs, - err: errStr, - }); - } + const parsedAll = toolCalls.map((tc) => this.parseToolCall(tc)); + toolCalls.forEach((tc, i) => { + const parsed = parsedAll[i]; + if (parsed) { emit({ - type: 'tool-result', + type: 'tool-call', id: tc.id, name: tc.function.name, - result, - err: errStr, - durationMs, + args: parsed.args, + argsJSON: parsed.argsJSON, }); + } + }); + const results = await mapWithConcurrency(toolCalls, MAX_PARALLEL_TOOL_CALLS, (tc, i) => + this.runParsedToolCall(tc, parsedAll[i] ?? this.parseToolCall(tc), signal), + ); + toolCalls.forEach((tc, i) => { + const parsed = parsedAll[i]; + const res = results[i]; + if (parsed && res) this.recordToolResult(tc, parsed, res, emit, working); + }); + // One save covers the whole batch — every tool message is appended above. + await this.save().catch((err) => + emit({ type: 'error', err: new Error(`save session: ${errMessage(err)}`) }), + ); + if (signal.aborted) throw new Error('aborted'); + } - if (tc.function.name === 'load_skill' && !runErr) { - const nm = typeof args.name === 'string' ? args.name : ''; - if (nm) { - this.activeSkills.add(nm); - emit({ type: 'skill-active', name: nm }); - } + /** Parse a tool call's JSON arguments, capturing (not throwing) a parse error + * so the model sees it as a tool result and can self-correct. */ + private parseToolCall(tc: ToolCall): ParsedToolCall { + let args: Record = {}; + let parseErr: Error | undefined; + try { + args = parsedArgs(tc.function); + } catch (err) { + parseErr = err instanceof Error ? err : new Error(String(err)); + } + return { args, argsJSON: tc.function.arguments, parseErr }; + } + + /** Dispatch one parsed tool call (allowed-tools gate + permission-gated + * execute). Never throws — failures come back as an error result string so + * the call can run inside a concurrency pool without rejecting its peers. */ + private async runParsedToolCall( + tc: ToolCall, + parsed: ParsedToolCall, + signal: AbortSignal, + ): Promise { + if (signal.aborted) return { result: 'ERROR: aborted', errStr: 'aborted', durationMs: 0 }; + const start = Date.now(); + let result = ''; + let runErr: Error | undefined; + if (parsed.parseErr) { + runErr = new Error( + `could not parse arguments: ${parsed.parseErr.message} (raw: ${parsed.argsJSON})`, + ); + } else { + // Enforce the active skills' allowed-tools union before dispatch. + // Soft-fail (set runErr) instead of throwing so the model sees the error + // as a tool result and can self-correct — usually by loading a different + // skill or giving up on the disallowed action. Workflow tools + // (load_skill, coverage, read_*, ask, finding, browser_capture_*) are + // always allowed regardless of which skill is active. + const allowed = this.isToolAllowed(tc.function.name); + if (!allowed.ok) { + runErr = new Error(allowed.reason ?? 'tool blocked by active skills'); + } else { + try { + result = await this.tools.execute(tc.function.name, parsed.args, signal, this.prompter); + } catch (err) { + runErr = err instanceof Error ? err : new Error(String(err)); } + } + } + const durationMs = Date.now() - start; + let errStr = ''; + if (runErr) { + errStr = runErr.message; + result = `ERROR: ${errStr}`; + logError('agent: tool failed', { + tool: tc.function.name, + duration_ms: durationMs, + err: errStr, + }); + } + return { result, errStr, durationMs }; + } - const toolMsg: Message = { - role: 'tool', - content: result, - toolCallID: tc.id, - name: tc.function.name, - }; - this.history.push(toolMsg); - working.push(toolMsg); - await this.save().catch((err) => - emit({ type: 'error', err: new Error(`save session: ${errMessage(err)}`) }), - ); + /** Emit a tool result, apply any active-skill activation (load_skill), and + * append the tool message to both history and the working transcript. */ + private recordToolResult( + tc: ToolCall, + parsed: ParsedToolCall, + res: ToolCallResult, + emit: EventSink, + working: Message[], + ): void { + emit({ + type: 'tool-result', + id: tc.id, + name: tc.function.name, + result: res.result, + err: res.errStr, + durationMs: res.durationMs, + }); + + if (tc.function.name === 'load_skill' && !res.errStr) { + const nm = typeof parsed.args.name === 'string' ? parsed.args.name : ''; + if (nm) { + this.activeSkills.add(nm); + emit({ type: 'skill-active', name: nm }); } } - emit({ type: 'error', err: new MaxStepsError(maxSteps) }); + + const toolMsg: Message = { + role: 'tool', + content: res.result, + toolCallID: tc.id, + name: tc.function.name, + }; + this.history.push(toolMsg); + working.push(toolMsg); } private async chat( @@ -898,6 +1056,7 @@ export class Agent { promptProfile: this.promptProfile, memory: this.memory, engagement: this.engagement, + curatedMemory: this.memoryStore?.index() ?? '', }); } @@ -906,6 +1065,39 @@ export class Agent { await this.store.save(this.history, this.target, this.memory); } + /** + * Save a durable curated-memory fact (the `#` quick-add / `/memory add` + * backend). Rebuilds the system prompt and reseeds it into history so the new + * fact's catalog entry is in context on the very next turn, then persists. + * Returns the stored fact, or null when there's no store or the text is empty. + */ + async addMemory(input: AddMemoryInput): Promise { + if (!this.memoryStore) return null; + const fact = this.memoryStore.add(input); + if (!fact) return null; + this.rebuildSystemPrompt(); + this.history = ensureSystemPrompt(this.history, this.sysPrompt); + await this.save(); + return fact; + } + + /** All curated memory facts (for /memory listing). */ + listCuratedMemory(): MemoryFact[] { + return this.memoryStore?.list() ?? []; + } + + /** Recall the curated facts relevant to this turn; emit a transparency note + * naming what was pulled in (mirrors how Claude Code surfaces recalled + * memories). Returns the prompt stanza, or '' when nothing matched. */ + private recallCuratedMemory(userMsg: string, emit: EventSink): string { + if (!this.memoryStore) return ''; + const query = [userMsg, this.target.baseURL(), this.target.name()].join('\n'); + const facts = this.memoryStore.search(query, 5); + if (facts.length === 0) return ''; + emit({ type: 'memory-recall', names: facts.map((f) => f.name) }); + return formatMemoryRecall(facts); + } + private buildIntelligenceContext(userMsg: string): string { if (!this.intelligence) return ''; const query = [ diff --git a/src/agent/events.ts b/src/agent/events.ts index 3ec1ff0..51cdd3b 100644 --- a/src/agent/events.ts +++ b/src/agent/events.ts @@ -43,6 +43,10 @@ export interface SkillActiveEvent { type: 'skill-active'; name: string; } +export interface MemoryRecallEvent { + type: 'memory-recall'; + names: string[]; +} export interface DoneEvent { type: 'done'; } @@ -56,6 +60,7 @@ export type AgentEvent = | CompactEvent | DecisionEvent | SkillActiveEvent + | MemoryRecallEvent | DoneEvent; /** MaxStepsError is the recognizable Error subtype raised when the tool diff --git a/src/agent/systemPrompt.ts b/src/agent/systemPrompt.ts index b7f72e0..b7d3b1f 100644 --- a/src/agent/systemPrompt.ts +++ b/src/agent/systemPrompt.ts @@ -310,6 +310,12 @@ export interface BuildOptions { * injected — transcript-independent, so it survives compaction unconditionally. */ engagement?: string; + /** + * Curated-memory catalog (names + descriptions from MemoryStore.index()). + * Always injected so the model knows which durable facts exist on every turn, + * surviving compaction; the full facts arrive via per-turn relevance recall. + */ + curatedMemory?: string; } export function buildSystemPrompt(opts: BuildOptions): string { @@ -351,6 +357,7 @@ export function buildSystemPrompt(opts: BuildOptions): string { } sb += renderEngagement(opts.engagement); + sb += renderCuratedMemory(opts.curatedMemory); sb += renderMemory(opts.memory); // Advertise only the skills the user has left enabled AND which allow @@ -383,6 +390,19 @@ function renderEngagement(engagement: string | undefined): string { return `\n# Engagement notes (operator-authored — authoritative, follow over inferred context)\n${text}\n`; } +/** + * Render the curated-memory catalog: the names + one-line descriptions of every + * durable fact the operator saved (via `#` quick-add or /memory add). It rides + * in the system prompt on every request so the model always knows what it can + * recall, even right after a compaction. The full text of a relevant fact is + * injected separately each turn by the agent's relevance recall. + */ +function renderCuratedMemory(catalog: string | undefined): string { + const text = catalog?.trim(); + if (!text) return ''; + return `\n# Saved memory (durable — recalled by relevance each turn)\nThese facts persist across this and future sessions. The matching ones are expanded into the turn automatically; ask to recall any by name.\n${text}\n`; +} + /** * Render persistent session memory as a pinned prompt stanza. This is what * lets a long session survive context compaction and restart: the cumulative diff --git a/src/cli/index.ts b/src/cli/index.ts index 27522bd..0a21195 100644 --- a/src/cli/index.ts +++ b/src/cli/index.ts @@ -34,6 +34,7 @@ import { } from '../llm/providers.js'; import * as logger from '../logger/logger.js'; import { createSessionDebugLog } from '../logger/sessionDebug.js'; +import { MemoryStore } from '../memory/store.js'; import { YoloPrompter } from '../permission/permission.js'; import * as sessionStore from '../session/store.js'; import { skillSearchDirs } from '../skills/discovery.js'; @@ -363,6 +364,10 @@ async function main(): Promise { // agent has tried. Persists alongside findings so resumes keep state. const coverageStore = new CoverageStore(`findings/coverage-${sessionID}.json`); const intelligenceStore = new IntelligenceStore(); + // Curated, human-editable memory (Claude-Code-style facts). Its catalog is + // pinned into the system prompt and matching facts recalled each turn, so a + // `#`-saved fact stays in context for the rest of the session and beyond. + const memoryStore = new MemoryStore(); // Operator-authored engagement notes (scope/rules/creds). Read once at // startup from project + home .pentesterflow/engagement.md; always injected // into the system prompt so it survives compaction unconditionally. @@ -522,6 +527,7 @@ async function main(): Promise { toolingProfile: cfg.tooling_profile, promptProfile: effectivePromptProfile(cfg), intelligence: intelligenceStore, + memoryStore, engagement, // --no-stream takes precedence over the config default so users can // toggle off streaming for a single launch without rewriting config. diff --git a/src/llm/errors.ts b/src/llm/errors.ts index b5d8d56..8506830 100644 --- a/src/llm/errors.ts +++ b/src/llm/errors.ts @@ -14,6 +14,8 @@ export class BackendError extends Error { readonly category: ErrorCategory; readonly statusCode: number; readonly detail: string; + /** Server-advised wait before retrying (from a Retry-After header), in ms. */ + retryAfterMs?: number; constructor(backend: string, category: ErrorCategory, statusCode: number, detail: string) { const msg = @@ -27,6 +29,35 @@ export class BackendError extends Error { } } +/** + * True for errors worth retrying with backoff: rate limits (429) and transient + * upstream failures (502/503/504), plus a `backend-down` transport error (the + * daemon may be mid-restart). A plain 500 is treated as deterministic and NOT + * retried, since it usually reflects a bad request rather than a blip. + */ +export function isTransient(err: unknown): boolean { + if (!(err instanceof BackendError)) return false; + if (err.category === 'backend-down') return true; + return err.statusCode === 429 || [502, 503, 504].includes(err.statusCode); +} + +/** + * Parse a Retry-After header (delta-seconds or an HTTP-date) into ms relative to + * `now`. Returns undefined when absent/unparseable so the caller falls back to + * its own backoff schedule. `now` is injectable for deterministic tests. + */ +export function parseRetryAfter( + header: string | null | undefined, + now = Date.now(), +): number | undefined { + if (!header) return undefined; + const trimmed = header.trim(); + if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; + const when = Date.parse(trimmed); + if (Number.isNaN(when)) return undefined; + return Math.max(0, when - now); +} + /** * Classify a transport or non-2xx response into a BackendError. Pass * `transportErr` (from fetch/undici) OR `statusCode` + `body` (from a diff --git a/src/llm/openai.ts b/src/llm/openai.ts index b753100..96772f5 100644 --- a/src/llm/openai.ts +++ b/src/llm/openai.ts @@ -7,11 +7,20 @@ // server omits one. import type { Client, Pinger, StreamingClient } from './client.js'; -import { classifyBackend } from './errors.js'; +import { type BackendError, classifyBackend, parseRetryAfter } from './errors.js'; import { newCallID } from './ids.js'; import { kimiLocksTemperature, kimiSupportsThinkingToggle } from './providers.js'; +import { withRetry } from './retry.js'; import type { ChatRequest, ChatResponse, Message, ToolCall } from './types.js'; +/** Annotate a backend error with the server's Retry-After so withRetry can + * honor it instead of its computed backoff. */ +function withRetryAfter(err: BackendError, resp: Response): BackendError { + const ms = parseRetryAfter(resp.headers.get('retry-after')); + if (ms !== undefined) err.retryAfterMs = ms; + return err; +} + interface OAIToolCallFragment { index: number; id?: string; @@ -134,6 +143,13 @@ export class OpenAIClient implements Client, StreamingClient, Pinger { } async chat(req: ChatRequest, signal?: AbortSignal): Promise { + // Retry rate limits / transient 5xx with backoff (E7). The non-streaming + // call has no observable side effects before it returns, so it's safe to + // re-run wholesale. + return withRetry(() => this.chatOnce(req, signal), { signal }); + } + + private async chatOnce(req: ChatRequest, signal?: AbortSignal): Promise { const body = this.encodeRequest(req, false); const combinedSignal = withTimeout(signal, CHAT_TIMEOUT_MS); let resp: Response; @@ -149,7 +165,7 @@ export class OpenAIClient implements Client, StreamingClient, Pinger { } const raw = await resp.text(); if (resp.status !== 200) { - throw classifyBackend(this.label, null, resp.status, raw); + throw withRetryAfter(classifyBackend(this.label, null, resp.status, raw), resp); } let out: OAIChatResp; try { @@ -192,23 +208,10 @@ export class OpenAIClient implements Client, StreamingClient, Pinger { onDelta: (delta: string) => void, signal?: AbortSignal, ): Promise { - const body = this.encodeRequest(req, true); - const combinedSignal = withTimeout(signal, CHAT_TIMEOUT_MS); - let resp: Response; - try { - resp = await fetch(`${this.baseURL}/chat/completions`, { - method: 'POST', - headers: { ...this.headers(), Accept: 'text/event-stream' }, - body: JSON.stringify(body), - signal: combinedSignal, - }); - } catch (err) { - throw classifyBackend(this.label, err, 0, undefined); - } - if (resp.status !== 200) { - const raw = await resp.text(); - throw classifyBackend(this.label, null, resp.status, raw); - } + // Retry only the connection setup (E7): a transient 429/5xx surfaces before + // any delta is emitted, so re-running openStream can't double-emit tokens. + // Once the 200 stream is flowing, a mid-stream failure is NOT retried. + const resp = await withRetry(() => this.openStream(req, signal), { signal }); if (!resp.body) { throw new Error(`${this.label}: empty stream body`); } @@ -298,6 +301,30 @@ export class OpenAIClient implements Client, StreamingClient, Pinger { return { message: msg, finishReason: finish }; } + /** Open the SSE stream and return the live 200 response, or throw a + * (retry-annotated) BackendError. Extracted so withRetry can re-attempt the + * connection without re-entering the consume loop. */ + private async openStream(req: ChatRequest, signal?: AbortSignal): Promise { + const body = this.encodeRequest(req, true); + const combinedSignal = withTimeout(signal, CHAT_TIMEOUT_MS); + let resp: Response; + try { + resp = await fetch(`${this.baseURL}/chat/completions`, { + method: 'POST', + headers: { ...this.headers(), Accept: 'text/event-stream' }, + body: JSON.stringify(body), + signal: combinedSignal, + }); + } catch (err) { + throw classifyBackend(this.label, err, 0, undefined); + } + if (resp.status !== 200) { + const raw = await resp.text(); + throw withRetryAfter(classifyBackend(this.label, null, resp.status, raw), resp); + } + return resp; + } + private headers(): Record { const h: Record = { ...this.extraHeaders, 'Content-Type': 'application/json' }; if (this.apiKey) h.Authorization = `Bearer ${this.apiKey}`; diff --git a/src/llm/retry.test.ts b/src/llm/retry.test.ts new file mode 100644 index 0000000..7af0c77 --- /dev/null +++ b/src/llm/retry.test.ts @@ -0,0 +1,92 @@ +import { describe, expect, it, vi } from 'vitest'; +import { BackendError, isTransient, parseRetryAfter } from './errors.js'; +import { withRetry } from './retry.js'; + +const transient = (status: number) => new BackendError('test', 'unknown', status, 'rate limited'); +const down = () => new BackendError('test', 'backend-down', 0, 'socket hang up'); + +describe('isTransient', () => { + it('flags 429 and 502/503/504 and backend-down', () => { + expect(isTransient(transient(429))).toBe(true); + expect(isTransient(transient(503))).toBe(true); + expect(isTransient(down())).toBe(true); + }); + it('does not flag 4xx (except 429), 500, or plain errors', () => { + expect(isTransient(transient(400))).toBe(false); + expect(isTransient(transient(401))).toBe(false); + expect(isTransient(transient(500))).toBe(false); + expect(isTransient(new Error('boom'))).toBe(false); + }); +}); + +describe('parseRetryAfter', () => { + it('parses delta-seconds', () => { + expect(parseRetryAfter('2')).toBe(2000); + }); + it('parses an HTTP-date relative to now', () => { + const now = Date.parse('2026-01-01T00:00:00Z'); + expect(parseRetryAfter('Thu, 01 Jan 2026 00:00:05 GMT', now)).toBe(5000); + }); + it('returns undefined for missing/garbage', () => { + expect(parseRetryAfter(null)).toBeUndefined(); + expect(parseRetryAfter('soon')).toBeUndefined(); + }); +}); + +describe('withRetry', () => { + const noSleep = vi.fn(async () => {}); + + it('returns immediately on success without retrying', async () => { + const fn = vi.fn(async () => 'ok'); + expect(await withRetry(fn, { sleep: noSleep })).toBe('ok'); + expect(fn).toHaveBeenCalledTimes(1); + }); + + it('retries transient failures up to the limit then succeeds', async () => { + const fn = vi + .fn() + .mockRejectedValueOnce(transient(429)) + .mockRejectedValueOnce(transient(503)) + .mockResolvedValueOnce('done'); + const onRetry = vi.fn(); + expect(await withRetry(fn, { sleep: noSleep, onRetry })).toBe('done'); + expect(fn).toHaveBeenCalledTimes(3); + expect(onRetry).toHaveBeenCalledTimes(2); + }); + + it('gives up after exhausting retries and rethrows the last error', async () => { + const fn = vi.fn().mockRejectedValue(transient(429)); + await expect(withRetry(fn, { retries: 2, sleep: noSleep })).rejects.toThrow(/429/); + expect(fn).toHaveBeenCalledTimes(3); // 1 + 2 retries + }); + + it('does not retry a non-transient error', async () => { + const fn = vi.fn().mockRejectedValue(transient(400)); + await expect(withRetry(fn, { sleep: noSleep })).rejects.toThrow(/400/); + expect(fn).toHaveBeenCalledTimes(1); + }); + + it('uses exponential backoff and honors a larger Retry-After', async () => { + const delays: number[] = []; + const sleep = vi.fn(async (ms: number) => { + delays.push(ms); + }); + const withRA = transient(429); + withRA.retryAfterMs = 9000; // larger than the computed 500ms first step + const fn = vi + .fn() + .mockRejectedValueOnce(transient(429)) // backoff 500 + .mockRejectedValueOnce(withRA) // Retry-After 9000 wins + .mockResolvedValueOnce('ok'); + await withRetry(fn, { sleep, baseDelayMs: 500, maxDelayMs: 8000 }); + expect(delays).toEqual([500, 9000]); + }); + + it('stops when the signal is already aborted', async () => { + const fn = vi.fn().mockRejectedValue(transient(429)); + const ctrl = new AbortController(); + ctrl.abort(); + await expect(withRetry(fn, { sleep: noSleep, signal: ctrl.signal })).rejects.toThrow(); + expect(fn).toHaveBeenCalledTimes(1); + }); +}); diff --git a/src/llm/retry.ts b/src/llm/retry.ts new file mode 100644 index 0000000..8285471 --- /dev/null +++ b/src/llm/retry.ts @@ -0,0 +1,64 @@ +// Exponential-backoff retry for transient LLM-backend failures (rate limits, +// 5xx blips, daemon mid-restart). Only errors classified as transient by +// errors.isTransient are retried; everything else (auth failures, bad requests, +// aborts) propagates immediately. A server-advised Retry-After always wins over +// the computed backoff when it asks us to wait longer. + +import { BackendError, isTransient } from './errors.js'; + +export interface RetryOptions { + /** Max additional attempts after the first try (default 2 → up to 3 calls). */ + retries?: number; + /** First backoff step in ms; doubles each attempt (default 500). */ + baseDelayMs?: number; + /** Upper bound on a single backoff wait (default 8000). */ + maxDelayMs?: number; + /** Injectable sleep so tests don't wait real time. */ + sleep?: (ms: number) => Promise; + /** Cancels pending waits and stops further attempts. */ + signal?: AbortSignal; + /** Observability hook fired before each retry wait. */ + onRetry?: (info: { attempt: number; delayMs: number; err: unknown }) => void; +} + +const defaultSleep = (ms: number, signal?: AbortSignal): Promise => + new Promise((resolve, reject) => { + if (signal?.aborted) return reject(new Error('aborted')); + const t = setTimeout(resolve, ms); + signal?.addEventListener( + 'abort', + () => { + clearTimeout(t); + reject(new Error('aborted')); + }, + { once: true }, + ); + }); + +/** + * Run `fn`, retrying on transient backend errors with exponential backoff. + * Returns `fn`'s result, or rethrows the last error once attempts are exhausted + * or the error is non-transient. + */ +export async function withRetry(fn: () => Promise, opts: RetryOptions = {}): Promise { + const retries = opts.retries ?? 2; + const baseDelayMs = opts.baseDelayMs ?? 500; + const maxDelayMs = opts.maxDelayMs ?? 8000; + const sleep = opts.sleep ?? ((ms: number) => defaultSleep(ms, opts.signal)); + + let attempt = 0; + while (true) { + try { + return await fn(); + } catch (err) { + if (opts.signal?.aborted) throw err; + if (attempt >= retries || !isTransient(err)) throw err; + const backoff = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt); + const advised = err instanceof BackendError ? (err.retryAfterMs ?? 0) : 0; + const delayMs = Math.max(backoff, advised); + opts.onRetry?.({ attempt: attempt + 1, delayMs, err }); + await sleep(delayMs); + attempt += 1; + } + } +} diff --git a/src/memory/store.test.ts b/src/memory/store.test.ts new file mode 100644 index 0000000..225f572 --- /dev/null +++ b/src/memory/store.test.ts @@ -0,0 +1,108 @@ +import { mkdtempSync, rmSync } from 'node:fs'; +import { tmpdir } from 'node:os'; +import { join } from 'node:path'; +import { afterEach, beforeEach, describe, expect, it } from 'vitest'; +import { MemoryStore, formatMemoryRecall } from './store.js'; + +let cwd: string; +let home: string; + +beforeEach(() => { + cwd = mkdtempSync(join(tmpdir(), 'pf-mem-cwd-')); + home = mkdtempSync(join(tmpdir(), 'pf-mem-home-')); +}); +afterEach(() => { + rmSync(cwd, { recursive: true, force: true }); + rmSync(home, { recursive: true, force: true }); +}); + +const store = () => new MemoryStore({ cwd, home }); + +describe('MemoryStore', () => { + it('adds a fact and lists it', () => { + const s = store(); + const fact = s.add({ + text: 'orders API has IDOR via sequential id', + createdAt: '2026-06-10T00:00:00Z', + }); + expect(fact).not.toBeNull(); + const list = s.list(); + expect(list).toHaveLength(1); + expect(list[0]?.description).toContain('orders API has IDOR'); + }); + + it('pins a catalog index with name + description', () => { + const s = store(); + s.add({ + text: 'prefer curl over scanners', + type: 'preference', + createdAt: '2026-06-10T00:00:00Z', + }); + const idx = s.index(); + expect(idx).toContain('[preference]'); + expect(idx).toContain('prefer curl over scanners'); + }); + + it('recalls facts by relevance to a query', () => { + const s = store(); + s.add({ + text: 'orders API IDOR via sequential id on /api/orders/{id}', + createdAt: '2026-06-10T00:00:01Z', + }); + s.add({ + text: 'login uses OAuth with a vulnerable redirect_uri', + createdAt: '2026-06-10T00:00:02Z', + }); + const hits = s.search('test the orders endpoint for idor', 5); + expect(hits[0]?.text).toContain('orders API IDOR'); + }); + + it('redacts secrets before persisting', () => { + const s = store(); + const fact = s.add({ + text: 'admin token Bearer abcdef0123456789abcdef0123456789', + createdAt: '2026-06-10T00:00:00Z', + }); + expect(fact?.text).not.toContain('abcdef0123456789abcdef0123456789'); + // And it's not on disk either. + const reread = store().list(); + expect(reread[0]?.text).not.toContain('abcdef0123456789abcdef0123456789'); + }); + + it('separates project and personal scopes', () => { + const s = store(); + s.add({ + text: 'project-only host scope note', + scope: 'project', + createdAt: '2026-06-10T00:00:00Z', + }); + s.add({ + text: 'personal habit always test two accounts', + scope: 'personal', + createdAt: '2026-06-10T00:00:01Z', + }); + const scopes = s + .list() + .map((f) => f.scope) + .sort(); + expect(scopes).toEqual(['personal', 'project']); + }); + + it('forgets facts matching a query', () => { + const s = store(); + s.add({ text: 'orders API IDOR', createdAt: '2026-06-10T00:00:00Z' }); + s.add({ text: 'login OAuth redirect bug', createdAt: '2026-06-10T00:00:01Z' }); + const removed = s.forget('orders'); + expect(removed).toHaveLength(1); + expect(s.list()).toHaveLength(1); + expect(s.list()[0]?.text).toContain('login OAuth'); + }); + + it('returns null on empty text', () => { + expect(store().add({ text: ' ' })).toBeNull(); + }); + + it('formatMemoryRecall renders nothing for an empty set', () => { + expect(formatMemoryRecall([])).toBe(''); + }); +}); diff --git a/src/memory/store.ts b/src/memory/store.ts new file mode 100644 index 0000000..2cad693 --- /dev/null +++ b/src/memory/store.ts @@ -0,0 +1,303 @@ +// Curated memory — durable, human-readable facts the operator (or a `#` +// quick-add) saves, modeled on Claude Code's memory: one markdown file per +// fact with frontmatter, plus a generated MEMORY.md index. Distinct from: +// - SessionMemory (auto compaction checkpoint — ephemeral per session) +// - IntelligenceStore (auto-extracted JSONL scenarios — machine-learned) +// This layer is what the user reads, edits, and recalls. +// +// Two scopes (mirrors EngagementStore / IntelligenceStore): +// - project: ./.pentesterflow/memory/ (this engagement; commit with it) +// - personal: ~/.pentesterflow/memory/ (habits/preferences across engagements) +// +// "Don't forget mid-session" is enforced by the agent, not here: the index() +// rides in the system prompt on every request (survives compaction) and +// search() recalls the full matching facts into each turn's context. + +import { existsSync, mkdirSync, readFileSync, readdirSync, rmSync, writeFileSync } from 'node:fs'; +import { homedir } from 'node:os'; +import { join, resolve } from 'node:path'; +import matter from 'gray-matter'; +import { apply as redact } from '../redact/index.js'; + +export type MemoryScope = 'project' | 'personal'; +export type MemoryType = 'target' | 'technique' | 'preference' | 'reference' | 'note'; + +const MEMORY_TYPES: MemoryType[] = ['target', 'technique', 'preference', 'reference', 'note']; + +// Caps so a long engagement can't let the catalog dominate the prompt or a +// single fact balloon the context. The most-recent facts win on overflow. +const MAX_FACTS_PER_SCOPE = 500; +const MAX_FACT_CHARS = 4000; +const MAX_INDEX_LINES = 200; + +export interface MemoryFact { + name: string; + description: string; + type: MemoryType; + scope: MemoryScope; + text: string; + createdAt: string; + /** Absolute path to the fact file. */ + file: string; +} + +export interface AddMemoryInput { + text: string; + description?: string; + type?: MemoryType; + scope?: MemoryScope; + /** Injectable timestamp (the agent passes one; tests pin it). */ + createdAt?: string; +} + +export interface MemoryStoreOptions { + cwd?: string; + home?: string; +} + +export class MemoryStore { + readonly projectDir: string; + readonly personalDir: string; + + constructor(opts: MemoryStoreOptions = {}) { + const cwd = resolve(opts.cwd ?? process.cwd()); + const home = opts.home ?? homedir(); + this.projectDir = join(cwd, '.pentesterflow', 'memory'); + this.personalDir = join(home, '.pentesterflow', 'memory'); + } + + private dir(scope: MemoryScope): string { + return scope === 'personal' ? this.personalDir : this.projectDir; + } + + /** All facts across both scopes, newest first. Corrupt files are skipped. */ + list(): MemoryFact[] { + const facts = [...this.readScope('project'), ...this.readScope('personal')]; + return facts.sort((a, b) => b.createdAt.localeCompare(a.createdAt)); + } + + /** + * Persist a fact. The text is redacted first so a `#` quick-add of a request + * never writes a live secret to disk. Returns the stored fact, or null when + * the text is empty after trimming. + */ + add(input: AddMemoryInput): MemoryFact | null { + const text = redact(input.text.trim()); + if (!text) return null; + const scope = input.scope ?? 'project'; + const type = input.type && MEMORY_TYPES.includes(input.type) ? input.type : inferType(text); + const description = (input.description?.trim() || firstLine(text)).slice(0, 160); + const createdAt = input.createdAt ?? new Date().toISOString(); + const dir = this.dir(scope); + mkdirSync(dir, { recursive: true, mode: 0o700 }); + + const name = this.uniqueName(dir, slugify(description) || 'note'); + const file = join(dir, `${name}.md`); + const content = matter.stringify(`${text}\n`, { name, description, type, createdAt }); + writeFileSync(file, content, { mode: 0o600 }); + + const fact: MemoryFact = { name, description, type, scope, text, createdAt, file }; + this.pruneScope(scope); + this.writeIndex(scope); + return fact; + } + + /** Remove facts whose name/description/text contains `query` + * (case-insensitive), across both scopes. Returns the removed names. */ + forget(query: string): string[] { + const needle = query.trim().toLowerCase(); + if (!needle) return []; + const removed: string[] = []; + for (const scope of ['project', 'personal'] as MemoryScope[]) { + let changed = false; + for (const fact of this.readScope(scope)) { + const hay = `${fact.name}\n${fact.description}\n${fact.text}`.toLowerCase(); + if (hay.includes(needle)) { + try { + rmSync(fact.file, { force: true }); + removed.push(fact.name); + changed = true; + } catch { + /* best effort */ + } + } + } + if (changed) this.writeIndex(scope); + } + return removed; + } + + /** Compact catalog (names + descriptions) for always-on system-prompt + * injection. Empty string when there is nothing to advertise. */ + index(): string { + const facts = this.list(); + if (facts.length === 0) return ''; + const lines = facts + .slice(0, MAX_INDEX_LINES) + .map((f) => `- [${f.type}] ${f.name} — ${f.description}`); + const overflow = + facts.length > MAX_INDEX_LINES ? `\n- …and ${facts.length - MAX_INDEX_LINES} more` : ''; + return lines.join('\n') + overflow; + } + + /** Relevance recall: the full facts most relevant to `query`, best first. */ + search(query: string, limit = 5): MemoryFact[] { + const tokens = tokenize(query); + if (tokens.length === 0) return []; + const scored: Array<{ fact: MemoryFact; score: number }> = []; + for (const fact of this.list()) { + const score = scoreFact(fact, tokens); + if (score > 0) scored.push({ fact, score }); + } + return scored + .sort((a, b) => b.score - a.score || b.fact.createdAt.localeCompare(a.fact.createdAt)) + .slice(0, Math.max(1, Math.floor(limit))) + .map((s) => s.fact); + } + + private readScope(scope: MemoryScope): MemoryFact[] { + const dir = this.dir(scope); + if (!existsSync(dir)) return []; + let names: string[]; + try { + names = readdirSync(dir).filter((n) => n.endsWith('.md') && n !== 'MEMORY.md'); + } catch { + return []; + } + const out: MemoryFact[] = []; + for (const n of names) { + const file = join(dir, n); + try { + const parsed = matter(readFileSync(file, 'utf8')); + const data = parsed.data as Partial; + const text = parsed.content.trim(); + if (!text) continue; + out.push({ + name: typeof data.name === 'string' ? data.name : n.replace(/\.md$/, ''), + description: typeof data.description === 'string' ? data.description : firstLine(text), + type: + typeof data.type === 'string' && MEMORY_TYPES.includes(data.type as MemoryType) + ? (data.type as MemoryType) + : inferType(text), + scope, + text: text.slice(0, MAX_FACT_CHARS), + createdAt: typeof data.createdAt === 'string' ? data.createdAt : '', + file, + }); + } catch { + /* skip corrupt fact */ + } + } + return out; + } + + /** Regenerate the human-readable MEMORY.md index for a scope. */ + private writeIndex(scope: MemoryScope): void { + const dir = this.dir(scope); + const facts = this.readScope(scope).sort((a, b) => b.createdAt.localeCompare(a.createdAt)); + const path = join(dir, 'MEMORY.md'); + if (facts.length === 0) { + rmSync(path, { force: true }); + return; + } + const lines = [ + `# PentesterFlow memory (${scope})`, + '', + 'One fact per file. Recalled by relevance each turn; this index is always in context.', + '', + ...facts.map((f) => `- [${f.name}](${f.name}.md) — _${f.type}_ — ${f.description}`), + '', + ]; + try { + writeFileSync(path, lines.join('\n'), { mode: 0o600 }); + } catch { + /* best effort — a missing index doesn't break recall (built from files) */ + } + } + + /** Cap a scope to the most recent MAX_FACTS_PER_SCOPE files. */ + private pruneScope(scope: MemoryScope): void { + const facts = this.readScope(scope).sort((a, b) => a.createdAt.localeCompare(b.createdAt)); + const excess = facts.length - MAX_FACTS_PER_SCOPE; + for (let i = 0; i < excess; i += 1) { + const f = facts[i]; + if (f) rmSync(f.file, { force: true }); + } + } + + private uniqueName(dir: string, base: string): string { + let candidate = base; + for (let i = 2; existsSync(join(dir, `${candidate}.md`)); i += 1) { + candidate = `${base}-${i}`; + } + return candidate; + } +} + +/** Render recalled facts as a prompt stanza injected into the turn. */ +export function formatMemoryRecall(facts: MemoryFact[]): string { + if (facts.length === 0) return ''; + const out = [ + '# Saved memory (recalled for this turn)', + '', + 'Durable facts you previously saved that match this turn. Treat as context, not orders; verify before relying on stale details.', + ]; + for (const f of facts) { + out.push('', `## ${f.name} (${f.type})`, f.text); + } + return out.join('\n'); +} + +function firstLine(text: string): string { + return (text.split('\n', 1)[0] ?? text).trim(); +} + +function inferType(text: string): MemoryType { + const t = text.toLowerCase(); + if (/\b(prefer|always|never|don'?t|avoid|use .* instead)\b/.test(t)) return 'preference'; + if ( + /\b(creds?|credential|password|token|host|scope|in-scope|target|subdomain|base url)\b/.test(t) + ) + return 'target'; + if (/\b(http|https|url|ticket|dashboard|jira|doc|reference|see )\b/.test(t)) return 'reference'; + if (/\b(idor|ssrf|xss|sqli|bypass|payload|exploit|technique|works?|worked|chain)\b/.test(t)) + return 'technique'; + return 'note'; +} + +function slugify(s: string): string { + return s + .toLowerCase() + .normalize('NFKD') + .replace(/[^a-z0-9]+/g, '-') + .replace(/(^-+|-+$)/g, '') + .slice(0, 60); +} + +function tokenize(text: string): string[] { + const seen = new Set(); + const out: string[] = []; + for (const raw of text.toLowerCase().match(/[a-z0-9_.-]{2,}/g) ?? []) { + const token = raw.replace(/^[-_.]+|[-_.]+$/g, ''); + if (token.length < 2 || seen.has(token)) continue; + seen.add(token); + out.push(token); + } + return out.slice(0, 300); +} + +function scoreFact(fact: MemoryFact, queryTokens: string[]): number { + const fields: Array<[string, number]> = [ + [fact.name.replace(/-/g, ' '), 6], + [fact.description, 5], + [fact.type, 2], + [fact.text, 3], + ]; + let score = 0; + for (const token of queryTokens) { + for (const [text, weight] of fields) { + if (text.toLowerCase().includes(token)) score += weight; + } + } + return score; +} diff --git a/src/redact/redact.test.ts b/src/redact/redact.test.ts index 5421c2f..693364a 100644 --- a/src/redact/redact.test.ts +++ b/src/redact/redact.test.ts @@ -95,6 +95,38 @@ describe('redact.apply', () => { expect(out).not.toContain(apiKey); }); + it('redacts OpenSSH-format private key blocks (E25 regression)', () => { + const begin = frag('-----BEGIN OPENSSH PRIVATE ', 'KEY-----'); + const end = frag('-----END OPENSSH PRIVATE ', 'KEY-----'); + const body = frag('b3BlbnNzaC1rZXktdjEAAAAABG5vbmU', 'AAAAEbm9uZQAAAA'); + const out = apply(`${begin}\n${body}\n${end}`); + expect(out).not.toContain(body); + expect(out).toContain('BEGIN PRIVATE KEY'); + }); + + it('redacts credentials in connection-string query params (E25)', () => { + const pw = 'hunter2'; + const out = apply(`mongodb+srv://u@db.internal/app?authSource=admin&password=${pw}`); + expect(out).not.toContain(`password=${pw}`); + expect(out).toContain('authSource=admin'); // non-secret param survives + const tok = frag('ya29.', 'a0AfH6SMBshort'); + const out2 = apply(`https://api/x?access_token=${tok}&page=2`); + expect(out2).not.toContain(tok); + expect(out2).toContain('page=2'); + }); + + it('redacts the HTTP Digest auth response hash (E25)', () => { + const hash = frag('6629fae49393a0', '5397450978507c4ef1'); + const out = apply(`Authorization: Digest username="admin", realm="x", response="${hash}"`); + expect(out).not.toContain(hash); + }); + + it('redacts a GCP service-account private_key_id (E25)', () => { + const kid = frag('a1b2c3d4e5f6', '0718293a4b5c6d7e8f90'); + const out = apply(`{"type":"service_account","private_key_id":"${kid}"}`); + expect(out).not.toContain(kid); + }); + it('returns empty string for empty input', () => { expect(apply('')).toBe(''); }); diff --git a/src/redact/redact.ts b/src/redact/redact.ts index e8b7bee..e6818bf 100644 --- a/src/redact/redact.ts +++ b/src/redact/redact.ts @@ -39,6 +39,18 @@ const patterns: RegExp[] = [ /\b(eyJ[A-Za-z0-9_-]{8,}\.)([A-Za-z0-9_-]{8,}(?:\.[A-Za-z0-9_-]+)?)\b/g, // Generic api_key / secret / password / token = value assignment. /((?:api[_-]?key|secret|password|passwd|token)\s*[:=]\s*["']?)([A-Za-z0-9._\-+/=]{16,})/gi, + // Credentials carried as URL/connection-string query parameters, e.g. + // `mongodb+srv://h/db?authSource=admin&password=hunter2` or `?auth=...&token=...`. + // The generic assignment above needs a 16+ char value; here we mask even a + // short query-param secret since the `&`/`#`/whitespace delimiter bounds it (E25). + /([?&](?:password|passwd|pwd|auth|token|api[_-]?key|access_token|secret)=)([^&#\s"']+)/gi, + // HTTP Digest auth: the `response=` field is the credential-derived hash; the + // 2-token Authorization pattern above only catches `Digest username=...`, + // leaving the response/nonce exposed (E25). + /(\bresponse=)("?[A-Fa-f0-9]{8,}"?)/g, + // GCP service-account JSON: the private_key body is caught by the PEM block + // below, but private_key_id (a key fingerprint) leaks separately (E25). + /(["']?private_key_id["']?\s*[:=]\s*["']?)([A-Za-z0-9]{16,})/gi, // Cookie headers. /((?:set-)?cookie:\s*)([^\r\n]+)/gi, // Common API key headers. diff --git a/src/ui/App.tsx b/src/ui/App.tsx index b9b0557..baefab2 100644 --- a/src/ui/App.tsx +++ b/src/ui/App.tsx @@ -479,6 +479,41 @@ export function App({ setHistoryIdx(null); historyDraft.current = ''; + // `#text` quick-adds a durable memory fact (Claude-Code-style); `#!text` + // saves it to the personal scope instead of the project. The fact is in + // context on the very next turn — no agent round-trip. + if (agentValue.startsWith('#')) { + const personal = agentValue.startsWith('#!'); + const text = agentValue.slice(personal ? 2 : 1).trim(); + if (!text) { + dispatch({ + type: 'append', + entry: { + kind: 'system', + text: 'usage: # (or #! for personal)', + }, + }); + return; + } + void agent + .addMemory({ text, scope: personal ? 'personal' : 'project' }) + .then((fact) => + dispatch({ + type: 'append', + entry: fact + ? { kind: 'system', text: `remembered (${fact.scope}/${fact.type}): ${fact.name}` } + : { kind: 'error', text: 'memory not saved (empty after redaction or no store)' }, + }), + ) + .catch((err: unknown) => + dispatch({ + type: 'append', + entry: { kind: 'error', text: `memory save failed: ${String(err)}` }, + }), + ); + return; + } + if (agentValue.startsWith('/')) { const handled = handleSlash( agent, @@ -967,9 +1002,54 @@ function handleSlash( ); return true; } + if (sub === 'add') { + const text = rest.slice(1).join(' ').trim(); + if (!text) { + dispatch({ + type: 'append', + entry: { kind: 'error', text: 'usage: /memory add (or use #)' }, + }); + return true; + } + void agent + .addMemory({ text }) + .then((fact) => + dispatch({ + type: 'append', + entry: fact + ? { kind: 'system', text: `remembered (${fact.scope}/${fact.type}): ${fact.name}` } + : { kind: 'error', text: 'memory not saved' }, + }), + ) + .catch((err: unknown) => + dispatch({ + type: 'append', + entry: { kind: 'error', text: `memory save failed: ${String(err)}` }, + }), + ); + return true; + } + if (sub === 'list') { + const facts = agent.listCuratedMemory(); + dispatch({ + type: 'append', + entry: { + kind: 'system', + text: facts.length + ? `Saved memory (${facts.length}):\n${facts.map((f) => `- [${f.type}] ${f.name} — ${f.description}`).join('\n')}` + : 'no saved memory yet — add one with # or /memory add ', + }, + }); + return true; + } + // Default view: durable curated facts first, then the session checkpoint. + const facts = agent.listCuratedMemory(); + const curated = facts.length + ? `Saved memory (${facts.length}):\n${facts.map((f) => `- [${f.type}] ${f.name} — ${f.description}`).join('\n')}` + : 'No saved memory yet. Add one with # or /memory add .'; dispatch({ type: 'append', - entry: { kind: 'system', text: agent.formatMemory() }, + entry: { kind: 'system', text: `${curated}\n\n${agent.formatMemory()}` }, }); return true; } @@ -1471,6 +1551,10 @@ const helpChalk = new Chalk({ level: chalkLevel() }); const KEYBINDINGS: Array<{ keys: string; desc: string }> = [ { keys: '@', desc: 'inline a file into the next turn (Tab opens a picker)' }, + { + keys: '#', + desc: 'remember a durable fact (#! = personal); recalled automatically', + }, { keys: '/', desc: 'open the slash-command menu' }, { keys: '↑ / ↓', desc: 'walk session prompt history (on first / last line of input)' }, { keys: 'Ctrl-N / Ctrl-J', desc: 'insert a newline inside the input' }, diff --git a/src/ui/permBridge.test.ts b/src/ui/permBridge.test.ts index d1993c1..47094b1 100644 --- a/src/ui/permBridge.test.ts +++ b/src/ui/permBridge.test.ts @@ -85,3 +85,71 @@ describe('BridgedPrompter', () => { expect(denials).toBe(1); }); }); + +describe('BridgedPrompter concurrency (E1)', () => { + it('serializes concurrent asks so only one modal is ever open at once', async () => { + let open = 0; + let maxOpen = 0; + let pending: PermissionRequest | null = null; + const bridge = new BridgedPrompter((req) => { + if (req) { + open += 1; + maxOpen = Math.max(maxOpen, open); + pending = req; + } else { + open -= 1; + } + }); + const flush = () => new Promise((r) => setTimeout(r, 0)); + + // Two asks fired before either resolves — different cacheKeys so neither is + // served from cache. + const p1 = bridge.ask({ tool: 'http', summary: 's', detail: 'd', cacheKey: 'a' }); + const p2 = bridge.ask({ tool: 'http', summary: 's', detail: 'd', cacheKey: 'b' }); + + expect(open).toBe(1); // only the first opened a modal; the second is parked + + (pending as PermissionRequest | null)?.resolve('allow-once'); + await p1; + await flush(); // let the lock hand off and the queued ask publish + + expect(open).toBe(1); // first modal closed, second now open + (pending as PermissionRequest | null)?.resolve('allow-once'); + await p2; + + expect(maxOpen).toBe(1); // never two modals at the same time + }); + + it('coalesces a same-origin fan-out into one modal once allow-session lands', async () => { + let open = 0; + let maxOpen = 0; + let pending: PermissionRequest | null = null; + const bridge = new BridgedPrompter((req) => { + if (req) { + open += 1; + maxOpen = Math.max(maxOpen, open); + pending = req; + } else { + open -= 1; + } + }); + const flush = () => new Promise((r) => setTimeout(r, 0)); + const req = { tool: 'http', summary: 's', detail: 'd', cacheKey: 'https://t' }; + + // Three concurrent asks to the same origin — all miss the cache and queue. + const p1 = bridge.ask(req); + const p2 = bridge.ask(req); + const p3 = bridge.ask(req); + expect(open).toBe(1); // only the first opened a modal + + (pending as PermissionRequest | null)?.resolve('allow-session'); + const d1 = await p1; + await flush(); + const [d2, d3] = await Promise.all([p2, p3]); + + expect(d1).toBe('allow-session'); + expect(d2).toBe('allow-once'); // served from cache after acquiring the lock + expect(d3).toBe('allow-once'); + expect(maxOpen).toBe(1); // the queued asks never re-opened a modal + }); +}); diff --git a/src/ui/permBridge.ts b/src/ui/permBridge.ts index d5762ae..8f0197e 100644 --- a/src/ui/permBridge.ts +++ b/src/ui/permBridge.ts @@ -15,6 +15,13 @@ export type PermissionPublisher = (req: PermissionRequest | null) => void; export class BridgedPrompter implements Prompter { private publish: PermissionPublisher; private sessionAllowed = new Set(); + // Single-modal lock. The TUI can only show one permission modal at a time, so + // concurrent ask() calls (e.g. parallel tool dispatch) must be serialized — + // otherwise the second publish() clobbers the first request in app state. The + // lock is handed off directly to the next waiter (busy is never cleared while + // a waiter exists) so a freshly-arriving ask can't slip in between. + private busy = false; + private waiters: Array<() => void> = []; constructor(publish: PermissionPublisher) { this.publish = publish; @@ -37,6 +44,31 @@ export class BridgedPrompter implements Prompter { // the cache (L13 — defuses a latent bypass even though key spaces don't // currently collide). if (!req.noSessionCache && this.sessionAllowed.has(this.keyFor(req))) return 'allow-once'; + // Acquire the single-modal lock. If idle we take it synchronously (so the + // first publish stays on the caller's stack); otherwise we park until the + // in-flight prompt hands the lock off to us. + if (this.busy) { + await new Promise((resolve) => this.waiters.push(resolve)); + } else { + this.busy = true; + } + try { + // Re-check the cache now that we hold the lock: in a parallel fan-out to + // one origin, several asks miss the cache before any resolves and queue + // here. The first "allow session" populates the cache while the rest wait, + // so the queued ones ride that approval instead of each re-opening an + // identical modal. + if (!req.noSessionCache && this.sessionAllowed.has(this.keyFor(req))) return 'allow-once'; + return await this.askOnce(req, signal); + } finally { + const next = this.waiters.shift(); + if (next) + next(); // hand the lock to the next waiter; busy stays true + else this.busy = false; + } + } + + private askOnce(req: Request, signal?: AbortSignal): Promise { return new Promise((resolve, reject) => { if (signal?.aborted) { reject(new Error('aborted')); diff --git a/src/ui/slashItems.ts b/src/ui/slashItems.ts index 4b451b1..a3650d9 100644 --- a/src/ui/slashItems.ts +++ b/src/ui/slashItems.ts @@ -19,8 +19,8 @@ export const SLASH_ITEMS: SlashItem[] = [ { name: '/compact', description: 'summarize conversation into persistent session memory' }, { name: '/memory', - args: '[clear|forget ]', - description: 'show session memory; clear wipes it, forget drops matching items', + args: '[add |list|forget |clear]', + description: 'saved + session memory; add/list curated facts (or #), forget/clear', }, { name: '/snapshot', description: 'write the current redacted context snapshot now' }, { name: '/burp', args: '[port]', description: 'start the local Burp/PentesterFlow listener' }, diff --git a/src/ui/state.ts b/src/ui/state.ts index 7b52fbe..5e20142 100644 --- a/src/ui/state.ts +++ b/src/ui/state.ts @@ -674,6 +674,14 @@ function applyAgentEvent(state: AppState, ev: AgentEvent): AppState { }; case 'skill-active': return { ...state, activeSkill: ev.name }; + case 'memory-recall': + return { + ...state, + transcript: [ + ...state.transcript, + { kind: 'system', text: `recalled memory: ${ev.names.join(', ')}` }, + ], + }; case 'done': { // End of turn: finalize a trailing streaming assistant entry so it // moves out of the live frame and into the committed scrollback log. diff --git a/src/update/selfUpdate.test.ts b/src/update/selfUpdate.test.ts new file mode 100644 index 0000000..e6eb429 --- /dev/null +++ b/src/update/selfUpdate.test.ts @@ -0,0 +1,30 @@ +import { describe, expect, it } from 'vitest'; +import { assertInstallerURL } from './selfUpdate.js'; + +describe('assertInstallerURL (L10)', () => { + it('accepts the canonical https githubusercontent installer URL', () => { + expect(() => + assertInstallerURL('https://raw.githubusercontent.com/PentesterFlow/agent/main/install.sh'), + ).not.toThrow(); + expect(() => + assertInstallerURL('https://raw.githubusercontent.com/PentesterFlow/agent/v0.2.0/install.sh'), + ).not.toThrow(); + }); + + it('rejects a non-https scheme', () => { + expect(() => + assertInstallerURL('http://raw.githubusercontent.com/PentesterFlow/agent/main/install.sh'), + ).toThrow(/non-https/); + expect(() => assertInstallerURL('file:///etc/passwd')).toThrow(/non-https/); + }); + + it('rejects an unexpected host (tampered PENTESTERFLOW_REPO)', () => { + expect(() => assertInstallerURL('https://evil.example.com/x/main/install.sh')).toThrow( + /unexpected host/, + ); + }); + + it('rejects a malformed URL', () => { + expect(() => assertInstallerURL('not a url')).toThrow(/invalid installer URL/); + }); +}); diff --git a/src/update/selfUpdate.ts b/src/update/selfUpdate.ts index 6e56db9..aa42165 100644 --- a/src/update/selfUpdate.ts +++ b/src/update/selfUpdate.ts @@ -27,10 +27,18 @@ export async function runSelfUpdate(version = 'latest'): Promise { ...(installDir ? { PENTESTERFLOW_INSTALL_DIR: installDir } : {}), }; + // Pin the installer to the requested release tag (immutable git ref) instead + // of the mutable `main` branch whenever a concrete version is given, so + // `/update v0.2.0` runs exactly the installer that shipped with that tag — + // auditable and unchanging — rather than whatever currently sits on main + // (L10). `latest` has no tag to pin to, so it still tracks main; the binary + // it pulls is SHA-256 verified fail-closed by install.sh regardless. + const ref = normalizedVersion === 'latest' ? 'main' : normalizedVersion; + const output = process.platform === 'win32' - ? await runWindowsInstaller(repo, env) - : await runUnixInstaller(repo, env); + ? await runWindowsInstaller(repo, ref, env) + : await runUnixInstaller(repo, ref, env); return { version: normalizedVersion, @@ -52,8 +60,12 @@ function detectInstallDir(): string | undefined { return undefined; } -async function runUnixInstaller(repo: string, env: NodeJS.ProcessEnv): Promise { - const scriptURL = `https://raw.githubusercontent.com/${repo}/main/install.sh`; +async function runUnixInstaller( + repo: string, + ref: string, + env: NodeJS.ProcessEnv, +): Promise { + const scriptURL = `https://raw.githubusercontent.com/${repo}/${ref}/install.sh`; const script = await fetchText(scriptURL); const dir = await mkdtemp(join(tmpdir(), 'pentesterflow-update-')); const file = join(dir, 'install.sh'); @@ -71,8 +83,12 @@ async function runUnixInstaller(repo: string, env: NodeJS.ProcessEnv): Promise { - const scriptURL = `https://raw.githubusercontent.com/${repo}/main/install.ps1`; +async function runWindowsInstaller( + repo: string, + ref: string, + env: NodeJS.ProcessEnv, +): Promise { + const scriptURL = `https://raw.githubusercontent.com/${repo}/${ref}/install.ps1`; const command = [ '$ErrorActionPreference = "Stop"', '$ProgressPreference = "SilentlyContinue"', @@ -93,7 +109,29 @@ async function runWindowsInstaller(repo: string, env: NodeJS.ProcessEnv): Promis return joinOutput(stdout, stderr); } +/** + * Reject an installer URL that isn't https on the expected githubusercontent + * host. The script is fetched then executed, so a tampered PENTESTERFLOW_REPO + * must not be able to redirect the fetch to an attacker scheme/host (L10). TLS + * guards the bytes in flight; this guards the destination. + */ +export function assertInstallerURL(url: string): void { + let parsed: URL; + try { + parsed = new URL(url); + } catch { + throw new Error(`invalid installer URL: ${url}`); + } + if (parsed.protocol !== 'https:') { + throw new Error(`refusing to fetch installer over non-https URL: ${url}`); + } + if (parsed.hostname !== 'raw.githubusercontent.com') { + throw new Error(`refusing to fetch installer from unexpected host: ${parsed.hostname}`); + } +} + async function fetchText(url: string): Promise { + assertInstallerURL(url); const resp = await fetch(url); if (!resp.ok) throw new Error(`download failed: ${url} (${resp.status})`); return await resp.text();