22 changes: 22 additions & 0 deletions AUDIT.md
@@ -14,6 +14,28 @@ Severity legend: **HIGH** = security bypass / RCE primitive / crash / silent dat

---

## Remediation status (verified against source)

> Updated 2026-06-10. Each finding below was re-checked against the current tree; fixes carry an
> inline comment citing their finding ID, and the full suite is green (`tsc --noEmit` clean, 561
> tests passing). **35 of 39 findings are fixed.** Of the remaining 4, 3 are accepted decisions and
> 1 is hardened; none is an open defect.

- **Fixed (35):** H1, H3, H4, H5, H6, H7, H8, H9, H10 · M1–M14 (all) · L1, L2, L3, L4, L5, L7, L8,
L11, L12, L13, L14, L15.
- **Accepted — won't change (3):**
  - **H2** (DNS-rebinding SSRF): keep permissive; reaching internal/metadata IPs is often the
    engagement goal. See the 🟡 row in the triage table.
  - **L9** (debug log writes unredacted tool I/O): opt-in, `0o600`, local-only; the operator's
    explicit choice. See the ⛔ row.
  - **L6** (no generic high-entropy redaction fallback): inherent to label-anchored redaction.
- **Hardened (1):** **L10** — self-update now pins the installer to the requested release tag
(immutable ref) instead of mutable `main` for versioned updates, and asserts the script URL is
https on `raw.githubusercontent.com` before fetch. The downloaded binary was already SHA-256
verified fail-closed by `install.sh`. (`src/update/selfUpdate.ts`)
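
The L10 hardening can be sketched roughly as follows. This is a minimal illustration and not the code in `src/update/selfUpdate.ts`: the function names, the `owner/repo` argument, and the assumption that the release tag is used verbatim as the Git ref are all hypothetical.

```typescript
// Hypothetical sketch of the L10 check; names and URL layout are assumptions.
const INSTALLER_HOST = 'raw.githubusercontent.com';

/** Versioned updates pin to the requested release tag; otherwise use `main`. */
function resolveInstallerRef(version?: string): string {
  return version && version.length > 0 ? version : 'main';
}

/** Build the installer URL for a ref. `repo` and `scriptPath` are placeholders. */
function installerUrl(repo: string, ref: string, scriptPath = 'install.sh'): string {
  return `https://${INSTALLER_HOST}/${repo}/${encodeURIComponent(ref)}/${scriptPath}`;
}

/** Fail closed: throw unless the URL is https on the expected host. */
function assertTrustedInstallerUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== 'https:') {
    throw new Error(`installer URL must be https: ${raw}`);
  }
  if (url.hostname !== INSTALLER_HOST) {
    throw new Error(`unexpected installer host: ${url.hostname}`);
  }
  return url;
}
```

Comparing the parsed `hostname` (rather than a string prefix) is what rejects look-alike hosts such as `raw.githubusercontent.com.evil.test`.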

---

## Capability-impact triage — "fix without limiting the operator"

PentesterFlow's mission is to help authorized pentesters/bug-hunters/security-engineers work
36 changes: 36 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,42 @@ All notable changes to this project are documented here. The format is based on
[Keep a Changelog](https://keepachangelog.com/), and this project adheres to
[Semantic Versioning](https://semver.org/).

## [Unreleased]

### Added

- **Saved memory (`#` quick-add)** — a curated, human-readable memory layer
modeled on Claude Code. `#<text>` saves a durable fact (one Markdown file per
fact with frontmatter, under `.pentesterflow/memory/`); `#!<text>` saves it to
the personal scope. The fact catalog is pinned into the system prompt every
turn (survives compaction) and the most relevant facts are recalled in full
per turn, surfaced as a `recalled memory: …` line. Manage with
`/memory add|list|forget`. Secrets are redacted before write.
- **Parallel tool dispatch** — independent tool calls in one step now run
concurrently (bounded fan-out) with results recorded in call order; recon
fan-outs finish in ~max(latency) instead of the sum. Single-call and
`load_skill` steps stay sequential. The permission prompter serializes its
modal so approvals still appear one at a time.
- **LLM retry/backoff** — transient backend failures (HTTP 429 / 502 / 503 /
504 and connection drops) are retried with exponential backoff, honoring a
`Retry-After` header. Applied to the OpenAI-compatible client (Kimi, Groq,
OpenRouter, DeepSeek, LM Studio).
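
The parallel tool dispatch entry above describes bounded fan-out with results recorded in call order. A minimal sketch of that pattern is shown here; `runBounded`, `sleep`, and `demo` are hypothetical names, and the agent's real dispatcher (permission prompts, error handling, `load_skill` special-casing) is not reproduced.

```typescript
// Hypothetical sketch: bounded concurrency with results kept in input order.
async function runBounded<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  const results = new Array<T>(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next index before awaiting
      const task = tasks[i];
      if (task) results[i] = await task(); // slot i always holds call i's result
    }
  }
  const width = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: width }, () => worker()));
  return results;
}

const sleep = (ms: number): Promise<void> => new Promise((r) => setTimeout(r, ms));

// The second call finishes first, yet the result array follows call order.
async function demo(): Promise<string[]> {
  return runBounded(
    [
      async () => { await sleep(60); return 'done:first'; },
      async () => { await sleep(0); return 'done:second'; },
    ],
    4,
  );
}
```

Because each worker writes into the slot of the index it claimed, total time approaches the slowest call rather than the sum, while ordering stays deterministic. In this sketch a rejected task rejects the whole batch; a real dispatcher would more likely turn the error into a tool result.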

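The retry/backoff entry in the Added list can be illustrated with the sketch below. It is an assumption-laden outline, not the OpenAI-compatible client's code: the `Attempt` shape, `withRetry`, the 500 ms base, the 30 s cap, and the default of 3 retries are all made up for the example. Only the retryable status codes and the `Retry-After` handling come from the entry itself.

```typescript
// Hypothetical sketch of retry with exponential backoff and Retry-After.
const RETRYABLE = new Set([429, 502, 503, 504]);

interface Attempt<T> { status: number; retryAfter: string | null; body: T }

/** Parse Retry-After (delta-seconds or HTTP-date) into milliseconds, or null. */
function parseRetryAfter(value: string | null, now: number = Date.now()): number | null {
  if (!value) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const when = Date.parse(trimmed);
  return Number.isNaN(when) ? null : Math.max(0, when - now);
}

/** Delay before retry `attempt` (0-based): server hint if present, else base * 2^attempt. */
function backoffDelay(attempt: number, retryAfter: string | null, baseMs = 500, capMs = 30000): number {
  const hinted = parseRetryAfter(retryAfter);
  if (hinted !== null) return Math.min(hinted, capMs);
  return Math.min(baseMs * 2 ** attempt, capMs);
}

async function withRetry<T>(
  send: () => Promise<Attempt<T>>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<Attempt<T>> {
  for (let attempt = 0; ; attempt++) {
    let res: Attempt<T> | undefined;
    try {
      res = await send();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // connection drop, out of retries
    }
    if (res && (!RETRYABLE.has(res.status) || attempt >= maxRetries)) return res;
    await sleep(backoffDelay(attempt, res ? res.retryAfter : null));
  }
}
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Production code would usually add jitter to the computed delay, which this sketch leaves out.
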
### Changed

- **Self-update hardening** — a pinned `pentesterflow update <version>` now
fetches the installer from that release tag (immutable) instead of `main`, and
the installer URL is asserted to be https on `raw.githubusercontent.com`
before fetch.

### Fixed

- **Redaction gaps** — connection-string query-param credentials
(`?password=` / `&auth=` / `&access_token=`), HTTP Digest `response=` hashes,
and GCP service-account `private_key_id` are now masked.
- Closed out the internal code audit: 35 of 39 findings fixed, 3 accepted as
intentional, 1 hardened (see `AUDIT.md`).
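
To make the redaction-gap fix concrete, here is a small sketch of label-anchored masking for two of the cases named above. The regular expressions, the `[REDACTED]` placeholder, and the function names are assumptions for illustration; the project's actual redactor is not shown in this diff and may use different patterns.

```typescript
// Hypothetical sketch of label-anchored redaction; patterns are illustrative.
const SECRET_QUERY_PARAM = /([?&](?:password|auth|access_token)=)[^&#\s'"]+/gi;
const DIGEST_RESPONSE = /(\bresponse=")[0-9a-f]+(")/gi;

/** Mask credential values carried as connection-string query parameters. */
function redactQueryCredentials(input: string): string {
  return input.replace(SECRET_QUERY_PARAM, '$1[REDACTED]');
}

/** Mask the response hash of an HTTP Digest Authorization header. */
function redactDigestResponse(input: string): string {
  return input.replace(DIGEST_RESPONSE, '$1[REDACTED]$2');
}

function redact(input: string): string {
  return redactDigestResponse(redactQueryCredentials(input));
}
```

Each pattern keeps the label (`password=`, `response="`) and replaces only the value, so the redacted text still shows which field was present. A parameter such as `author=` is left alone because the pattern requires `=` directly after the label.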

## [0.2.0] - 2026-06-06

Hardening, model tuning, and a transcript/status overhaul, plus Claude
22 changes: 21 additions & 1 deletion README.md
@@ -273,10 +273,30 @@ Useful commands:
| Command | Purpose |
|---|---|
| `/compact` | Summarize the current session into persistent memory. |
| `/memory` | Show current session memory. |
| `/memory` | Show saved facts + the session checkpoint. |
| `/memory add <text>` | Save a durable fact (same as `#<text>`). |
| `/memory list` | List saved facts. |
| `/memory forget <text>` | Drop saved facts and checkpoint items matching the text. |
| `/snapshot` | Write a redacted context snapshot immediately. |
| `/next [objective]` | Ask for coverage-driven next steps. |

### Saved memory (`#` quick-add)

Type `#` followed by anything you want the agent to remember for the rest of
this session and beyond — for example `#orders API is IDOR-prone on
/api/orders/{id}`. Use `#!<text>` to save it to your **personal** scope instead
of the project.

- Saved facts are durable, human-readable Markdown — one file per fact with
frontmatter — under `./.pentesterflow/memory/` (project) and
`~/.pentesterflow/memory/` (personal), with a generated `MEMORY.md` index.
- The fact catalog is pinned into the system prompt on **every** turn, so it
survives compaction; the facts most relevant to the current turn are recalled
in full automatically (you'll see a `recalled memory: …` line).
- Secrets are redacted before a fact is written to disk.
- Manage them with `#<text>` / `/memory add`, `/memory list`, and
`/memory forget <text>`.
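
As an illustration of the quick-add flow described above, the sketch below parses a `#` or `#!` line and renders one Markdown file with frontmatter. The frontmatter field names, the slug scheme, and all function names are hypothetical; redaction and the `MEMORY.md` index are deliberately left out.

```typescript
// Hypothetical sketch of quick-add parsing and fact rendering.
type Scope = 'project' | 'personal';
interface QuickAdd { scope: Scope; text: string }

/** Parse `#<text>` (project) or `#!<text>` (personal); null if not a quick-add. */
function parseQuickAdd(line: string): QuickAdd | null {
  if (!line.startsWith('#')) return null;
  const personal = line.startsWith('#!');
  const text = line.slice(personal ? 2 : 1).trim();
  return text.length > 0 ? { scope: personal ? 'personal' : 'project', text } : null;
}

/** Derive a filesystem-safe name from the fact text (illustrative scheme). */
function slugify(text: string, maxLen = 48): string {
  const slug = text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
  return slug.slice(0, maxLen).replace(/-+$/, '') || 'fact';
}

/** Render one fact as Markdown with frontmatter; field names are assumptions. */
function renderFact(fact: QuickAdd, createdAt: string): { path: string; body: string } {
  const name = slugify(fact.text);
  const root = fact.scope === 'personal' ? '~/.pentesterflow/memory' : './.pentesterflow/memory';
  const body = [
    '---',
    `name: ${name}`,
    `scope: ${fact.scope}`,
    `created: ${createdAt}`,
    '---',
    '',
    fact.text,
    '',
  ].join('\n');
  return { path: `${root}/${name}.md`, body };
}
```

With this scheme, `#orders API is IDOR-prone on /api/orders/{id}` would map to a project-scoped file named `orders-api-is-idor-prone-on-api-orders-id.md`.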

## Burp Integration

Use the companion
228 changes: 228 additions & 0 deletions src/agent/agent.test.ts
@@ -10,6 +10,7 @@ import { describe, expect, it } from 'vitest';
import { IntelligenceStore } from '../intelligence/store.js';
import type { Client } from '../llm/client.js';
import type { ChatRequest, ChatResponse } from '../llm/types.js';
import { MemoryStore } from '../memory/store.js';
import { AlwaysAllow } from '../permission/permission.js';
import { Store as SessionStore, newID as newSessionID } from '../session/store.js';
import { Registry as SkillRegistry } from '../skills/registry.js';
@@ -1217,3 +1218,230 @@ describe('reconcileToolCalls (H6)', () => {
    expect(reconcileToolCalls(input)).toEqual(input);
  });
});

// ---------- E1: parallel tool dispatch ----------

/** A tool that blocks until `need` instances are running at once, proving the
 * agent dispatched them concurrently. If they ran sequentially, the first
 * never sees a second arrive and falls through the timeout, returning
 * 'serial': a fast, deterministic failure rather than a hang. */
class BarrierTool implements Tool {
  private count = 0;
  private waiters: Array<(ok: boolean) => void> = [];
  constructor(private readonly need: number) {}
  name(): string {
    return 'barrier';
  }
  description(): string {
    return 'barrier';
  }
  schema(): Record<string, unknown> {
    return { type: 'object', properties: { id: { type: 'string' } } };
  }
  requiresPermission(): boolean {
    return false;
  }
  async run(args: Record<string, unknown>): Promise<string> {
    this.count += 1;
    if (this.count >= this.need) {
      // Last arrival: release every earlier caller that is still waiting.
      for (const w of this.waiters) w(true);
      this.waiters = [];
      return `parallel-ok:${String(args.id ?? '')}`;
    }
    const ok = await new Promise<boolean>((resolve) => {
      this.waiters.push(resolve);
      // Give up after 300 ms so a sequential dispatcher fails fast.
      setTimeout(() => resolve(false), 300);
    });
    return `${ok ? 'parallel-ok' : 'serial'}:${String(args.id ?? '')}`;
  }
}

/** A tool that sleeps for `delay_ms` then echoes `id`; used to verify that
 * result order is the call order, not the completion order. */
class DelayTool implements Tool {
  name(): string {
    return 'delay';
  }
  description(): string {
    return 'delay';
  }
  schema(): Record<string, unknown> {
    return { type: 'object', properties: { id: { type: 'string' }, delay_ms: { type: 'number' } } };
  }
  requiresPermission(): boolean {
    return false;
  }
  async run(args: Record<string, unknown>): Promise<string> {
    await new Promise((r) => setTimeout(r, Number(args.delay_ms ?? 0)));
    return `done:${String(args.id ?? '')}`;
  }
}

function toolCall(id: string, name: string, args: Record<string, unknown>) {
  return { id, type: 'function' as const, function: { name, arguments: JSON.stringify(args) } };
}

function agentWithTools(scripted: ChatResponse[], tools: Tool[]): Agent {
  const reg = new ToolRegistry();
  for (const t of tools) reg.register(t);
  return new Agent({
    client: new FakeClient(scripted),
    tools: reg,
    skills: new SkillRegistry(),
    prompter: new AlwaysAllow(),
    store: null,
    target: new Target(),
  });
}

describe('Agent parallel tool dispatch (E1)', () => {
  it('runs independent tool calls in the same step concurrently', async () => {
    const agent = agentWithTools(
      [
        {
          message: {
            role: 'assistant',
            content: '',
            toolCalls: [
              toolCall('c1', 'barrier', { id: 'a' }),
              toolCall('c2', 'barrier', { id: 'b' }),
            ],
          },
          finishReason: 'tool_calls',
        },
        { message: { role: 'assistant', content: 'done' }, finishReason: 'stop' },
      ],
      [new BarrierTool(2)],
    );
    const { events, sink } = collect();
    await agent.run('go', new AbortController().signal, sink);
    const results = events
      .filter((e) => e.type === 'tool-result')
      .map((e) => (e.type === 'tool-result' ? e.result : ''));
    // Both only resolve to parallel-ok if they were in flight simultaneously.
    expect(results).toEqual(['parallel-ok:a', 'parallel-ok:b']);
  });

  it('emits tool results in call order even when later calls finish first', async () => {
    const agent = agentWithTools(
      [
        {
          message: {
            role: 'assistant',
            content: '',
            toolCalls: [
              toolCall('c1', 'delay', { id: 'first', delay_ms: 60 }),
              toolCall('c2', 'delay', { id: 'second', delay_ms: 0 }),
            ],
          },
          finishReason: 'tool_calls',
        },
        { message: { role: 'assistant', content: 'done' }, finishReason: 'stop' },
      ],
      [new DelayTool()],
    );
    const { events, sink } = collect();
    await agent.run('go', new AbortController().signal, sink);
    const order = events
      .filter((e) => e.type === 'tool-result')
      .map((e) => (e.type === 'tool-result' ? e.result : ''));
    // 'second' finishes first, but results are recorded in call order.
    expect(order).toEqual(['done:first', 'done:second']);
    // History keeps the same order: assistant, tool(first), tool(second).
    const toolMsgs = agent.getHistory().filter((m) => m.role === 'tool');
    expect(toolMsgs.map((m) => m.toolCallID)).toEqual(['c1', 'c2']);
  });
});

// ---------- Curated memory: pinned catalog + recall + survives compaction ----------

describe('Agent curated memory', () => {
  function makeMemoryAgent(scripted: ChatResponse[]) {
    const cwd = mkdtempSync(join(tmpdir(), 'pf-agent-mem-cwd-'));
    const home = mkdtempSync(join(tmpdir(), 'pf-agent-mem-home-'));
    const memoryStore = new MemoryStore({ cwd, home });
    const tools = new ToolRegistry();
    tools.register(new EchoTool());
    const agent = new Agent({
      client: new FakeClient(scripted),
      tools,
      skills: new SkillRegistry(),
      prompter: new AlwaysAllow(),
      store: null,
      target: new Target(),
      memoryStore,
    });
    return {
      agent,
      memoryStore,
      cleanup: () => {
        rmSync(cwd, { recursive: true, force: true });
        rmSync(home, { recursive: true, force: true });
      },
    };
  }

  it('pins a saved fact into the system prompt immediately (next turn)', async () => {
    const { agent, cleanup } = makeMemoryAgent([]);
    try {
      const fact = await agent.addMemory({ text: 'orders API IDOR on /api/orders/{id}' });
      expect(fact).not.toBeNull();
      const sys = agent.getHistory()[0]?.content ?? '';
      expect(sys).toContain('Saved memory');
      expect(sys).toContain(fact?.name ?? 'NOPE');
    } finally {
      cleanup();
    }
  });

  it('recalls a relevant fact into the turn and emits a memory-recall event', async () => {
    const { agent, cleanup } = makeMemoryAgent([
      { message: { role: 'assistant', content: 'ok' }, finishReason: 'stop' },
    ]);
    try {
      await agent.addMemory({ text: 'orders API IDOR via sequential id on /api/orders/{id}' });
      const { events, sink } = collect();
      await agent.run('test the orders endpoint for idor', new AbortController().signal, sink);
      const recall = events.find((e) => e.type === 'memory-recall');
      expect(recall && recall.type === 'memory-recall' ? recall.names.length : 0).toBeGreaterThan(
        0,
      );
    } finally {
      cleanup();
    }
  });

  it('keeps the saved-memory catalog in the prompt after a compaction', async () => {
    // compact() resets history to [system, summary]; the catalog must still be
    // present in the rebuilt system prompt so the model never "forgets" it.
    const { agent, cleanup } = makeMemoryAgent([
      { message: { role: 'assistant', content: 'COMPACTED SUMMARY' }, finishReason: 'stop' },
    ]);
    try {
      const fact = await agent.addMemory({ text: 'login OAuth redirect_uri bypass works' });
      // No extra turns are seeded here; compact() runs against the history as it stands.
      agent.getHistory(); // accessor call only; it adds nothing to the history
      await agent.compact(new AbortController().signal, () => {});
      const sys = agent.getHistory()[0]?.content ?? '';
      expect(sys).toContain('Saved memory');
      expect(sys).toContain(fact?.name ?? 'NOPE');
    } finally {
      cleanup();
    }
  });

  it('forgetMemory removes a curated fact and drops it from the prompt', async () => {
    const { agent, cleanup } = makeMemoryAgent([]);
    try {
      const fact = await agent.addMemory({ text: 'orders API IDOR on /api/orders/{id}' });
      const name = fact?.name ?? 'NOPE';
      expect(agent.listCuratedMemory()).toHaveLength(1);
      const removed = await agent.forgetMemory('orders');
      expect(removed).toContain(name);
      expect(agent.listCuratedMemory()).toHaveLength(0);
      expect(agent.getHistory()[0]?.content ?? '').not.toContain(name);
    } finally {
      cleanup();
    }
  });
});