Skip to content

Commit 17f7ed0

Browse files
fix: harden dynamic tool handlers against deadlock, hangs, and runaway output
Four latent liveness/stability bugs on the MCP dynamic-tool execution path would silently hang agent turns or crash the container. None surfaced visible errors, which made them the worst kind of bug: the agent just stopped. 1. Pipe-buffer deadlock: executeShellHandler and executeScriptHandler drained stdout then stderr sequentially. Any handler writing >64KB to stderr before closing stdout (curl -v, git clone, npm install, verbose loggers) blocked on its next stderr write while phantom waited for stdout EOF forever. Fix: Promise.all over both streams via a new readStreamWithCap helper. 2. No subprocess timeout: Bun.spawn ran with no kill path. A hung handler froze the agent turn indefinitely with no recovery. Fix: drainProcessWithLimits schedules SIGTERM at HANDLER_TIMEOUT_MS (default 60s, env-overridable via PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS) and escalates to SIGKILL after a 2s grace. Timeouts report partial stderr so the agent has actionable signal. 3. No stdout/stderr size cap: new Response(stream).text() slurped unbounded output, risking OOM of the 2GB container. Fix: readStreamWithCap enforces a 1MB cap by default (PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES), appends a clear truncation notice, and continues draining-to-void so the child never blocks on a full pipe buffer. 4. DynamicToolRegistry.registerAllOnServer had no per-tool guard. One tool with a bad inputSchema would throw during the loop and silently skip every subsequent tool on every agent query (MCP factory pattern recreates servers per query). Fix: per-tool try/catch, warn with tool name, continue. Broken tools are not auto-unregistered; the operator decides. buildSafeEnv and the --env-file= pattern in executeScriptHandler are unchanged, preserving the subprocess environment isolation boundary from SECURITY.md. Tests spawn real subprocesses and include a 200KB-stderr regression test that would hang under the old sequential-drain code. Env-var cleanup in the new tests uses Reflect.deleteProperty(process.env, ...) rather than `delete` (Biome noDelete) or `= undefined` (coerces to the string "undefined" on process.env and does not actually unset the key). This matches the pattern acknowledged as correct by the maintainer in #5.
1 parent 58571e1 commit 17f7ed0

4 files changed

Lines changed: 308 additions & 15 deletions

File tree

src/mcp/__tests__/dynamic-handlers.test.ts

Lines changed: 109 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -159,6 +159,115 @@ describe("executeDynamicHandler", () => {
159159
}
160160
});
161161

162+
test("concurrent drain: large stderr before stdout does not deadlock", async () => {
163+
// Regression test for pipe-buffer deadlock. Write 200 KB to stderr
164+
// (well beyond the 64 KB pipe buffer), then print to stdout and exit.
165+
// Under the old sequential-drain code this hangs forever.
166+
const tool: DynamicToolDef = {
167+
name: "test_pipe_deadlock",
168+
description: "test",
169+
inputSchema: {},
170+
handlerType: "shell",
171+
handlerCode: 'head -c 204800 /dev/urandom | base64 >&2; echo "done"',
172+
};
173+
174+
const start = Date.now();
175+
const result = await executeDynamicHandler(tool, {});
176+
const elapsed = Date.now() - start;
177+
178+
expect(result.isError).toBeFalsy();
179+
const text = (result.content[0] as { type: string; text: string }).text;
180+
expect(text).toBe("done");
181+
// Should complete in well under 5 seconds. A deadlock would hit the
182+
// test runner timeout instead.
183+
expect(elapsed).toBeLessThan(5000);
184+
});
185+
186+
test("timeout kills a hung handler", async () => {
187+
const origTimeout = process.env.PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS;
188+
process.env.PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS = "500";
189+
try {
190+
const tool: DynamicToolDef = {
191+
name: "test_hang",
192+
description: "test",
193+
inputSchema: {},
194+
handlerType: "shell",
195+
handlerCode: "sleep 10",
196+
};
197+
198+
const start = Date.now();
199+
const result = await executeDynamicHandler(tool, {});
200+
const elapsed = Date.now() - start;
201+
202+
expect(result.isError).toBe(true);
203+
const text = (result.content[0] as { type: string; text: string }).text;
204+
expect(text).toContain("timed out");
205+
// 500ms timeout + 2s grace + slack. Should be well under 10s (the sleep).
206+
expect(elapsed).toBeLessThan(5000);
207+
} finally {
208+
if (origTimeout !== undefined) {
209+
process.env.PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS = origTimeout;
210+
} else {
211+
// `= undefined` would coerce to the string "undefined" on process.env;
212+
// Reflect.deleteProperty actually removes the key so later tests see it as unset.
213+
// (Matches the pattern acknowledged by the maintainer in #5.)
214+
Reflect.deleteProperty(process.env, "PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS");
215+
}
216+
}
217+
});
218+
219+
test("output cap truncates runaway stdout", async () => {
220+
const origCap = process.env.PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES;
221+
process.env.PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES = "10000";
222+
try {
223+
const tool: DynamicToolDef = {
224+
name: "test_runaway",
225+
description: "test",
226+
inputSchema: {},
227+
handlerType: "shell",
228+
// Emit ~270 KB of base64 to stdout, far exceeding the 10 KB cap.
229+
handlerCode: "head -c 200000 /dev/urandom | base64",
230+
};
231+
232+
const result = await executeDynamicHandler(tool, {});
233+
234+
expect(result.isError).toBeFalsy();
235+
const text = (result.content[0] as { type: string; text: string }).text;
236+
expect(text).toContain("Output truncated");
237+
// Captured bytes ≤ cap + truncation notice (~50 chars). Trim happens
238+
// after the truncation marker was appended, so just assert it's
239+
// bounded well below the full output size.
240+
expect(text.length).toBeLessThan(11000);
241+
} finally {
242+
if (origCap !== undefined) {
243+
process.env.PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES = origCap;
244+
} else {
245+
// See note on PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS above.
246+
Reflect.deleteProperty(process.env, "PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES");
247+
}
248+
}
249+
});
250+
251+
test("non-zero exit surfaces stderr and exit code in error message", async () => {
252+
// The concurrency claim (stdout and stderr drained in parallel) is
253+
// already proven by the pipe-deadlock regression test above. This test
254+
// guards the non-zero exit path: the error message must include the
255+
// captured stderr and the exit code so the agent has actionable signal.
256+
const tool: DynamicToolDef = {
257+
name: "test_mixed_streams",
258+
description: "test",
259+
inputSchema: {},
260+
handlerType: "shell",
261+
handlerCode: 'echo "stdout-marker"; echo "stderr-marker" >&2; exit 2',
262+
};
263+
264+
const result = await executeDynamicHandler(tool, {});
265+
expect(result.isError).toBe(true);
266+
const text = (result.content[0] as { type: string; text: string }).text;
267+
expect(text).toContain("stderr-marker");
268+
expect(text).toContain("exit 2");
269+
});
270+
162271
test("script handler receives TOOL_INPUT via env", async () => {
163272
const tmpFile = "/tmp/phantom-test-tool-input.ts";
164273
await Bun.write(tmpFile, "console.log(process.env.TOOL_INPUT)");

src/mcp/__tests__/dynamic-tools.test.ts

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -215,6 +215,57 @@ describe("DynamicToolRegistry", () => {
215215
freshDb.close();
216216
});
217217

218+
test("registerAllOnServer tolerates a failing tool registration", () => {
219+
const freshDb = new Database(":memory:");
220+
runMigrations(freshDb);
221+
222+
const registry = new DynamicToolRegistry(freshDb);
223+
registry.register({
224+
name: "good_tool",
225+
description: "Good",
226+
input_schema: {},
227+
handler_type: "shell",
228+
handler_code: "echo good",
229+
});
230+
registry.register({
231+
name: "bad_tool",
232+
description: "Bad",
233+
input_schema: {},
234+
handler_type: "shell",
235+
handler_code: "echo bad",
236+
});
237+
238+
const attempted: string[] = [];
239+
const succeeded: string[] = [];
240+
const mockServer = {
241+
registerTool(name: string, _meta: unknown, _handler: unknown) {
242+
attempted.push(name);
243+
if (name === "bad_tool") {
244+
throw new Error("simulated schema failure");
245+
}
246+
succeeded.push(name);
247+
},
248+
};
249+
250+
const origWarn = console.warn;
251+
const warnings: string[] = [];
252+
console.warn = (...args: unknown[]) => {
253+
warnings.push(args.map((a) => String(a)).join(" "));
254+
};
255+
256+
try {
257+
expect(() => registry.registerAllOnServer(mockServer as never)).not.toThrow();
258+
} finally {
259+
console.warn = origWarn;
260+
}
261+
262+
expect(attempted).toEqual(["good_tool", "bad_tool"]);
263+
expect(succeeded).toEqual(["good_tool"]);
264+
expect(warnings.some((w) => w.includes("bad_tool") && w.includes("simulated schema failure"))).toBe(true);
265+
266+
freshDb.close();
267+
});
268+
218269
test("upserts on duplicate name", () => {
219270
const freshDb = new Database(":memory:");
220271
runMigrations(freshDb);

src/mcp/dynamic-handlers.ts

Lines changed: 142 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,20 @@
11
import type { CallToolResult } from "@modelcontextprotocol/sdk/types.js";
2+
import type { Subprocess } from "bun";
23
import type { DynamicToolDef } from "./dynamic-tools.ts";
34

5+
const DEFAULT_HANDLER_TIMEOUT_MS = 60_000;
6+
const DEFAULT_MAX_OUTPUT_BYTES = 1_000_000;
7+
const HANDLER_GRACE_MS = 2_000;
8+
9+
type HandlerLimits = { timeoutMs: number; maxOutputBytes: number };
10+
11+
function getHandlerLimits(): HandlerLimits {
12+
return {
13+
timeoutMs: Number(process.env.PHANTOM_DYNAMIC_HANDLER_TIMEOUT_MS ?? DEFAULT_HANDLER_TIMEOUT_MS),
14+
maxOutputBytes: Number(process.env.PHANTOM_DYNAMIC_HANDLER_MAX_OUTPUT_BYTES ?? DEFAULT_MAX_OUTPUT_BYTES),
15+
};
16+
}
17+
418
/**
519
* Safe environment for subprocess execution.
620
* Only expose what dynamic tools legitimately need.
@@ -16,16 +30,120 @@ export function buildSafeEnv(input: Record<string, unknown>): Record<string, str
1630
};
1731
}
1832

33+
/**
34+
* Drain a ReadableStream with a hard byte cap.
35+
*
36+
* Critically, we keep reading (and dropping) chunks past the cap so the child
37+
* process never blocks on a full 64 KB pipe buffer. Cancelling the reader would
38+
* be simpler but risks leaving the child stuck on its next write.
39+
*/
40+
async function readStreamWithCap(
41+
stream: ReadableStream<Uint8Array>,
42+
maxBytes: number,
43+
): Promise<{ text: string; truncated: boolean }> {
44+
const reader = stream.getReader();
45+
const chunks: Uint8Array[] = [];
46+
let totalBytes = 0;
47+
let truncated = false;
48+
try {
49+
while (true) {
50+
const { done, value } = await reader.read();
51+
if (done) break;
52+
if (truncated) continue;
53+
if (totalBytes + value.byteLength > maxBytes) {
54+
const remaining = maxBytes - totalBytes;
55+
if (remaining > 0) chunks.push(value.subarray(0, remaining));
56+
totalBytes = maxBytes;
57+
truncated = true;
58+
} else {
59+
chunks.push(value);
60+
totalBytes += value.byteLength;
61+
}
62+
}
63+
} finally {
64+
reader.releaseLock();
65+
}
66+
67+
const combined = new Uint8Array(totalBytes);
68+
let offset = 0;
69+
for (const chunk of chunks) {
70+
combined.set(chunk, offset);
71+
offset += chunk.byteLength;
72+
}
73+
const text = new TextDecoder().decode(combined);
74+
return {
75+
text: truncated ? `${text}\n\n_(Output truncated at ${maxBytes} bytes.)_` : text,
76+
truncated,
77+
};
78+
}
79+
80+
type DrainResult = {
81+
stdout: string;
82+
stderr: string;
83+
exitCode: number | null;
84+
timedOut: boolean;
85+
};
86+
87+
/**
88+
* Run a spawned subprocess with concurrent pipe drains, a hard timeout, and
89+
* stdout/stderr size caps. Concurrent drains prevent the classic sequential
90+
* drain deadlock (child blocks on stderr write while parent waits for stdout
91+
* EOF). Timeout fires SIGTERM, escalates to SIGKILL after a grace period.
92+
*/
93+
async function drainProcessWithLimits(
94+
proc: Subprocess<"pipe" | "ignore" | "inherit", "pipe", "pipe">,
95+
limits: HandlerLimits,
96+
): Promise<DrainResult> {
97+
let timedOut = false;
98+
const termTimer = setTimeout(() => {
99+
timedOut = true;
100+
proc.kill("SIGTERM");
101+
}, limits.timeoutMs);
102+
const killTimer = setTimeout(() => {
103+
proc.kill("SIGKILL");
104+
}, limits.timeoutMs + HANDLER_GRACE_MS);
105+
106+
try {
107+
const [stdoutResult, stderrResult] = await Promise.all([
108+
readStreamWithCap(proc.stdout, limits.maxOutputBytes),
109+
readStreamWithCap(proc.stderr, limits.maxOutputBytes),
110+
]);
111+
await proc.exited;
112+
return {
113+
stdout: stdoutResult.text,
114+
stderr: stderrResult.text,
115+
exitCode: proc.exitCode,
116+
timedOut,
117+
};
118+
} finally {
119+
clearTimeout(termTimer);
120+
clearTimeout(killTimer);
121+
}
122+
}
123+
124+
function timeoutResult(toolName: string, timeoutMs: number, partial: string): CallToolResult {
125+
const snippet = partial.slice(0, 500);
126+
return {
127+
content: [
128+
{
129+
type: "text",
130+
text: `Tool '${toolName}' timed out after ${timeoutMs}ms and was killed. Partial output: ${snippet}`,
131+
},
132+
],
133+
isError: true,
134+
};
135+
}
136+
19137
export async function executeDynamicHandler(
20138
tool: DynamicToolDef,
21139
input: Record<string, unknown>,
22140
): Promise<CallToolResult> {
23141
try {
24142
switch (tool.handlerType) {
25143
case "script":
26-
return executeScriptHandler(tool.handlerPath ?? "", input);
144+
return executeScriptHandler(tool, input);
27145
case "shell":
28-
return executeShellHandler(tool.handlerCode ?? "", input);
146+
return executeShellHandler(tool, input);
29147
default:
30148
return {
31149
content: [
@@ -46,7 +164,8 @@ export async function executeDynamicHandler(
46164
}
47165
}
48166

49-
async function executeScriptHandler(path: string, input: Record<string, unknown>): Promise<CallToolResult> {
167+
async function executeScriptHandler(tool: DynamicToolDef, input: Record<string, unknown>): Promise<CallToolResult> {
168+
const path = tool.handlerPath ?? "";
50169
const { existsSync } = await import("node:fs");
51170
if (!existsSync(path)) {
52171
return {
@@ -55,6 +174,8 @@ async function executeScriptHandler(path: string, input: Record<string, unknown>
55174
};
56175
}
57176

177+
const limits = getHandlerLimits();
178+
58179
// --env-file= prevents bun from auto-loading .env/.env.local files,
59180
// which would leak secrets into the subprocess despite buildSafeEnv.
60181
const proc = Bun.spawn(["bun", "--env-file=", "run", path], {
@@ -67,34 +188,41 @@ async function executeScriptHandler(path: string, input: Record<string, unknown>
67188
proc.stdin.write(JSON.stringify(input));
68189
proc.stdin.end();
69190

70-
const stdout = await new Response(proc.stdout).text();
71-
const stderr = await new Response(proc.stderr).text();
72-
await proc.exited;
191+
const { stdout, stderr, exitCode, timedOut } = await drainProcessWithLimits(proc, limits);
73192

74-
if (proc.exitCode !== 0) {
193+
if (timedOut) {
194+
return timeoutResult(tool.name, limits.timeoutMs, stderr || stdout);
195+
}
196+
197+
if (exitCode !== 0) {
75198
return {
76-
content: [{ type: "text", text: `Script error (exit ${proc.exitCode}): ${stderr || stdout}` }],
199+
content: [{ type: "text", text: `Script error (exit ${exitCode}): ${stderr || stdout}` }],
77200
isError: true,
78201
};
79202
}
80203

81204
return { content: [{ type: "text", text: stdout.trim() }] };
82205
}
83206

84-
async function executeShellHandler(command: string, input: Record<string, unknown>): Promise<CallToolResult> {
207+
async function executeShellHandler(tool: DynamicToolDef, input: Record<string, unknown>): Promise<CallToolResult> {
208+
const command = tool.handlerCode ?? "";
209+
const limits = getHandlerLimits();
210+
85211
const proc = Bun.spawn(["bash", "-c", command], {
86212
stdout: "pipe",
87213
stderr: "pipe",
88214
env: buildSafeEnv(input),
89215
});
90216

91-
const stdout = await new Response(proc.stdout).text();
92-
const stderr = await new Response(proc.stderr).text();
93-
await proc.exited;
217+
const { stdout, stderr, exitCode, timedOut } = await drainProcessWithLimits(proc, limits);
218+
219+
if (timedOut) {
220+
return timeoutResult(tool.name, limits.timeoutMs, stderr || stdout);
221+
}
94222

95-
if (proc.exitCode !== 0) {
223+
if (exitCode !== 0) {
96224
return {
97-
content: [{ type: "text", text: `Shell error (exit ${proc.exitCode}): ${stderr || stdout}` }],
225+
content: [{ type: "text", text: `Shell error (exit ${exitCode}): ${stderr || stdout}` }],
98226
isError: true,
99227
};
100228
}

0 commit comments

Comments
 (0)