feat(agent): run MCP tools as scripts#2771
Conversation
Lets the agent write one JS script that calls connected MCP tools as async functions instead of one tool-call at a time. Adds: - McpClientPool: opens MCP clients from the session's McpServerConfig map, inheriting auth (stdio env, http/sse headers) verbatim - buildToolsProxy: lazy tools.<server>.<tool>(args) proxy - runScript: constrained node:vm sandbox with wall-clock timeout, captured console, and no ambient fs/net/process authority - renderToolsetSignatures: JSON Schema to TS-style signatures
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/agent/src/mcp-scripting/signatures.ts:104-106
`*/` sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains `*/` (e.g. `"Computes a*/b"`) will produce output like `/** Computes a*/b */`, which closes the JSDoc at the first `*/` and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.
```suggestion
function oneLine(text: string): string {
return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
}
```
### Issue 2 of 2
packages/agent/src/mcp-scripting/runner.ts:96-103
**Double-budget timing: total wall-clock time up to 2× `timeoutMs`**
The `run` IIFE calls `script.runInContext(context, { timeout: timeoutMs })` synchronously. This synchronous phase completes — and blocks — entirely *before* `withTimeout(run, timeoutMs)` starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at `timeoutMs`) followed by async tool calls (capped at another `timeoutMs`) can run for nearly 2× the configured budget. At the `MAX_TIMEOUT_MS` of 120 s, that ceiling is 240 s. The `RunScriptOptions` docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.
Reviews (1): Last reviewed commit: "feat(agent): expose run_mcp_script / lis..." | Re-trigger Greptile |
| function oneLine(text: string): string { | ||
| return text.replace(/\s+/g, " ").trim(); | ||
| } |
There was a problem hiding this comment.
*/ sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains */ (e.g. "Computes a*/b") will produce output like /** Computes a*/b */, which closes the JSDoc at the first */ and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.
| function oneLine(text: string): string { | |
| return text.replace(/\s+/g, " ").trim(); | |
| } | |
| function oneLine(text: string): string { | |
| return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /"); | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/mcp-scripting/signatures.ts
Line: 104-106
Comment:
`*/` sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains `*/` (e.g. `"Computes a*/b"`) will produce output like `/** Computes a*/b */`, which closes the JSDoc at the first `*/` and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.
```suggestion
function oneLine(text: string): string {
return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
}
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Good catch — fixed in f3a43fa. oneLine now rewrites */ to * / so a tool description can't close the generated JSDoc block early. Added a test asserting the rendered line stays well-formed.
| const run = (async (): Promise<unknown> => { | ||
| // `timeout` here guards synchronous spin; async work is bounded by the race. | ||
| const completion = script.runInContext(context, { timeout: timeoutMs }); | ||
| return await completion; | ||
| })(); | ||
|
|
||
| try { | ||
| const result = await withTimeout(run, timeoutMs); |
There was a problem hiding this comment.
Double-budget timing: total wall-clock time up to 2×
timeoutMs
The run IIFE calls script.runInContext(context, { timeout: timeoutMs }) synchronously. This synchronous phase completes — and blocks — entirely before withTimeout(run, timeoutMs) starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at timeoutMs) followed by async tool calls (capped at another timeoutMs) can run for nearly 2× the configured budget. At the MAX_TIMEOUT_MS of 120 s, that ceiling is 240 s. The RunScriptOptions docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/agent/src/mcp-scripting/runner.ts
Line: 96-103
Comment:
**Double-budget timing: total wall-clock time up to 2× `timeoutMs`**
The `run` IIFE calls `script.runInContext(context, { timeout: timeoutMs })` synchronously. This synchronous phase completes — and blocks — entirely *before* `withTimeout(run, timeoutMs)` starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at `timeoutMs`) followed by async tool calls (capped at another `timeoutMs`) can run for nearly 2× the configured budget. At the `MAX_TIMEOUT_MS` of 120 s, that ceiling is 240 s. The `RunScriptOptions` docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed in f3a43fa. runScript now derives a single absolute deadline up front; the synchronous runInContext timeout is capped to the time remaining and the async race keys off the same deadline, so total wall-clock time stays within one timeoutMs budget instead of two sequential ones. Added a sync-then-async test that asserts the combined run trips the single deadline.
Registers the scripting tools in the local-tools registry and threads the session's external MCP server map into LocalToolCtx from both the Claude (claude-agent.ts) and Codex (codex-agent.ts) adapters, so a script dials the same servers with inherited auth. Tools self-disable when no external MCP servers are connected.
4c0a6f6 to
49f7c85
Compare
|
React Doctor found no issues in the changed files. 🎉 Reviewed by React Doctor for commit |
- runScript now enforces timeoutMs as one shared wall-clock deadline across the synchronous and async phases (previously up to 2x the budget) - signature rendering neutralizes */ in tool descriptions so a description can't close the generated JSDoc block early
Problem
The agent calls connected MCP tools one at a time. Anything that fans out (read 100 issues, comment on the stale ones) becomes 100 sequential round-trips, each paying full tool-call overhead and burning context. There was no way to express "loop over these results and act on each" in a single step.
Changes
Adds a capability that lets the agent write one JavaScript script that calls connected MCP tools as async functions:
Two local tools sit alongside the existing signed-git tools:
list_mcp_toolsreturns.d.ts-style signatures for everytools.<server>.<tool>(args), generated from each tool's MCP input schema.run_mcp_scripttakes{ script, timeoutMs? }and returns{ result, logs, error? }.New module
packages/agent/src/mcp-scripting/:client-pool.tsopens and caches MCPClients from the session config map, withlistTools/callTooland ascriptableServerNameshelper.proxy.tsbuilds the lazytools.<server>.<tool>(args)proxy and rejects on toolisError.runner.tsruns the script in a constrainednode:vmsandbox with a wall-clock timeout and captured console.signatures.tsrenders JSON Schema into TS-style signatures.tools.tsholds the two local-tool definitions.Wiring:
local-tools/registry.tscarries the scriptable MCP server map onLocalToolCtx,local-tools/index.tsregisters the tools, and both the Claude and Codex adapters thread the external MCP server map through.Build, not adopt
Evaluated Cloudflare Code Mode,
@utcp/code-mode/code-mode-mcp, andmcpac. Each runs as a separate process or MCP server with its own config and credentials, and several layer in a second abstraction (UTCP). None reuse the in-processMcpServerConfigmap with already-resolved credentials, which is the entire integration here. Building it is a thin layer over the MCP SDKClientwe already depend on, with no new runtime dependencies (only@modelcontextprotocol/sdkandzod, both already present). Full rationale in the module README.Auth and sandbox
No new auth path. The proxy dials the exact
McpServerConfigmap the agent's own MCP tools use, so credentials are inherited verbatim (stdioenv, http/sseheaders). In-processsdkservers are excluded since they have no dialable transport. A script can only reach servers the session was already authorized for; nothing the model writes can set or escalate credentials.The
node:vmsandbox runs with an explicit-allowlist global set (tools, capturedconsole, pure helpers) and deniesrequire,import,process,global/globalThis,Buffer,fetch, and filesystem access.codeGenerationis disabled soeval/new Functionthrow. This is deliberately not a hard boundary against adversarial in-process code, which is documented as a known limitation: the script author is the same agent that already calls these tools, and cloud runs sandbox the whole agent for real isolation.How did you test this?
Automated tests only (authored by an agent):
mcp-scripting.test.ts(20) andclient-pool.integration.test.ts(7), the latter running end-to-end against a real stdio MCP server fixture. They cover proxy generation, script execution, looping/batching, timeout enforcement, error surfacing (script throw and toolisError), blocked sandbox escapes (require/process/global/Buffer/fetch/new Function), signature rendering, tool gating, stdioenvpropagation, and reporting of unreachable servers.packages/agentsuite: 761 passing across 58 files.Automatic notifications
🤖 Agent context
Autonomy: Human-driven (agent-assisted). Michael Matloka directed the design and scope; an agent implemented and tested it. The capability is gated behind explicit local tools and reuses existing session credentials, so it adds no new auth surface.