
feat(agent): run MCP tools as scripts#2771

Draft
Twixes wants to merge 3 commits into main from feat/mcp-tools-as-scripts

@Twixes Twixes commented Jun 19, 2026

Member

Problem

The agent calls connected MCP tools one at a time. Anything that fans out (read 100 issues, comment on the stale ones) becomes 100 sequential round-trips, each paying full tool-call overhead and burning context. There was no way to express "loop over these results and act on each" in a single step.

Changes

Adds a capability that lets the agent write one JavaScript script that calls connected MCP tools as async functions:

const issues = await tools.linear.listIssues({ teamId })
const stale = issues.filter((x) => x.stale)
for (const i of stale) {
  await tools.linear.createComment({ issueId: i.id, body: "bump" })
}
// Report only the issues that were actually commented on.
return { bumped: stale.length }

Two local tools sit alongside the existing signed-git tools:

  • list_mcp_tools returns .d.ts-style signatures for every tools.<server>.<tool>(args), generated from each tool's MCP input schema.
  • run_mcp_script takes { script, timeoutMs? } and returns { result, logs, error? }.
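To make the idea of ".d.ts-style signatures generated from each tool's MCP input schema" concrete, here is a minimal sketch of such a renderer. It is not the code in signatures.ts: `JsonSchema`, `renderType`, and `renderSignature` are hypothetical names, and the schema type covers only a small subset of JSON Schema.

```typescript
// Hypothetical, simplified JSON Schema subset; real MCP input schemas can be richer
// (enums, unions, nested refs), which this sketch renders as `unknown`.
type JsonSchema = {
  type?: "string" | "number" | "integer" | "boolean" | "array" | "object";
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
};

// Render one schema node as a TypeScript-style type expression.
function renderType(schema: JsonSchema): string {
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array":
      return `${renderType(schema.items ?? {})}[]`;
    case "object": {
      const required = new Set(schema.required ?? []);
      const fields = Object.entries(schema.properties ?? {}).map(
        ([name, prop]) => `${name}${required.has(name) ? "" : "?"}: ${renderType(prop)}`,
      );
      return fields.length > 0 ? `{ ${fields.join("; ")} }` : "Record<string, unknown>";
    }
    default:
      return "unknown";
  }
}

// Render the call signature the model would see for tools.<server>.<tool>(args).
function renderSignature(server: string, tool: string, inputSchema: JsonSchema): string {
  return `tools.${server}.${tool}(args: ${renderType(inputSchema)}): Promise<unknown>`;
}

const signature = renderSignature("linear", "createComment", {
  type: "object",
  properties: {
    issueId: { type: "string" },
    body: { type: "string" },
    notify: { type: "boolean" },
  },
  required: ["issueId", "body"],
});
console.log(signature);
```

Properties missing from `required` are rendered as optional, so the model can tell which arguments it must supply.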

New module packages/agent/src/mcp-scripting/:

  • client-pool.ts opens and caches MCP Clients from the session config map, with listTools / callTool and a scriptableServerNames helper.
  • proxy.ts builds the lazy tools.<server>.<tool>(args) proxy and rejects on tool isError.
  • runner.ts runs the script in a constrained node:vm sandbox with a wall-clock timeout and captured console.
  • signatures.ts renders JSON Schema into TS-style signatures.
  • tools.ts holds the two local-tool definitions.
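As an illustration of the lazy `tools.<server>.<tool>(args)` proxy described for proxy.ts, the following sketch uses nested `Proxy` objects and rejects when a tool result has `isError` set. The types `ToolResult` and `CallTool` and the function `buildToolsProxySketch` are assumptions made for this example; the real module's shapes may differ.

```typescript
// Hypothetical minimal shapes; the real pool and MCP result types are not shown here.
type ToolResult = { isError?: boolean; content?: unknown };
type CallTool = (server: string, tool: string, args: unknown) => Promise<ToolResult>;
type ToolFn = (args?: unknown) => Promise<unknown>;
type ToolsProxy = Record<string, Record<string, ToolFn>>;

function buildToolsProxySketch(serverNames: string[], callTool: CallTool): ToolsProxy {
  const known = new Set(serverNames);
  return new Proxy({} as ToolsProxy, {
    get(_target, server) {
      // Only servers the session already knows about are reachable.
      if (typeof server !== "string" || !known.has(server)) return undefined;
      const serverName: string = server;
      return new Proxy({} as Record<string, ToolFn>, {
        get(_serverTarget, tool) {
          // Returning undefined for "then" keeps `await tools.server` from
          // being mistaken for a thenable and triggering a tool call.
          if (typeof tool !== "string" || tool === "then") return undefined;
          const toolName: string = tool;
          // Nothing is dialed until the function is actually invoked.
          return async (args?: unknown): Promise<unknown> => {
            const result = await callTool(serverName, toolName, args ?? {});
            if (result.isError) {
              throw new Error(`${serverName}.${toolName} failed: ${JSON.stringify(result.content)}`);
            }
            return result.content;
          };
        },
      });
    },
  });
}
```

Because tool functions are created on property access, no tool list has to be fetched before the script starts; an unknown tool name surfaces as an error from the underlying call.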

Wiring: local-tools/registry.ts carries the scriptable MCP server map on LocalToolCtx, local-tools/index.ts registers the tools, and both the Claude and Codex adapters thread the external MCP server map through.

Build, not adopt

Evaluated Cloudflare Code Mode, @utcp/code-mode / code-mode-mcp, and mcpac. Each runs as a separate process or MCP server with its own config and credentials, and several layer in a second abstraction (UTCP). None reuse the in-process McpServerConfig map with already-resolved credentials, which is the entire integration here. Building it is a thin layer over the MCP SDK Client we already depend on, with no new runtime dependencies (only @modelcontextprotocol/sdk and zod, both already present). Full rationale in the module README.

Auth and sandbox

No new auth path. The proxy dials the exact McpServerConfig map the agent's own MCP tools use, so credentials are inherited verbatim (stdio env, http/sse headers). In-process sdk servers are excluded since they have no dialable transport. A script can only reach servers the session was already authorized for; nothing the model writes can set or escalate credentials.
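The exclusion of in-process sdk servers could look roughly like the filter below. The config union `McpServerConfigSketch` and the function `scriptableServerNamesSketch` are hypothetical stand-ins modeled on the transports named above (stdio, http, sse, sdk), not the actual `McpServerConfig` type or the `scriptableServerNames` helper.

```typescript
// Hypothetical config shapes, loosely modeled on the transports named above.
type McpServerConfigSketch =
  | { type: "stdio"; command: string; args?: string[]; env?: Record<string, string> }
  | { type: "http" | "sse"; url: string; headers?: Record<string, string> }
  | { type: "sdk"; name: string };

// Keep only servers with a dialable transport; credentials (env, headers)
// stay attached to the config entries and are never touched by the script.
function scriptableServerNamesSketch(servers: Record<string, McpServerConfigSketch>): string[] {
  return Object.entries(servers)
    .filter(([, config]) => config.type !== "sdk")
    .map(([name]) => name)
    .sort();
}

const names = scriptableServerNamesSketch({
  linear: { type: "http", url: "https://example.invalid/mcp", headers: { Authorization: "Bearer <token>" } },
  files: { type: "stdio", command: "files-mcp", env: { TOKEN: "<token>" } },
  inProcess: { type: "sdk", name: "local" },
});
console.log(names);
```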

The node:vm sandbox runs with an explicit-allowlist global set (tools, captured console, pure helpers) and denies require, import, process, global / globalThis, Buffer, fetch, and filesystem access. codeGeneration is disabled so eval / new Function throw. This is deliberately not a hard boundary against adversarial in-process code, which is documented as a known limitation: the script author is the same agent that already calls these tools, and cloud runs sandbox the whole agent for real isolation.
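A minimal sketch of the allowlist approach, assuming Node's built-in `node:vm` module: the context receives only the globals passed in, and `codeGeneration` is disabled so `eval` and `new Function` throw. `runInSandboxSketch` is a hypothetical name, and the sketch does not reproduce everything runner.ts does (for example, console capture or hiding `globalThis`).

```typescript
import * as vm from "node:vm";

// Run `source` as the body of an async function inside a fresh context that
// exposes only the allowlisted globals. Returns the script's completion promise.
function runInSandboxSketch(
  source: string,
  allowedGlobals: Record<string, unknown>,
  timeoutMs: number,
): unknown {
  // A fresh context has the standard JavaScript built-ins but no Node globals
  // such as require, process, Buffer, or fetch unless they are passed in.
  const context = vm.createContext(
    { ...allowedGlobals },
    { codeGeneration: { strings: false, wasm: false } },
  );
  const script = new vm.Script(`(async () => {\n${source}\n})()`);
  // `timeout` only bounds synchronous execution; async work needs its own timer.
  return script.runInContext(context, { timeout: timeoutMs });
}
```

As the description notes, `node:vm` is not a security boundary against hostile code; the allowlist mainly removes ambient authority from well-behaved scripts.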

How did you test this?

Automated tests only (authored by an agent):

  • 27 new tests across mcp-scripting.test.ts (20) and client-pool.integration.test.ts (7), the latter running end-to-end against a real stdio MCP server fixture. They cover proxy generation, script execution, looping/batching, timeout enforcement, error surfacing (script throw and tool isError), blocked sandbox escapes (require / process / global / Buffer / fetch / new Function), signature rendering, tool gating, stdio env propagation, and reporting of unreachable servers.
  • Full packages/agent suite: 761 passing across 58 files.
  • Typecheck and build both clean.

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

🤖 Agent context

Autonomy: Human-driven (agent-assisted). Michael Matloka directed the design and scope; an agent implemented and tested it. The capability is gated behind explicit local tools and reuses existing session credentials, so it adds no new auth surface.

@Twixes Twixes self-assigned this Jun 19, 2026
Lets the agent write one JS script that calls connected MCP tools as
async functions instead of one tool-call at a time. Adds:

- McpClientPool: opens MCP clients from the session's McpServerConfig
  map, inheriting auth (stdio env, http/sse headers) verbatim
- buildToolsProxy: lazy tools.<server>.<tool>(args) proxy
- runScript: constrained node:vm sandbox with wall-clock timeout,
  captured console, and no ambient fs/net/process authority
- renderToolsetSignatures: JSON Schema to TS-style signatures

greptile-apps Bot commented Jun 19, 2026

Contributor
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/agent/src/mcp-scripting/signatures.ts:104-106
`*/` sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains `*/` (e.g. `"Computes a*/b"`) will produce output like `/** Computes a*/b */`, which closes the JSDoc at the first `*/` and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.

```suggestion
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
}
```

### Issue 2 of 2
packages/agent/src/mcp-scripting/runner.ts:96-103
**Double-budget timing: total wall-clock time up to 2× `timeoutMs`**

The `run` IIFE calls `script.runInContext(context, { timeout: timeoutMs })` synchronously. This synchronous phase completes — and blocks — entirely *before* `withTimeout(run, timeoutMs)` starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at `timeoutMs`) followed by async tool calls (capped at another `timeoutMs`) can run for nearly 2× the configured budget. At the `MAX_TIMEOUT_MS` of 120 s, that ceiling is 240 s. The `RunScriptOptions` docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.

Reviews (1): Last reviewed commit: "feat(agent): expose run_mcp_script / lis..." | Re-trigger Greptile

Comment on lines +104 to +106
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

Contributor


P2 */ sequence in tool description breaks the JSDoc comment early. Any MCP tool whose description contains */ (e.g. "Computes a*/b") will produce output like /** Computes a*/b */, which closes the JSDoc at the first */ and leaves trailing garbage text. The model then sees malformed TypeScript that could silently misrepresent the available signatures.

Suggested change
- function oneLine(text: string): string {
-   return text.replace(/\s+/g, " ").trim();
- }
+ function oneLine(text: string): string {
+   return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
+ }

Member Author


Good catch — fixed in f3a43fa. oneLine now rewrites */ to * / so a tool description can't close the generated JSDoc block early. Added a test asserting the rendered line stays well-formed.
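The effect of the fix can be shown with a small self-contained example. `oneLine` follows the suggested change above; `renderDocComment` is a hypothetical wrapper added only to show the resulting comment.

```typescript
// Collapse whitespace and neutralize "*/" so the text cannot close a JSDoc block.
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim().replace(/\*\//g, "* /");
}

// Hypothetical helper: wrap a tool description in a single-line JSDoc comment.
function renderDocComment(description: string): string {
  return `/** ${oneLine(description)} */`;
}

const rendered = renderDocComment("Computes a*/b\n   quickly");
console.log(rendered); // /** Computes a* /b quickly */
```

The only `*/` left in the output is the one that closes the comment.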

Comment on lines +96 to +103
const run = (async (): Promise<unknown> => {
  // `timeout` here guards synchronous spin; async work is bounded by the race.
  const completion = script.runInContext(context, { timeout: timeoutMs });
  return await completion;
})();

try {
  const result = await withTimeout(run, timeoutMs);

Contributor


P2 Double-budget timing: total wall-clock time up to 2× timeoutMs

The run IIFE calls script.runInContext(context, { timeout: timeoutMs }) synchronously. This synchronous phase completes — and blocks — entirely before withTimeout(run, timeoutMs) starts its wall-clock timer. So a script with a synchronous CPU-bound loop (capped at timeoutMs) followed by async tool calls (capped at another timeoutMs) can run for nearly 2× the configured budget. At the MAX_TIMEOUT_MS of 120 s, that ceiling is 240 s. The RunScriptOptions docs and the tool's parameter description both promise a single "wall-clock budget," but the actual guarantee is two independent, sequential budgets.


Member Author


Fixed in f3a43fa. runScript now derives a single absolute deadline up front; the synchronous runInContext timeout is capped to the time remaining and the async race keys off the same deadline, so total wall-clock time stays within one timeoutMs budget instead of two sequential ones. Added a sync-then-async test that asserts the combined run trips the single deadline.
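One way to implement a single shared deadline is sketched below, assuming `node:vm`. It is an illustration of the approach described in the reply, not the code from f3a43fa; `runWithSingleDeadline` and its error messages are hypothetical.

```typescript
import * as vm from "node:vm";

async function runWithSingleDeadline(
  source: string,
  sandbox: Record<string, unknown>,
  timeoutMs: number,
): Promise<unknown> {
  // One absolute deadline, fixed before any script code runs.
  const deadline = Date.now() + timeoutMs;
  const context = vm.createContext({ ...sandbox });
  const script = new vm.Script(`(async () => {\n${source}\n})()`);

  // Synchronous phase: capped to the time left (vm requires a positive integer).
  const completion: unknown = script.runInContext(context, {
    timeout: Math.max(1, deadline - Date.now()),
  });

  // Async phase: race against whatever remains of the same budget.
  const remaining = deadline - Date.now();
  if (remaining <= 0) {
    throw new Error(`Script exceeded ${timeoutMs} ms`);
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expiry = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Script exceeded ${timeoutMs} ms`)), remaining);
  });
  try {
    return await Promise.race([completion, expiry]);
  } finally {
    clearTimeout(timer);
  }
}
```

With this shape, a script that spins synchronously for most of the budget leaves only the remainder for its tool calls, so the total stays near `timeoutMs` rather than approaching twice that value.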

Registers the scripting tools in the local-tools registry and threads
the session's external MCP server map into LocalToolCtx from both the
Claude (claude-agent.ts) and Codex (codex-agent.ts) adapters, so a
script dials the same servers with inherited auth. Tools self-disable
when no external MCP servers are connected.
@Twixes Twixes force-pushed the feat/mcp-tools-as-scripts branch from 4c0a6f6 to 49f7c85 Compare June 19, 2026 07:38

github-actions Bot commented Jun 19, 2026


React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit f3a43fa.

- runScript now enforces timeoutMs as one shared wall-clock deadline across
  the synchronous and async phases (previously up to 2x the budget)
- signature rendering neutralizes */ in tool descriptions so a description
  can't close the generated JSDoc block early