feat(chat,mcp): Cloudflare Code Mode for the AI chat and MCP server#111
Makisuo wants to merge 10 commits into
Instead of handing the model 51 tools one at a time, it can now write a JS snippet against a generated typed `maple.*` API that runs in a Cloudflare Dynamic Worker isolate (network blocked); each `maple.<tool>(input)` RPCs back to the existing tools. Multi-step investigations collapse into one round-trip.

- packages/codemode: source-only shared package (pure root + ./sandbox subpath that imports cloudflare:workers). JSON-schema→TS API gen, the sandbox harness (splices user JS into an async IIFE module, no eval), proposed_batch formatting, and runCodeInSandbox + MapleSupervisor (RpcTarget).
- chat-flue: a `run_code` Flue tool, injected only when MAPLE_CODE_MODE=1 and the LOADER binding is present (hybrid: the 51 direct tools stay). Dispatch reuses the approval-gated tool execs, so mutating maple.* calls become proposals, collected into a proposed_batch that the web renders as one approval card each.
- apps/api MCP: a `run_code` tool dispatching to registry handlers under the captured request runtime (FiberSet.makeRuntimePromise + Effect.scoped), preserving org scoping. Mutating tools are blocked inside code.
- Deploy: WorkerLoader() binding added to both alchemy.run.ts files, gated on the flag.

Worker Loader is a Cloudflare beta, so the binding only deploys when MAPLE_CODE_MODE is set, and the agent no-ops Code Mode without it (direct tools only); default behavior is unchanged. Verified via unit tests (the harness runs end-to-end in Node), the Flue build, prompt rendering, and a 24-package typecheck; live isolate execution needs a deployed stage with beta access.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
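To make the "one round-trip" idea concrete, here is a minimal sketch of how a snippet-facing `maple` object could turn every `maple.<tool>(input)` call into a call on a single dispatch function. `Dispatch`, `makeMapleProxy`, the fake tool table, and the `list_services` tool name are all hypothetical; in the PR the dispatch target is `MapleSupervisor` (an `RpcTarget`) reached over Workers RPC, which this Node-runnable sketch does not model.

```typescript
// Hypothetical host-side dispatcher: one entry point for every tool call.
type Dispatch = (toolName: string, input: unknown) => Promise<unknown>;
type MapleApi = Record<string, (input?: unknown) => Promise<unknown>>;

// Build the `maple` object a snippet sees. Every property access returns an async
// function that forwards to `dispatch`, so `maple.list_services({})` becomes
// `dispatch("list_services", {})`.
function makeMapleProxy(dispatch: Dispatch): MapleApi {
  return new Proxy({} as MapleApi, {
    get(_target, prop) {
      // Symbols and `then` stay undefined so the proxy is not mistaken for a
      // Promise when it is awaited or passed through Promise.resolve.
      if (typeof prop !== "string" || prop === "then") return undefined;
      return (input: unknown = {}) => dispatch(prop, input);
    },
  });
}

// Stand-in for the supervisor: a table of fake, read-only tool handlers.
const fakeTools: Record<string, (input: unknown) => unknown> = {
  list_services: () => [
    { name: "checkout", errorRate: 0.12 },
    { name: "search", errorRate: 0.01 },
  ],
};

const maple = makeMapleProxy(async (name, input) => {
  const handler = fakeTools[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(input);
});

// Snippet-style code: tool calls plus local logic, with no model round-trip in between.
async function worstService(): Promise<string> {
  const services = (await maple.list_services({})) as { name: string; errorRate: number }[];
  const sorted = [...services].sort((a, b) => b.errorRate - a.errorRate);
  return sorted[0]?.name ?? "none";
}
```

The design point is that the model only ever sees one generic method shape, so adding a tool to the registry requires no change to the proxy itself.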
Important
Inside the MCP sandbox, the hardcoded MUTATING_TOOL_NAMES set is the only thing standing between a model-written snippet and an ungated mutation — and the test suite never enforces that every mutating tool in the registry is in that set. The current set is complete, but the design fails open: a future mutating tool added to the registry but forgotten in the set will execute its real side effect inside run_code. Worth closing before this ships behind the flag.
Reviewed changes — initial review of Cloudflare Code Mode: a sandboxed run_code tool for both the MCP server and the AI chat that lets the model orchestrate existing Maple tools from a JS snippet running in a network-isolated Worker Loader isolate.
- New `@maple/codemode` package: pure root barrel (`buildApiDeclaration` JSON-schema→TS, `buildHarnessModule`, `formatRunResult`/`formatRunOutput`) plus a `./sandbox` subpath isolating `cloudflare:workers` (`runCodeInSandbox`, `MapleSupervisor extends RpcTarget`). The harness splices model code into an async IIFE; no runtime `eval`.
- MCP `run_code` (`apps/api/src/mcp/tools/run-code.ts`): dispatches read-only tool handlers under the captured request runtime (`FiberSet.makeRuntimePromise` + `Effect.scoped`), preserving org scoping; mutating tools are blocked via `MUTATING_TOOL_NAMES`. Always registered, runtime-gated on `MAPLE_CODE_MODE=1` + the `LOADER` binding.
- Chat `run_code` (`apps/chat-flue/src/lib/codemode/`): reuses the approval-gated `execute`, so mutating `maple.*` calls return proposal markers collected into a `proposed_batch` envelope; the web renders one `ApprovalCard` per proposal (`parseToolProposalBatch`, keyed `${toolCallId}#${i}`).
- Prompt + deploy wiring: `formatCodeModeBlock` appends the generated `maple.*` API to the system prompt; `WorkerLoader()` added to both `alchemy.run.ts` files, gated on `MAPLE_CODE_MODE`.
- Incidental `bun.lock` churn: an `accepts`/`send`/`negotiator` dependency relock unrelated to Code Mode is folded into this PR.
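The JSON-schema→TS step can be pictured with a small sketch that turns each tool's input schema into one method on a declared `maple` object. It handles only a tiny, assumed subset of JSON Schema (no `$ref`, `oneOf`, or nullability), and `schemaToTs`, `buildApiDeclarationSketch`, and the `get_trace` example are hypothetical names, not the package's actual `buildApiDeclaration`.

```typescript
// Minimal subset of JSON Schema this sketch understands (an assumption; a real
// generator would also need $ref, oneOf, nullable, and so on).
interface JsonSchema {
  type?: "object" | "string" | "number" | "integer" | "boolean" | "array";
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
  enum?: (string | number)[];
}

interface ToolSpec {
  name: string;
  description: string;
  inputSchema: JsonSchema;
}

const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Convert one schema node into a TypeScript type expression.
function schemaToTs(schema: JsonSchema): string {
  if (schema.enum) return schema.enum.map((v) => JSON.stringify(v)).join(" | ");
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array":
      return `Array<${schema.items ? schemaToTs(schema.items) : "unknown"}>`;
    case "object": {
      const required = new Set(schema.required ?? []);
      const fields = Object.entries(schema.properties ?? {}).map(([key, value]) => {
        const prop = IDENTIFIER.test(key) ? key : JSON.stringify(key); // quote odd keys
        return `${prop}${required.has(key) ? "" : "?"}: ${schemaToTs(value)}`;
      });
      return fields.length ? `{ ${fields.join("; ")} }` : "Record<string, never>";
    }
    default:
      return "unknown";
  }
}

// Emit the `maple` declaration the model writes against, one method per tool.
function buildApiDeclarationSketch(tools: ToolSpec[]): string {
  const members = tools.map((t) => {
    const doc = t.description.replace(/\*\//g, "*\\/"); // keep the JSDoc comment closed
    return `  /** ${doc} */\n  ${t.name}(input: ${schemaToTs(t.inputSchema)}): Promise<unknown>;`;
  });
  return `declare const maple: {\n${members.join("\n")}\n};`;
}
```

For a tool whose schema requires `traceId` and optionally accepts `limit`, this emits a method typed `get_trace(input: { traceId: string; limit?: number }): Promise<unknown>;`, which is the kind of text that can be appended to the system prompt.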
⚠️ Mutating-tool gating inside the MCP sandbox fails open
The run_code MCP tool blocks mutating tools by checking a hand-maintained MUTATING_TOOL_NAMES set. That set is the single point of enforcement: any name not in it runs its real handler under the captured request runtime, side effects and all. The set is correct today, but nothing prevents it from silently falling behind the registry.
- The api-side test (`mutating.test.ts`) only asserts `MUTATING_TOOL_NAMES ⊆ registry`. It never asserts the safety-critical inverse: that every mutating tool in the registry is in the set.
- There are two independent copies of the set (`apps/api/src/mcp/tools/mutating.ts` and `apps/chat-flue/src/lib/approval.ts`), kept aligned only by a "keep in sync" comment, with no shared source of truth and no equality test.
Before Code Mode, this set guarded only /chat/apply, which fails closed (an unknown tool is simply rejected). Code Mode makes the same set fail open on the MCP path, which is the behavioral change worth a guard.
Technical details
# Mutating-tool gating inside the MCP sandbox fails open
## Affected sites
- `apps/api/src/mcp/tools/run-code.ts:35` — `resolveCodeModeCall` blocks via `MUTATING_TOOL_NAMES.has(name)`; this is the only gate before `invoke(definition, decoded)` runs the real handler.
- `apps/api/src/mcp/tools/mutating.test.ts:6-11` — asserts only `set ⊆ registry`, not `registry-mutating ⊆ set`.
- `apps/api/src/mcp/tools/mutating.ts` and `apps/chat-flue/src/lib/approval.ts` — two hardcoded copies, "keep in sync" comment only.
## Required outcome
- A regression test (or a structural derivation) that guarantees every mutating tool registered in `registry.ts` is present in `MUTATING_TOOL_NAMES`, so adding a mutating tool without gating it fails CI rather than silently executing inside `run_code`.
- A single source of truth (or a cross-file equality test) for the api and chat-flue copies so they cannot drift.
## Suggested approach
- Cheapest: mark mutating-ness at registration (e.g. a `mutating: true` flag on `MapleToolDefinition`) and derive both the set and the gate from that, eliminating the hand-maintained list entirely.
- If the lists stay hand-maintained: add a test that fails when a registry tool whose name matches the mutating naming pattern (create_/update_/delete_/transition_/…) is absent from the set, and a test asserting the two copies are deep-equal.
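Either approach ultimately needs a check in both directions. A minimal sketch, assuming a hypothetical `ToolDefinition` shape that carries the suggested `mutating` flag; `missingFrom` and `auditMutatingGate` are illustrative names, not existing helpers. In CI, both returned arrays would be asserted empty.

```typescript
// Hypothetical registry entry; in the suggested approach the flag is declared at registration.
interface ToolDefinition {
  readonly name: string;
  readonly mutating: boolean;
}

// Sorted names from `names` that are absent from `set`.
function missingFrom(names: Iterable<string>, set: ReadonlySet<string>): string[] {
  return Array.from(names)
    .filter((name) => !set.has(name))
    .sort();
}

// Compare the registry against the gate set in both directions.
function auditMutatingGate(
  registry: readonly ToolDefinition[],
  gated: ReadonlySet<string>,
): { ungated: string[]; stale: string[] } {
  const registryNames = new Set(registry.map((t) => t.name));
  const flagged = registry.filter((t) => t.mutating).map((t) => t.name);
  return {
    // Safety-critical direction: anything listed here would run its real handler inside run_code.
    ungated: missingFrom(flagged, gated),
    // Hygiene direction: names in the set that no longer exist (the only direction the current test checks).
    stale: missingFrom(gated, registryNames),
  };
}
```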
## Open questions for the human
- Is the long-term plan to keep two copies, or collapse to one shared export from `@maple/domain`/a shared module both apps import?

Claude Opus
CI Knip failure: packages/codemode imports `cloudflare:workers` (in the ./sandbox driver), which Knip reports as an unlisted `cloudflare` dependency (unlisted = error). Add the workspace to knip.json with the same `ignoreDependencies: ["cloudflare"]` the other Worker-importing workspaces use (apps/api, apps/chat-flue, lib/effect-cloudflare). Also stop exporting RUN_CODE_TOOL_NAME (used only internally) to clear the new unused-export warning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…atch cap

- run_code self-recursion: a code-mode snippet could call maple.run_code(...) on the MCP path (run_code is in mapleToolDefinitions and isn't mutating), nesting a sandbox inside the running one. resolveCodeModeCall now refuses the RUN_CODE_TOOL_NAME (shared const), and buildCodeModeApi skips it too (defense in depth for the chat path).
- Harness break-out: the model's snippet was spliced inline into the harness fetch scope, so code ending in `})();` could escape and tamper with the output cap / log capture. It now runs as its own module (user.js exporting an async fn of (maple, console)); a break-out just fails to parse → crashed run, and it can't reach __logs/__cap/env. Keeps the no-eval property.
- Unbounded proposed_batch: formatRunResult now caps proposals at MAX_PROPOSALS_PER_RUN (25) with a dropped-count note, bounding the envelope (and the number of approval cards) regardless of the run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
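The proposal cap and the per-card keys can be sketched together. Only the cap value (25) and the `${toolCallId}#${i}` key scheme come from this PR; the `Proposal`/`ProposedBatch` shapes, the note wording, and the function names are hypothetical.

```typescript
// Cap value taken from the commit message; everything else here is a hypothetical shape.
const MAX_PROPOSALS_PER_RUN = 25;

interface Proposal {
  readonly toolName: string;
  readonly input: unknown;
}

interface ProposedBatch {
  readonly proposals: Proposal[];
  readonly droppedNote?: string;
}

// Keep the first `max` proposals and say how many were dropped, so the envelope
// (and the number of approval cards) stays bounded however many calls the snippet made.
function capProposals(all: readonly Proposal[], max = MAX_PROPOSALS_PER_RUN): ProposedBatch {
  const proposals = all.slice(0, max);
  const dropped = all.length - proposals.length;
  return dropped > 0
    ? { proposals, droppedNote: `${dropped} additional proposal(s) were dropped (limit ${max} per run).` }
    : { proposals };
}

// One key per approval card, following the `${toolCallId}#${i}` scheme the review describes.
function approvalCardKeys(toolCallId: string, batch: ProposedBatch): string[] {
  return batch.proposals.map((_, i) => `${toolCallId}#${i}`);
}
```

Truncating rather than rejecting the whole run keeps the first proposals actionable while still telling the model that some were discarded.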
✅ No new issues found.
Reviewed changes — re-review of ca1ebf63, which addresses three of the four findings from the prior pullfrog review (e88d9a83). The MCP-self-call, harness-breakout, and unbounded-envelope concerns are all resolved cleanly; no new issues introduced.
- `run_code` self-call blocked on the MCP path: `resolveCodeModeCall` now short-circuits when `name === RUN_CODE_TOOL_NAME` (a shared const promoted into `packages/codemode/src/types.ts`), returning a `Blocked` error before the dispatch lookup, and `buildCodeModeApi` skips it as defense-in-depth. A snippet calling `maple.run_code(...)` can no longer nest a sandbox on either path.
- Harness breakout closed via module isolation: the model's snippet is now its own module (`user.js` exporting `async (maple, console)`) imported by `main.js`, rather than spliced into the `fetch` scope. A `})();` break-out fails to parse, the worker fails to instantiate, and `runCodeInSandbox`'s try/catch surfaces it as `crashed: true`, so the snippet can no longer reach `__logs`/`__cap`/`env` to subvert the output cap. `buildHarnessModule` was renamed to `buildSandboxModules`.
- Proposal envelope bounded: `formatRunResult` caps proposals at `MAX_PROPOSALS_PER_RUN` (25) with a dropped-count note, bounding both the returned envelope size and the number of approval cards rendered by the web client.
- Tests: added the self-call rejection test (`run-code.test.ts`), the proposal-cap test (`format.test.ts`), and a break-out-isolation test, plus a rewritten harness test that writes both modules to a temp dir and imports `main.js` to exercise the real composition.
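The two-module layout can be sketched as a function that returns module sources keyed by file name. This is not the PR's `buildSandboxModules`: the sketch's name, the result envelope, the log-size cap, and handing `env.MAPLE` to the snippet as `maple` are assumptions (the `env.MAPLE` binding name comes from the PR description's note about a fake `env.MAPLE` in the Node harness test).

```typescript
// Hypothetical module map: file name -> ES module source, in the shape a dynamic
// Worker loader could be handed. Names, the result envelope, and the cap are assumptions.
type ModuleMap = Record<string, string>;

function buildSandboxModulesSketch(userCode: string, maxLogChars = 64_000): ModuleMap {
  // The snippet becomes the body of its own module. If it tries to close the function
  // early (e.g. ends with `})();`), user.js fails to parse, the Worker fails to
  // instantiate, and the caller reports a crashed run. It never shares main.js's scope.
  const userModule = `export default async function (maple, console) {\n${userCode}\n}\n`;

  // main.js owns the log buffer and the output cap; user.js only receives `maple`
  // and a console shim as arguments, so it cannot reach `logs`, `size`, or `env`.
  const mainModule = [
    `import run from "./user.js";`,
    `export default {`,
    `  async fetch(request, env) {`,
    `    const logs = [];`,
    `    let size = 0;`,
    `    const push = (...args) => {`,
    `      const line = args.map(String).join(" ");`,
    `      size += line.length;`,
    `      if (size <= ${maxLogChars}) logs.push(line);`,
    `    };`,
    `    const sandboxConsole = { log: push, info: push, warn: push, error: push };`,
    `    try {`,
    `      const result = await run(env.MAPLE, sandboxConsole);`,
    `      return Response.json({ ok: true, result, logs });`,
    `    } catch (err) {`,
    `      return Response.json({ ok: false, error: String(err), logs });`,
    `    }`,
    `  },`,
    `};`,
  ].join("\n");

  return { "main.js": mainModule, "user.js": userModule };
}
```

Because isolation comes from module boundaries rather than string escaping, the harness never has to sanitize the snippet, and it still avoids `eval`.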
The prior review's [!IMPORTANT] finding — the MCP mutating-tool gate fails open (no registry-mutating ⊆ MUTATING_TOOL_NAMES test, two un-synced copies of the set) — was not part of this commit and remains the one item worth closing before merge. It is unchanged code, so it is not re-anchored here.
…d test

Review (fails-open): the MCP run_code sandbox blocks mutations via a hand-maintained MUTATING_TOOL_NAMES set with no test guaranteeing every mutating registry tool is in it, kept in two drifting copies. run_code makes the set fail OPEN (an ungated name runs its real handler), so:

- Move MUTATING_TOOL_NAMES into @maple/codemode as the single source of truth; apps/api/mutating.ts re-exports it and apps/chat-flue/approval.ts imports it, so the two copies can no longer drift.
- Add a fail-closed regression test: every registry tool whose name looks mutating (create_/update_/delete_/… via shared MUTATING_TOOL_PREFIXES) must be in the set, so a new mutating tool can't ship ungated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addressed the mutating-tool gating fails-open finding in
On the open question — collapsed to one shared export ( 🤖 Addressed by Claude Code |
✅ No new issues found.
Reviewed changes — re-review of 1d2e1221, which closes the last open finding from the initial pullfrog review (e88d9a83): the MCP sandbox's mutating-tool gate that failed open. No new issues introduced.
- Single source of truth for `MUTATING_TOOL_NAMES`: the set now lives once in `packages/codemode/src/mutating.ts`; both `apps/api/src/mcp/tools/mutating.ts` and `apps/chat-flue/src/lib/approval.ts` re-export it, eliminating the two hand-synced literal copies that could drift.
- Fail-closed regression test: `apps/api/src/mcp/tools/mutating.test.ts` now asserts the safety-critical inverse (registry-mutating ⊆ set) via `looksMutating(name) && !MUTATING_TOOL_NAMES.has(name)` resolving to `[]`, so a conventionally-named mutating tool added to the registry but forgotten in the set fails CI instead of running ungated inside `run_code`.
- `MUTATING_TOOL_PREFIXES` + `looksMutating` helper: a verb-prefix taxonomy (`create_`/`update_`/`delete_`/…/`close_`) added to `@maple/codemode` to back the heuristic, with its own unit tests asserting every gated name matches a prefix and read-only verbs don't.
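The heuristic guard reduces to a prefix check plus a filter that CI expects to return an empty list. A minimal sketch: the prefix list is an illustrative subset of the verbs named in these reviews (the real `MUTATING_TOOL_PREFIXES` is longer), and `looksMutatingByPrefix`, `ungatedLookalikes`, and the tool names in the usage are hypothetical.

```typescript
// Illustrative subset of the verb prefixes named in the review; the real
// MUTATING_TOOL_PREFIXES list in @maple/codemode is longer and is not reproduced here.
const PREFIXES_SKETCH: readonly string[] = ["create_", "update_", "delete_", "transition_", "close_"];

function looksMutatingByPrefix(name: string): boolean {
  return PREFIXES_SKETCH.some((prefix) => name.startsWith(prefix));
}

// The fail-closed guard: registry names that look mutating but are missing from the
// gate set. The CI test expects this to return [].
function ungatedLookalikes(registryNames: readonly string[], gated: ReadonlySet<string>): string[] {
  return registryNames.filter((name) => looksMutatingByPrefix(name) && !gated.has(name));
}
```

The limitation is visible directly: a hypothetical `purge_cache` tool does not match any listed prefix, so it is invisible to this guard unless its verb is added to the list.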
I confirmed chat.http.ts's /chat/apply path still resolves MUTATING_TOOL_NAMES through the re-export (fail-closed path intact), no read-only tool name collides with a mutating prefix (which would false-fail the new test), and the full @maple/codemode suite (29 tests) is green. All four findings from the original review are now resolved.
ℹ️ Fail-closed test is bound to the verb-prefix heuristic
The new CI guard catches an ungated mutating tool only if its name starts with a verb in MUTATING_TOOL_PREFIXES. A future mutating tool named with an out-of-list verb — e.g. purge_, apply_, submit_, merge_, clear_, bulk_ — would be both ungated in the run_code sandbox and invisible to the test, quietly reintroducing the fail-open behavior this commit closed.
This is awareness-only, not a blocker: the prior review explicitly offered the prefix-heuristic as an acceptable fix, the current set is complete, and the structural alternative (a mutating: true flag on MapleToolDefinition) is a larger change. Worth keeping in mind when adding tools with unconventional verbs.
Technical details
# Fail-closed mutating-gate test is heuristic-bound to MUTATING_TOOL_PREFIXES
## Affected sites
- `packages/codemode/src/mutating.ts:50-79` — `MUTATING_TOOL_PREFIXES` is the closed verb list `looksMutating` checks against.
- `apps/api/src/mcp/tools/mutating.test.ts:14-26` — the fail-closed test filters registry tools by `looksMutating(name)`, so any mutating tool whose name doesn't start with a listed prefix is never asserted against the set.
## Required outcome
- A new mutating registry tool cannot ship ungated inside `run_code` regardless of its name's verb prefix.
## Suggested approach (optional)
- Cheapest incremental: when adding a tool with a new mutating verb, also add the prefix here (the existing prefix-coverage test in `packages/codemode/src/mutating.test.ts` keeps the prefix list and set aligned).
- Structural (eliminates the heuristic entirely): mark mutating-ness at registration via a `mutating: true` flag on `MapleToolDefinition` and derive both the set and the `run_code` gate from it — this was the prior review's first-choice suggestion and removes the hand-maintained list.
## Open questions for the human
- Is keeping the prefix-heuristic acceptable as the long-term guard, or is the `mutating: true`-at-registration derivation worth a follow-up issue?
Review (awareness-only): the fail-closed test only catches an ungated mutating tool if its name starts with a verb in MUTATING_TOOL_PREFIXES. Broaden the list with the verbs called out (purge_/apply_/submit_/merge_/clear_/bulk_) plus other common mutating verbs (none collide with current read-only tools), shrinking the heuristic's blind spot. The structural fix (a `mutating: true` flag at tool registration) is left as a follow-up. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re: the ℹ️ heuristic-bound fail-closed test ( Hardened the heuristic by widening Open question — I kept the prefix heuristic for this PR (the prior review accepted it, the current set is complete, and the structural change is larger), and opened a follow-up for the structural fix: a 🤖 Addressed by Claude Code |
✅ No new issues found.
Reviewed changes — re-review of b613a346, which closes the lone informational finding from the prior pullfrog review (1d2e1221): the fail-closed mutating-gate test was heuristic-bound to MUTATING_TOOL_PREFIXES, so a future mutating tool with an out-of-list verb would slip past both the gate and the CI guard. No new issues introduced.
- Widened `MUTATING_TOOL_PREFIXES`: added the exact verbs the prior review named as the gap (`purge_`, `apply_`, `submit_`, `merge_`, `clear_`, `bulk_`) plus a broader defensive set (`send_`, `sync_`, `cancel_`, `revoke_`, `grant_`, `rotate_`, `import_`, `retry_`, `trigger_`, `dispatch_`, `move_`, `copy_`, `duplicate_`, `upsert_`, `toggle_`, `approve_`, `deny_`, `unassign_`), shrinking the window in which a conventionally-named mutating tool ships ungated inside `run_code`.
- Documented the heuristic-vs-structural tradeoff: the doc comment now states this is a heuristic guard, not a proof, and names the structural fix (a `mutating: true` flag declared at tool registration that derives both the set and the gate) as the tracked follow-up.
I confirmed no read-only registry tool name collides with any of the new prefixes (which would false-fail the looksMutating(name) && !MUTATING_TOOL_NAMES.has(name) === [] guard) — the read-only tools use find_/get_/list_/search_/inspect_/mine_/diagnose_/query_/service_/describe_/compare_/explore_/run_ and the prefix-less error_detail, none of which overlap. The codemode prefix-coverage and read-only-exclusion tests (get_/run_ still excluded) remain green. The change only loosens the heuristic toward more conservative gating, so it cannot regress safety.
…ration

Replaces the name-based gate with a per-tool structural flag, closing the heuristic's blind spot (a mutating tool named with an out-of-list verb could slip past the run_code sandbox).

- McpToolRegistrar gains `mutatingTool()` alongside `tool()`; the 19 mutating tools now register via `server.mutatingTool(...)`, so each declares its own mutating-ness at registration (and a copied tool carries the gating with it). MapleToolDefinition gains `mutating: boolean`.
- run_code's resolveCodeModeCall now blocks on `definition.mutating` (the structural flag), not just the name set, so a mutating tool can't slip the gate regardless of its name.
- mutating.test.ts asserts the flag set EXACTLY equals the shared MUTATING_TOOL_NAMES (both directions), so the structural flag and the static list (still needed by chat-flue + /chat/apply, which see tools over MCP) can't drift. The verb-prefix heuristic is kept as belt-and-suspenders for an unflagged, conventionally-named mutating tool.

Closes the follow-up to the prior review's awareness-only finding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
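Registration-time flagging can be sketched with a collecting registrar: whichever method a tool registers through decides its flag, and an exact set comparison keeps the flagged tools and the static list in lockstep. The real `McpToolRegistrar` takes schemas and Effect-based handlers; `RegistrarSketch`, `collectTools`, `sameNameSet`, and the plain async `Handler` type are simplified, hypothetical stand-ins.

```typescript
// Hypothetical, simplified handler type (the real registrar uses Effect-based handlers).
type Handler = (input: unknown) => Promise<unknown>;

interface RegisteredTool {
  readonly name: string;
  readonly mutating: boolean;
  readonly handler: Handler;
}

interface RegistrarSketch {
  tool(name: string, handler: Handler): void;
  mutatingTool(name: string, handler: Handler): void;
}

// Run a registration function against a collecting registrar, stamping `mutating`
// from whichever method each tool used.
function collectTools(register: (server: RegistrarSketch) => void): RegisteredTool[] {
  const tools: RegisteredTool[] = [];
  const add = (mutating: boolean) => (name: string, handler: Handler): void => {
    tools.push({ name, mutating, handler });
  };
  register({ tool: add(false), mutatingTool: add(true) });
  return tools;
}

// Exact set equality, checked in both directions via size plus membership.
function sameNameSet(a: Iterable<string>, b: Iterable<string>): boolean {
  const left = new Set(a);
  const right = new Set(b);
  return left.size === right.size && Array.from(left).every((n) => right.has(n));
}
```

A test mock built this way also captures handlers from both `tool()` and `mutatingTool()`, which is the shape a registrar mock needs once some tools register through the second method.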
Follow-up landed (
This resolves the open question from the prior review — the structural derivation is now in place rather than deferred. 24/24 packages typecheck, knip clean. 🤖 Addressed by Claude Code |
✅ No new issues found.
Reviewed changes — re-review of 2a4e677c, which replaces the verb-prefix heuristic from b613a346 with the structural mutating-tool gate the original review named as its first-choice fix. No new issues introduced.
- `mutatingTool` registration method: `McpToolRegistrar` gains a `mutatingTool()` alongside `tool()` (`types.ts`); all 19 state-changing tools now register via `server.mutatingTool(...)` instead of `server.tool(...)`, so a copied or new mutating tool carries its own gating rather than relying on a name list.
- Structural `mutating` flag: `MapleToolDefinition` carries `readonly mutating: boolean`; the registry collector stamps it (`add(true/false, …)`), and `resolveCodeModeCall` now gates on `definition.mutating || MUTATING_TOOL_NAMES.has(name)`. The flag is the primary gate; the set is belt-and-suspenders.
- Drift-proofing test: `mutating.test.ts` asserts the flagged set EXACTLY equals `MUTATING_TOOL_NAMES` in both directions and retains the `looksMutating ⊆ set` belt-and-suspenders check; `apps/api/.../mutating.ts` collapses to a pure re-export of the codemode set.
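The resulting gate order in `resolveCodeModeCall` (refuse recursion, look up, then block on the flag or the list) can be sketched as a pure function. The `"run_code"` value of the constant, the `Resolution` union, and the reason strings are assumptions; the real function returns Effect errors such as `Blocked` rather than plain objects.

```typescript
// Assumed value of the shared constant; the real one lives in packages/codemode/src/types.ts.
const RUN_CODE_TOOL_NAME = "run_code";

interface DefinitionSketch {
  readonly name: string;
  readonly mutating: boolean;
}

type Resolution =
  | { kind: "ok"; definition: DefinitionSketch }
  | { kind: "blocked"; reason: string }
  | { kind: "unknown"; reason: string };

// Order matters: refuse recursion first, then look the tool up, then gate on mutating-ness.
function resolveCodeModeCallSketch(
  name: string,
  definitions: ReadonlyMap<string, DefinitionSketch>,
  mutatingNames: ReadonlySet<string>,
): Resolution {
  if (name === RUN_CODE_TOOL_NAME) {
    return { kind: "blocked", reason: "run_code cannot be called from inside run_code" };
  }
  const definition = definitions.get(name);
  if (!definition) return { kind: "unknown", reason: `No tool named ${name}` };
  // The structural flag is the primary gate; the static list is belt-and-suspenders.
  if (definition.mutating || mutatingNames.has(name)) {
    return { kind: "blocked", reason: `${name} mutates state; call it directly so host approval applies` };
  }
  return { kind: "ok", definition };
}
```

Every path other than an unflagged, unlisted, known tool ends in a refusal, so the sandbox fails closed for names it cannot positively classify as read-only.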
I verified the 19 server.mutatingTool registrations match the 19 entries in @maple/codemode's MUTATING_TOOL_NAMES exactly; run_code itself registers via server.tool (mutating: false) and is handled by the existing RUN_CODE_TOOL_NAME recursion short-circuit; and server.ts's McpToolsLive registers every definition (including mutating ones) with the SDK, so direct invocation through the host's approval path is unaffected — the flag only gates the run_code sandbox. The api-side tests can't execute in this checkout (the @maple/* workspace symlinks are absent from node_modules, an install artifact unrelated to this PR), but @maple/codemode's own suite is green and the set/registration counts align by inspection. This is the cleaner long-term gate; nothing left open.
dashboard-mutations.test.ts builds an McpToolRegistrar mock and calls registerUpdateDashboardTool to capture the handler. update_dashboard now registers via server.mutatingTool, so the tool()-only mock threw "server.mutatingTool is not a function". Capture from both tool() and mutatingTool(). Full apps/api suite green (614/614). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ristic

Code Mode is a core capability, not a flagged experiment; rework it to its intended end state.

- Remove MAPLE_CODE_MODE everywhere. Code Mode is active whenever the Worker Loader sandbox (`LOADER`) is bound; the agent (chat) and run_code tool (MCP) key off the binding's presence, degrading to the direct tools when it's absent (e.g. local dev). Both alchemy deploys now attach `LOADER: WorkerLoader()` unconditionally; deploying requires Worker Loader beta access on the account.
- Drop the verb-prefix heuristic (MUTATING_TOOL_PREFIXES / looksMutating and its tests). It was a stepping stone before the structural `mutating` flag landed; with per-tool `server.mutatingTool(...)` registration + the exact flag<->list equality test, the heuristic is redundant scaffolding.

typecheck 24/24, knip clean, apps/api 613 + codemode 25 tests green, chat-flue builds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
✅ No new issues found.
Reviewed changes — re-review of 1c612468, which finalizes Code Mode as a core capability by dropping the MAPLE_CODE_MODE flag and removing the verb-prefix mutating heuristic. No new issues introduced.
- Removed the `MAPLE_CODE_MODE` flag entirely. Activation now keys solely off the `LOADER` (`worker_loader`) binding's presence: `run-code.ts` gates on `if (!loader)`, chat `maple-chat.ts` on `if (tools.length > 0 && ctx.env.LOADER)`, and the prompt block on the same. Grep confirms zero remaining references to the flag.
- `WorkerLoader()` is now attached unconditionally in both `alchemy.run.ts` files (it was gated on `process.env.MAPLE_CODE_MODE`), with runtime degradation (`if (!loader)`) retained as the local-dev safety net.
- Dropped `MUTATING_TOOL_PREFIXES`/`looksMutating` and their tests: redundant scaffolding now that the structural `mutating` flag (set at `server.mutatingTool(...)` registration) is the load-bearing gate, backed by the exact flag↔list equality test in `mutating.test.ts`. The `run_code` gate (`definition.mutating || MUTATING_TOOL_NAMES.has(name)`) and the 19-entry shared set are intact.
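On the chat side, binding-based activation amounts to a small decision function. A sketch, assuming a hypothetical `EnvSketch` where `LOADER` is simply present or absent, and assuming (as the original description states) that the direct tools stay alongside `run_code`; `resolveToolSurface` is an illustrative name. On the MCP side, `run_code` is always registered and checks the binding at call time instead.

```typescript
// Hypothetical env shape: LOADER is the worker_loader binding, unset in local dev.
interface EnvSketch {
  readonly LOADER?: unknown;
}

interface ToolSurface {
  readonly codeMode: boolean;
  readonly toolNames: string[];
}

// Mirrors the chat-side condition quoted in the review (`tools.length > 0 && ctx.env.LOADER`):
// run_code is offered only when there are tools to expose and the sandbox is bound.
function resolveToolSurface(env: EnvSketch, directTools: readonly string[]): ToolSurface {
  const codeMode = directTools.length > 0 && Boolean(env.LOADER);
  return {
    codeMode,
    // The direct tools are always kept; run_code is added on top when available.
    toolNames: codeMode ? [...directTools, "run_code"] : [...directTools],
  };
}
```

Keying off the binding rather than a flag means an unbound environment degrades without configuration, but, as the next note points out, it does nothing for a stage where the binding cannot be provisioned at deploy time.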
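The structural mutating-gate guarantee is preserved: gating no longer depends on tool naming, so removing the prefix heuristic only drops a belt-and-suspenders check. The residual case it still covered was narrow: a state-changing tool registered via `server.tool(...)` instead of `server.mutatingTool(...)` under a conventional verb name.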
ℹ️ Unconditional WorkerLoader() makes Worker Loader beta access a hard deploy-time requirement
Previously the worker_loader binding deployed only when MAPLE_CODE_MODE was set, so accounts without Worker Loader beta access were unaffected and the runtime no-op was a soft fallback. With the binding now attached unconditionally in both alchemy.run.ts files, every deploy of apps/api and apps/chat-flue provisions it — so any stage whose Cloudflare account lacks beta access could fail to deploy, not just degrade at runtime.
The commit message states this is deliberate ("deploying requires Worker Loader beta access on the account"), so this is awareness-only — the runtime if (!loader) guard only covers unbound environments, not failed provisioning. Worth confirming every deploy target (dev / staging / prod) has beta access before this merges.
Technical details
# Unconditional WorkerLoader() binding → hard beta-access requirement on every stage
## Affected sites
- `apps/api/alchemy.run.ts:159` — `LOADER: WorkerLoader()` now unconditional (was `...(process.env.MAPLE_CODE_MODE?.trim() ? { LOADER: WorkerLoader() } : {})`).
- `apps/chat-flue/alchemy.run.ts:111` — same unconditional binding.
## Required outcome
- Every Cloudflare account/stage targeted by these two deploys has Worker Loader beta access, OR the team accepts that stages without it will fail to deploy until granted.
## Open questions for the human
- Do all current deploy targets (dev / staging / prod, plus any per-PR preview stages) have Worker Loader beta access? The runtime `if (!loader)` degradation does NOT cover a stage where alchemy can't provision the binding at all.
chat-flue was the only app whose dev script ran `flue dev` directly instead of
through portless, so portless never registered `chat-flue.localhost` and the web
app's `siblingUrl("chat-flue")` request got a bare portless 404 — surfacing in
the browser as a CORS error (the 404 has no CORS headers). The worker itself is
healthy on :3583 with correct CORS.
Mirror the other apps: `dev` → `portless`, `dev:app` runs `flue dev --port
${PORT:-3583}`, and a `portless` block names the host `chat-flue`. Pre-existing
gap (since the Flue rework), unrelated to Code Mode.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chat-flue's `flue dev` loaded no env (flue excludes .dev.vars/.env*, and no --env was passed), so the worker had no MAPLE_AUTH_MODE / CLERK_* / MAPLE_ROOT_PASSWORD, and every /agents/* request 401'd while the web app sent a Clerk token. Pass `--env ../../.env.local` (the shared dev secrets the other apps load via --env-file), which flue injects into the worker runtime env. Verified: with it, the internal-token /workflows guard passes (404 on unknown workflow) vs 401 without, so MAPLE_AUTH_MODE=clerk + CLERK_SECRET_KEY + INTERNAL_SERVICE_TOKEN + MAPLE_API_URL now reach the worker (fixes both /agents auth and the chat->MCP connection).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

What
Applies Cloudflare's Code Mode to both the AI chat (`apps/chat-flue`) and the MCP server (`apps/api`). Instead of handing the model ~51 tools one at a time, it can write a JS snippet against a generated typed `maple.*` API that runs in a Cloudflare Dynamic Worker isolate (network blocked); each `maple.<tool>(input)` RPCs back to the existing tools. Multi-step investigations ("find the worst service → fetch a sample trace → correlate") collapse from N model round-trips into one.

Ships hybrid behind `MAPLE_CODE_MODE`: the direct tools stay available as a fallback.

How
- `packages/codemode/`: new source-only shared package. Pure root barrel (`buildApiDeclaration` JSON-schema→TS, `buildHarnessModule`, `formatRunResult`) + a `./sandbox` subpath (isolating the `cloudflare:workers` import) with `runCodeInSandbox` + `MapleSupervisor` (`RpcTarget`). The harness splices the model's JS into an async IIFE module; no runtime `eval`.
- chat-flue: a `run_code` Flue tool injected only when `MAPLE_CODE_MODE=1` and the `LOADER` binding is present. Dispatch reuses the approval-gated tool execs, so mutating `maple.*` calls return proposals, collected into a `proposed_batch` envelope the web renders as one approval card each (`parseToolProposalBatch`). `/api/chat/apply` is unchanged.
- apps/api MCP: a `run_code` tool dispatching to registry handlers under the captured request runtime (`FiberSet.makeRuntimePromise` + `Effect.scoped`), preserving org scoping. Mutating tools are blocked inside code (call them directly so the host's approval applies).
- Deploy: `WorkerLoader()` binding added to both `alchemy.run.ts` files, gated on `MAPLE_CODE_MODE`.

Reviewer notes
- Worker Loader is a Cloudflare beta: the `worker_loader` binding only deploys when `MAPLE_CODE_MODE` is set, and the agent no-ops Code Mode without the `LOADER` binding (direct tools only), so default behavior is unchanged and accounts without beta access are unaffected. Live isolate execution can't be verified locally (`flue dev`/portless has no Worker Loader); it needs a deployed stage with beta access. If beta is unavailable, the planned fallback is a QuickJS-WASM backend behind the same `runCodeInSandbox` seam (no rewrite of the api-gen/approval/prompt layers).
- `cloudflare:workers` stays out of the Node-imported tool registry; the run-code↔registry require cycle is broken via a type-only import + a dynamic `mapleToolDefinitions` fetch.
- The `run_code` tool is always registered (runtime-gated), so it shows in external MCP clients' tool lists even when the flag is off, returning a clean "not enabled". This is deliberate, since the deploy-time flag isn't readable at module load in the Worker.

Verification
- Tests: `codemode` 23 · `chat-flue` 51 · `api` MCP 103 + `run_code` 7 · `web` 7. The harness runs end-to-end in Node (data-URL import with a fake `env.MAPLE`).
- The `cloudflare:workers` sandbox bundled; both `alchemy.run.ts` files bundle; the real system prompt renders correctly.

Unrelated
`apps/landing/*` working-tree changes were intentionally left out of this PR.

🤖 Generated with Claude Code