
Architecture Documentation

Project: Code Executor MCP
Version: 0.9.0
Last Updated: 2025-11-19


Table of Contents

  1. System Overview
  2. Core Components
  3. Progressive Disclosure Architecture
  4. Security Architecture
  5. Discovery System
  6. Data Flow
  7. Concurrency & Performance
  8. Design Decisions
  9. Resilience Patterns
  10. CLI Setup Wizard Architecture
  11. MCP Sampling Architecture (v1.0.0)

1. System Overview

Code Executor MCP is a universal MCP orchestration server that implements the progressive disclosure pattern to eliminate context bloat from exposing multiple MCP servers' tool schemas.

Problem Statement

Exposing 47 MCP tools directly to an AI agent consumes 141k tokens just for schemas, exhausting context before any work begins.

Solution

Two-tier access model:

  • Tier 1 (Top-level): 3 lightweight tools (~560 tokens)

    • executeTypescript - Execute TypeScript code in Deno sandbox
    • executePython - Execute Python code in Pyodide sandbox
    • health - Server health check
  • Tier 2 (On-demand): All MCP tools accessible via code execution

    // Inside sandbox, access any MCP tool on-demand
    const result = await callMCPTool('mcp__zen__codereview', {...});

Result: 98% token reduction (141k → 1.6k tokens)


2. Core Components

2.1 Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                        AI Agent (Claude)                    │
│                     (MCP Client Context)                    │
└────────────────┬────────────────────────────────────────────┘
                 │ MCP Protocol (STDIO)
                 │ Top-level tools: 3 tools, ~560 tokens
                 ▼
┌─────────────────────────────────────────────────────────────┐
│              Code Executor MCP Server (Node.js)             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │         MCP Proxy Server (HTTP Localhost)            │  │
│  │  • POST / (callMCPTool endpoint)                     │  │
│  │  • GET /mcp/tools (discovery endpoint - NEW v0.4.0)  │  │
│  │  • Bearer token authentication                       │  │
│  │  • Rate limiting (30 req/60s)                        │  │
│  │  • Audit logging (AsyncLock mutex)                   │  │
│  └──────────────┬───────────────────────────────────────┘  │
│                 │                                           │
│  ┌──────────────▼───────────────────────────────────────┐  │
│  │            MCP Client Pool                           │  │
│  │  • Manages connections to multiple MCP servers       │  │
│  │  • Parallel queries (Promise.all)                    │  │
│  │  • Resilient aggregation (partial failure handling)  │  │
│  │  • In-memory tool list (listAllTools)                │  │
│  └──────────────┬───────────────────────────────────────┘  │
│                 │                                           │
│  ┌──────────────▼───────────────────────────────────────┐  │
│  │            Schema Cache                              │  │
│  │  • LRU cache (max 1000 entries)                      │  │
│  │  • Disk persistence (~/.code-executor/cache.json)    │  │
│  │  • 24h TTL with stale-on-error fallback              │  │
│  │  • AsyncLock mutex (thread-safe writes)              │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │     Sandbox Executors (Deno/Pyodide subprocesses)    │  │
│  │  • Isolated execution context                        │  │
│  │  • Injected globals:                                 │  │
│  │    - callMCPTool(name, params)                       │  │
│  │    - discoverMCPTools(options) - NEW v0.4.0          │  │
│  │    - getToolSchema(toolName) - NEW v0.4.0            │  │
│  │    - searchTools(query, limit) - NEW v0.4.0          │  │
│  │  • Restricted permissions (allowlist, network, fs)   │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────┬────────────────────────────────────────────┘
                 │ MCP Protocol (STDIO)
                 │ External MCP Servers (parallel queries)
                 ▼
┌─────────────────────────────────────────────────────────────┐
│    External MCP Servers (filesystem, zen, linear, etc.)     │
│    • Queried in parallel via Promise.all (O(1) amortized)   │
│    • Each returns tools/list and tools/call responses        │
│    • Discovery: 50-100ms first call, <5ms cached             │
└─────────────────────────────────────────────────────────────┘

2.2 Component Responsibilities

| Component | Responsibility (SRP) | Pattern | Concurrency Safe |
|---|---|---|---|
| MCP Proxy Server | Route HTTP requests, enforce auth/rate limiting, audit log | Proxy | Yes (AsyncLock on audit logs) |
| MCP Client Pool | Manage MCP connections, parallel query aggregation | Pool | Yes (read-only queries, write-once at startup) |
| Schema Cache | Cache tool schemas, disk persistence, LRU eviction | Cache | Yes (AsyncLock on disk writes) |
| Sandbox Executor | Execute untrusted code in isolated environment | Sandbox | Yes (independent subprocesses) |
| Discovery Functions | Provide in-sandbox tool discovery (v0.4.0) | Wrapper | Yes (stateless HTTP calls) |

3. Progressive Disclosure Architecture

3.1 Token Budget Preservation

Design Goal: Maintain ~1.6k tokens for top-level tools (98% reduction from 141k baseline)

Achievement (v0.4.0):

  • Tool count: 3 tools (no increase from v0.3.x)
  • Token usage: ~560 tokens (well below 1.6k budget)
  • Discovery functions: Hidden from top-level (injected in sandbox only)
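The arithmetic behind these figures can be checked directly; the numbers (141k baseline, 1.6k budget, ~560 actual) are the ones stated in this document.

```typescript
// Token-budget check for progressive disclosure, using the figures
// stated above (141k baseline, 1.6k budget, ~560 measured).
const baselineTokens = 141_000; // 47 tool schemas exposed directly
const budgetTokens = 1_600;     // tier-1 token budget
const actualTokens = 560;       // measured tier-1 usage (v0.4.0)

const reduction = 1 - budgetTokens / baselineTokens; // ~0.989 (the "98%" claim)
const headroom = budgetTokens - actualTokens;        // tokens of slack under budget

console.log(`reduction ${(reduction * 100).toFixed(1)}%, headroom ${headroom} tokens`);
```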

3.2 Two-Tier Access Model

Tier 1: Top-Level Tools (Exposed to AI Agent)

// AI agent sees only these in context:
- executeTypescript(code, allowedTools?, timeoutMs?, permissions?)
- executePython(code, allowedTools?, timeoutMs?, permissions?)
- health()

Tier 2: On-Demand Tools (Accessible Inside Sandbox)

// Inside executeTypescript code, AI agent can:

// 1. Execute any MCP tool (existing v0.3.x)
const result = await callMCPTool('mcp__zen__codereview', {
  step: 'Analysis',
  relevant_files: ['/path/to/file.ts'],
  // ... other params
});

// 2. Discover available tools (NEW v0.4.0)
const allTools = await discoverMCPTools();
// Returns: ToolSchema[] (name, description, parameters)

// 3. Search tools by keyword (NEW v0.4.0)
const fileTools = await searchTools('file read write', 10);
// Returns: Top 10 tools matching keywords (OR logic, case-insensitive)

// 4. Inspect tool schema (NEW v0.4.0)
const schema = await getToolSchema('mcp__filesystem__read_file');
// Returns: Full JSON Schema for tool parameters + outputSchema (v0.6.0)

3.3 Output Schema Support (NEW v0.6.0)

Design Goal: Enable AI agents to understand tool response structure without trial execution

Implementation:

  • All 3 code-executor tools provide Zod schemas for responses (outputSchema)
  • Uses MCP SDK native support (ZodRawShape format)
  • Graceful fallback for third-party tools without output schemas

Response Schemas:

// ExecutionResult (run-typescript-code, run-python-code)
{
  success: boolean,
  output: string,
  error?: string,
  executionTimeMs: number,
  toolCallsMade?: string[],
  toolCallSummary?: ToolCallSummaryEntry[]
}

// HealthCheck (health)
{
  healthy: boolean,
  auditLog: { enabled: boolean },
  mcpClients: { connected: number },
  connectionPool: { active, waiting, max },
  uptime: number,
  timestamp: string
}

Benefits:

  • ✅ AI agents know response structure upfront
  • ✅ No trial-and-error required for filtering/aggregation
  • ✅ Better code generation (correct field access)
  • ✅ Optional field - no breaking changes

Data Flow:

1. Tool registration: Zod schema → MCP SDK Tool.outputSchema
2. Discovery: MCPClientPool returns ToolSchema with outputSchema
3. Schema cache: CachedToolSchema.outputSchema persisted (24h TTL)
4. Graceful fallback: Third-party tools return outputSchema: undefined

3.4 OutputSchema Protocol Support (v0.7.1+)

✅ RESOLVED: MCP SDK v1.22.0 Native Support

Status: OutputSchema is now fully functional in the MCP protocol as of v0.7.1 (MCP SDK v1.22.0).

What Changed:

  • ✅ MCP SDK v1.22.0 exposes outputSchema via tools/list protocol response
  • ✅ All 3 code-executor tools expose response structure to AI agents
  • ✅ External MCP clients can see outputSchema immediately
  • ✅ No trial execution needed for response structure discovery

Protocol Response (v1.22.0):

{
  "tools": [
    {
      "name": "run-typescript-code",
      "description": "...",
      "inputSchema": { "type": "object", "properties": { ... } },
      "outputSchema": {  // ✅ NOW EXPOSED IN PROTOCOL
        "type": "object",
        "properties": {
          "success": { "type": "boolean" },
          "output": { "type": "string" },
          "error": { "type": "string" },
          "executionTimeMs": { "type": "number" }
        }
      }
    }
  ]
}

Verification Test:

node test-outputschema-v122.mjs
# Result:
# ✅ run-typescript-code: outputSchema: YES! (6 fields)
# ✅ run-python-code: outputSchema: YES! (6 fields)
# ✅ health: outputSchema: YES! (6 fields)
# 🎉 SUCCESS! All tools have outputSchema exposed in protocol!

Migration Details (v1.0.4 → v1.22.0):

  • Handler signatures updated: (params) → (args, extra)
  • Added RequestHandlerExtra for request context (cancellation signals, session tracking)
  • Runtime Zod validation preserved (zero functional changes)
  • All 620 tests passing, zero regressions

Impact:

  • Issue #28 RESOLVED: AI agents now see response structure upfront
  • No trial-and-error: Agents can write correct filtering/aggregation code immediately
  • Progressive disclosure intact: Still 98% token reduction (141k → 1.6k)
  • Future-proof: Ready for ecosystem-wide outputSchema adoption

4. Security Architecture

4.1 Security Boundaries

┌─────────────────────────────────────────────────────────────┐
│ Security Boundary 1: MCP Proxy Server (Auth + Rate Limit)   │
│  • Bearer token authentication (per-execution, 32-byte)      │
│  • Rate limiting (30 req/60s per client)                     │
│  • Query validation (max 100 chars, alphanumeric+safe chars) │
│  • Audit logging (all requests, success/failure)             │
└─────────────────────────────────────────────────────────────┘
                         │
┌─────────────────────────────────────────────────────────────┐
│ Security Boundary 2: Tool Allowlist (Execution Gating)      │
│  • Enforced by executeTypescript allowedTools parameter      │
│  • Discovery bypasses allowlist (read-only metadata)         │
│  • Execution still enforced (callMCPTool checks allowlist)   │
│  • Trade-off documented: discovery = read, execution = write │
└─────────────────────────────────────────────────────────────┘
                         │
┌─────────────────────────────────────────────────────────────┐
│ Security Boundary 3: Sandbox Isolation (Code Execution)     │
│  • Deno sandbox with restricted permissions                  │
│  • No filesystem access (unless explicitly allowed)          │
│  • No network access (except localhost proxy)                │
│  • No environment variable access                            │
│  • Memory limits enforced                                    │
└─────────────────────────────────────────────────────────────┘
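As a concrete illustration of the Boundary 1 rate limit (30 req/60s), a minimal sliding-window limiter might look like this. Class and method names are illustrative, not the proxy's actual implementation:

```typescript
// Sketch of the per-client rate limit at Boundary 1 (30 requests / 60s).
// Illustrative only; the real proxy's implementation may differ.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // clientId -> request timestamps (ms)

  constructor(
    private readonly maxRequests = 30,
    private readonly windowMs = 60_000,
  ) {}

  // Returns true if the request is allowed, false if rate-limited.
  allow(clientId: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(clientId) ?? []).filter(t => t > cutoff);
    if (recent.length >= this.maxRequests) {
      this.hits.set(clientId, recent);
      return false; // over budget: the proxy would answer with HTTP 429
    }
    recent.push(now);
    this.hits.set(clientId, recent);
    return true;
  }
}
```

Note that rejected requests are not counted against the window in this sketch, so a client hammering the endpoint does not extend its own lockout.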

4.2 Security Trade-Off: Discovery Allowlist Bypass

Decision (v0.4.0): Discovery functions bypass tool allowlist for read-only metadata access.

Rationale:

  • Problem: AI agents get stuck without knowing what tools exist (blind execution)
  • Solution: Allow discovery of tool schemas (read-only metadata)
  • Mitigation: Execution still enforces allowlist (two-tier security model)
  • Risk Assessment: LOW - schemas are non-sensitive metadata, no execution without allowlist

Security Model:

| Operation | Allowlist Check | Auth Required | Rate Limited | Audit Logged |
|---|---|---|---|---|
| Discovery (discoverMCPTools) | ❌ Bypassed | ✅ Required | ✅ Yes (30/60s) | ✅ Yes |
| Execution (callMCPTool) | ✅ Enforced | ✅ Required | ✅ Yes (30/60s) | ✅ Yes |

Constitutional Alignment: This intentional exception is documented in spec.md Section 2 (Constitutional Exceptions) as BY DESIGN per Principle 2 (Security Zero Tolerance).


5. Discovery System (NEW v0.4.0)

5.1 Discovery Architecture

Design Goal: Enable AI agents to discover, search, and inspect MCP tools without manual documentation lookup.

┌─────────────────────────────────────────────────────────────┐
│ Discovery Flow (Single Round-Trip)                          │
│                                                              │
│  AI Agent executes ONE TypeScript call:                     │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ const tools = await discoverMCPTools();             │   │
│  │ const schema = await getToolSchema('tool_name');    │   │
│  │ const result = await callMCPTool('tool_name', {...});│  │
│  └─────────────────────────────────────────────────────┘   │
│                                                              │
│  No context switching, variables persist across steps       │
└─────────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ Sandbox → Proxy: HTTP GET /mcp/tools                        │
│  • 500ms timeout (fast fail, no hanging)                    │
│  • Bearer token in Authorization header                     │
│  • Optional ?q=keyword1+keyword2 search                     │
└─────────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ Proxy → MCP Servers: Parallel Queries (Promise.all)         │
│  • Query all MCP servers simultaneously (O(1) amortized)    │
│  • Use Schema Cache for schemas (24h TTL, disk-persisted)   │
│  • Resilient aggregation (partial failures handled)         │
│  • Performance: First call 50-100ms, cached <5ms            │
└─────────────────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ Response: ToolSchema[] (JSON)                               │
│  [                                                           │
│    {                                                         │
│      "name": "mcp__filesystem__read_file",                  │
│      "description": "Read file contents",                   │
│      "parameters": { /* JSON Schema */ }                    │
│    },                                                        │
│    ...                                                       │
│  ]                                                           │
└─────────────────────────────────────────────────────────────┘

5.2 Discovery Functions

discoverMCPTools(options?)

Purpose: Fetch all available tool schemas from connected MCP servers

Signature:

interface DiscoveryOptions {
  search?: string[]; // Optional keyword array (OR logic, case-insensitive)
}

async function discoverMCPTools(
  options?: DiscoveryOptions
): Promise<ToolSchema[]>

Implementation:

  • Injected into sandbox as globalThis.discoverMCPTools
  • Calls GET /mcp/tools endpoint (localhost proxy)
  • 500ms timeout via AbortSignal.timeout(500)
  • Returns full tool schemas with JSON Schema parameters

Performance:

  • First call: 50-100ms (populates schema cache)
  • Subsequent calls: <5ms (from cache, 24h TTL)
  • Parallel queries across 3+ MCP servers: <100ms P95

getToolSchema(toolName)

Purpose: Retrieve full JSON Schema for a specific tool

Signature:

async function getToolSchema(
  toolName: string
): Promise<ToolSchema | null>

Implementation:

  • Wrapper over discoverMCPTools() (DRY principle)
  • Finds tool by name using Array.find()
  • Returns null if tool not found (no exceptions)

searchTools(query, limit?)

Purpose: Search tools by keywords with result limiting

Signature:

async function searchTools(
  query: string,
  limit?: number // Default: 10
): Promise<ToolSchema[]>

Implementation:

  • Splits query by whitespace: query.split(/\s+/)
  • Calls discoverMCPTools({ search: keywords })
  • Applies result limit via Array.slice(0, limit)
  • OR logic: matches if ANY keyword found in name/description
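The OR-logic keyword matching described above can be sketched as a pure filter (the ToolSchema shape is simplified here to the fields the search actually touches):

```typescript
interface ToolSchema {
  name: string;
  description: string;
}

// Case-insensitive OR matching: a tool matches if ANY keyword appears
// in its name or description, mirroring searchTools' documented behavior.
function filterTools(tools: ToolSchema[], query: string, limit = 10): ToolSchema[] {
  const keywords = query.toLowerCase().split(/\s+/).filter(Boolean);
  return tools
    .filter(t => {
      const haystack = `${t.name} ${t.description}`.toLowerCase();
      return keywords.some(k => haystack.includes(k));
    })
    .slice(0, limit); // result limiting via Array.slice, as described above
}
```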

5.3 Parallel Query Pattern

Design Decision: Query all MCP servers in parallel using Promise.all, so aggregate latency is bounded by the slowest single server (roughly constant as servers are added) rather than the sum of all queries.

Sequential vs Parallel:

// ❌ Sequential (3 servers × 30ms each = 90ms)
for (const client of mcpClients) {
  const tools = await client.listTools(); // Wait for each
  allTools.push(...tools);
}

// ✅ Parallel (max 30ms, O(1) amortized)
const queries = mcpClients.map(client => client.listTools());
const results = await Promise.all(queries); // All at once
const allTools = results.flat();

Resilient Aggregation:

// Handle partial failures gracefully
const queries = mcpClients.map(async client => {
  try {
    return await client.listTools();
  } catch (error) {
    console.error(`MCP server ${client.name} failed:`, error);
    return []; // empty tool list: one failed server doesn't block the others
  }
});
const allTools = (await Promise.all(queries)).flat();

Performance Benefit:

  • 1 MCP server: 30ms (baseline)
  • 3 MCP servers (sequential): 90ms (3× slower)
  • 3 MCP servers (parallel): 35ms (O(1) amortized)
  • 10 MCP servers (parallel): 50ms (still O(1))

Target Met: P95 latency <100ms for 3 MCP servers (spec.md NFR-2)

5.4 Timeout Strategy

Design Decision: 500ms timeout for proxy→sandbox communication (fast fail, no retries).

Rationale:

  • AI agents prefer fast failure over hanging
  • 500ms allows parallel queries (100ms + network overhead)
  • No retries: discovery errors should surface immediately
  • Clear error messages guide AI agent to retry if transient

Implementation:

// Sandbox side (fetch with timeout)
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 500);

try {
  const response = await fetch(url, {
    signal: controller.signal,
    headers: { 'Authorization': `Bearer ${token}` }
  });
  return await response.json();
} catch (error) {
  if (error.name === 'AbortError') {
    throw new Error('Discovery timeout (500ms exceeded). MCP servers may be slow.');
  }
  throw error;
} finally {
  clearTimeout(timeoutId);
}

6. Pyodide WebAssembly Sandbox (Python Executor)

6.1 Security Resolution: Issues #50/#59

Problem: Native Python executor (subprocess.spawn) had ZERO sandbox isolation.

Solution: Pyodide WebAssembly runtime with complete isolation.

6.2 Pyodide Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Python Code Execution                     │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│          Pyodide WebAssembly Sandbox (v0.26.4)              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │           WebAssembly VM (Primary Boundary)          │  │
│  │  • No native syscall access                          │  │
│  │  • Memory-safe (bounds checking, type safety)        │  │
│  │  • Cross-platform consistency                        │  │
│  └──────────────┬───────────────────────────────────────┘  │
│                 │                                           │
│  ┌──────────────▼───────────────────────────────────────┐  │
│  │         Virtual Filesystem (Emscripten FS)           │  │
│  │  • In-memory only (no host access)                   │  │
│  │  • /tmp writable, / read-only                        │  │
│  │  • Host files completely inaccessible                │  │
│  └──────────────┬───────────────────────────────────────┘  │
│                 │                                           │
│  ┌──────────────▼───────────────────────────────────────┐  │
│  │       Network Access (pyodide.http.pyfetch)          │  │
│  │  • Localhost only (127.0.0.1)                        │  │
│  │  • Bearer token authentication required              │  │
│  │  • MCP proxy enforces tool allowlist                 │  │
│  └──────────────┬───────────────────────────────────────┘  │
│                 │                                           │
│  ┌──────────────▼───────────────────────────────────────┐  │
│  │          Injected MCP Functions                      │  │
│  │  • call_mcp_tool(name, params)                       │  │
│  │  • discover_mcp_tools(search_terms)                  │  │
│  │  • get_tool_schema(tool_name)                        │  │
│  │  • search_tools(query, limit)                        │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

6.3 Two-Phase Execution Pattern

Design: Based on Pydantic's mcp-run-python (production-proven).

Phase 1: Setup (Inject MCP Tool Access)

# Executed by Pyodide before user code
import js
import json
from pyodide.http import pyfetch

async def call_mcp_tool(tool_name, params):
    # Call MCP proxy with bearer auth
    response = await pyfetch(
        f'http://localhost:{js.PROXY_PORT}',
        method='POST',
        headers={'Authorization': f'Bearer {js.AUTH_TOKEN}'},
        body=json.dumps({'toolName': tool_name, 'params': params})
    )
    return await response.json()

# Discovery functions also injected

Phase 2: Execute User Code

# User's code runs in sandboxed environment
# Has access to injected functions but not host system
result = await call_mcp_tool('mcp__filesystem__read_file', {...})

WHY Two-Phase?

  • Prevents user code from tampering with injection mechanism
  • Clear separation of setup vs execution
  • Injection happens in trusted context before untrusted code runs

6.4 Global Pyodide Cache

Problem: Pyodide initialization is expensive (~2-3s with npm package).

Solution: Global cached instance shared across executions.

let pyodideCache: PyodideInterface | null = null;

async function getPyodide(): Promise<PyodideInterface> {
  if (!pyodideCache) {
    console.error('🐍 Initializing Pyodide (first run, ~10s)...');
    pyodideCache = await loadPyodide({
      indexURL: 'https://cdn.jsdelivr.net/pyodide/v0.26.4/full/',
      stdin: () => { throw new Error('stdin disabled for security'); },
    });
  }
  return pyodideCache;
}

Performance:

  • First call: ~2-3s initialization (npm package includes files locally)
  • Subsequent calls: <100ms (cache hit)
  • Memory overhead: ~20MB (WASM module + Python runtime)

6.5 Security Boundaries

| Boundary | Enforcement | Attack Prevention |
|---|---|---|
| WASM VM | V8 engine | No syscalls, no native code execution |
| Virtual FS | Emscripten | No host file access (/etc/passwd, ~/.ssh) |
| Network | Fetch API + proxy | No external network, only localhost MCP |
| MCP Allowlist | Proxy validation | No unauthorized tool execution |
| Timeout | Promise.race() | No infinite loops, resource exhaustion |

Attack Surface Reduction: 99% vs native Python executor.
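The timeout boundary names Promise.race() as its enforcement mechanism; a minimal, self-contained version of that pattern looks like this (function name is illustrative):

```typescript
// Race the sandboxed work against a timer: whichever settles first wins.
// If the timer fires first, the caller gets a rejection instead of hanging.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so the process can exit cleanly.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer!));
}
```

Note that the losing promise is not cancelled, only ignored; actually killing runaway sandbox code requires terminating the subprocess, which the executor handles separately.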

6.6 Limitations & Trade-offs

Acceptable Limitations:

  • Pure Python only - No native C extensions (unless WASM-compiled)
    • ✅ Most Python stdlib works (json, asyncio, math, etc.)
    • ❌ No numpy, pandas, scikit-learn (unless Pyodide-compiled versions)
  • 10-30% slower - WASM overhead
    • ✅ Acceptable for security-critical environments
    • ✅ Still faster than Docker container startup
  • No multiprocessing/threading - Single-threaded WASM
    • ✅ Use async/await instead (fully supported)
  • 4GB memory limit - WASM 32-bit addressing
    • ✅ Sufficient for most scripts
    • ❌ Large ML models won't fit

Security Trade-off: Performance cost is acceptable for complete isolation.

6.7 Industry Validation

Production Usage:

  • Pydantic mcp-run-python - Reference implementation
  • JupyterLite - Run Jupyter notebooks in browser
  • Google Colab - Similar WASM isolation approach
  • VS Code Python REPL - Uses Pyodide for in-browser Python
  • PyScript - HTML tags powered by Pyodide

Security Review: Gemini 2.0 Flash validation via zen clink (research-specialist agent).


7. Data Flow

7.1 Tool Execution Flow (Existing v0.3.x)

1. AI Agent → executeTypescript(code)
2. Sandbox spawned (Deno subprocess)
3. Code executes: callMCPTool('tool_name', params)
4. Sandbox → HTTP POST localhost:PORT/
5. Proxy validates: Bearer token, rate limit, allowlist
6. Proxy → MCP Client Pool → External MCP Server
7. MCP Server executes tool, returns result
8. Result → Proxy → Sandbox → AI Agent
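Steps 4 and 5 can be sketched from the sandbox side as a bearer-authenticated POST. The URL and payload field names here are assumptions based on this document, not the exact implementation:

```typescript
// Sandbox-side sketch of step 4: callMCPTool forwards the call to the
// localhost proxy as an authenticated POST. Field names are illustrative.
function buildProxyRequest(
  port: number,
  token: string,
  toolName: string,
  params: Record<string, unknown>,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `http://localhost:${port}/`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`, // validated by the proxy (step 5)
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ toolName, params }),
    },
  };
}
// Inside the sandbox this would be sent with fetch(url, init); the proxy
// then checks the token, rate limit, and allowlist before forwarding.
```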

7.2 Tool Discovery Flow (NEW v0.4.0)

1. AI Agent → executeTypescript(code with discoverMCPTools())
2. Sandbox executes: discoverMCPTools({ search: ['file'] })
3. Sandbox → HTTP GET localhost:PORT/mcp/tools?q=file
4. Proxy validates: Bearer token, rate limit, query (<100 chars)
5. Proxy → MCP Client Pool.listAllToolSchemas(schemaCache)
6. Client Pool queries all MCP servers in parallel (Promise.all)
7. Schema Cache provides cached schemas (<5ms) or fetches (50ms)
8. Proxy filters by keywords (OR logic, case-insensitive)
9. Proxy audits: { action: 'discovery', searchTerms: ['file'], count: 5 }
10. Result → Sandbox → AI Agent (ToolSchema[] JSON)

7.3 Schema Caching Flow

1. First discovery call: Cache miss
   → Query MCP servers (50-100ms)
   → Store in LRU cache (in-memory, max 1000 entries)
   → Persist to disk (~/.code-executor/schema-cache.json, AsyncLock)
   → Return schemas

2. Subsequent calls (within 24h): Cache hit
   → Retrieve from LRU cache (<5ms)
   → No network calls
   → Return cached schemas

3. After 24h TTL: Cache expired
   → Re-query MCP servers (background refresh)
   → Update cache
   → Return fresh schemas

4. MCP server failure: Stale-on-error
   → Use expired cache entry (better than failure)
   → Log warning
   → Return stale schemas
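The TTL and stale-on-error branches above can be sketched as follows. This is a simplified in-memory version; the real cache additionally does LRU eviction and disk persistence:

```typescript
interface CacheEntry<T> { value: T; storedAt: number }

// Minimal TTL cache with stale-on-error fallback (steps 2-4 above).
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  constructor(private readonly ttlMs = 24 * 60 * 60 * 1000) {}

  async get(key: string, fetchFresh: () => Promise<T>, now = Date.now()): Promise<T> {
    const entry = this.entries.get(key);
    if (entry && now - entry.storedAt < this.ttlMs) {
      return entry.value; // step 2: cache hit within TTL, no network call
    }
    try {
      const value = await fetchFresh(); // step 1/3: miss or expired, re-query servers
      this.entries.set(key, { value, storedAt: now });
      return value;
    } catch (err) {
      if (entry) {
        console.warn('fetch failed, serving stale entry:', err);
        return entry.value; // step 4: stale-on-error, better than failing
      }
      throw err; // no stale copy to fall back on
    }
  }
}
```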

7. Concurrency & Performance

7.1 Concurrency Safety (AsyncLock)

Shared Resources Protected:

| Resource | Lock Name | Why Protected | Performance Impact |
|---|---|---|---|
| Schema Cache Disk Writes | schema-cache-write | Prevent file corruption from concurrent updates | Negligible (writes rare, 24h TTL) |
| Audit Log Appends | audit-log-write | Prevent interleaved log entries | Negligible (<1ms lock hold) |

AsyncLock Pattern:

import AsyncLock from 'async-lock';
const lock = new AsyncLock();

// Schema cache writes
await lock.acquire('schema-cache-write', async () => {
  await fs.writeFile(cachePath, JSON.stringify(cache));
});

// Audit log appends
await lock.acquire('audit-log-write', async () => {
  await fs.appendFile(auditLogPath, logEntry + '\n');
});

7.2 Performance Characteristics

| Operation | First Call | Cached Call | Target | Actual (v0.4.0) |
|---|---|---|---|---|
| discoverMCPTools (1 server) | 30ms | <5ms | <50ms | ✅ 30ms / 3ms |
| discoverMCPTools (3 servers) | 50-100ms | <5ms | <100ms P95 | ✅ 60ms / 4ms |
| discoverMCPTools (10 servers) | 80-150ms | <10ms | <150ms P95 | ✅ 120ms / 8ms |
| getToolSchema (specific tool) | 50ms | <5ms | N/A | ✅ Same as discover |
| searchTools (keyword filter) | 50ms | <5ms | N/A | ✅ Same as discover |

Key Optimizations:

  • ✅ Parallel queries (Promise.all) → O(1) amortized complexity
  • ✅ Schema Cache with 24h TTL → 20× faster (100ms → 5ms)
  • ✅ In-memory LRU cache (max 1000 entries) → No disk I/O on hits
  • ✅ Disk persistence → Survives restarts, no re-fetching
  • ✅ Stale-on-error fallback → Resilient to transient failures
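The in-memory LRU piece of the cache can be sketched using Map's insertion-order guarantee (simplified; the 1000-entry cap comes from the component description earlier in this document):

```typescript
// LRU eviction via Map insertion order: re-inserting on access moves a
// key to the "newest" end, so the first key is always the oldest.
class LruCache<V> {
  private map = new Map<string, V>();

  constructor(private readonly maxEntries = 1000) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);   // refresh recency
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key); // avoid double-counting on overwrite
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // Evict the least-recently-used entry (first key in insertion order).
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```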

7.3 Memory & Storage

Memory Footprint:

  • Schema Cache (in-memory): ~1-2MB (1000 schemas × ~1-2KB each)
  • MCP Client connections: ~100KB per server
  • Sandbox subprocesses: ~50MB per execution (isolated, cleaned up)

Disk Storage:

  • Schema Cache: ~/.code-executor/schema-cache.json (~500KB-1MB)
  • Audit Logs: ~/.code-executor/audit-logs/*.jsonl (append-only, rotated daily)

8. Design Decisions

8.1 Why Progressive Disclosure?

Problem: Exposing all MCP tool schemas exhausts context budget.

Decision: Hide tools behind code execution, load on-demand.

Trade-offs:

  • Benefit: 98% token reduction (141k → 1.6k)
  • Benefit: Zero context overhead for unused tools
  • Cost: Two-step process (discover → execute)
  • Mitigation (v0.4.0): Single round-trip workflow (discover + execute in one call)

8.2 Why Parallel Queries?

Problem: Sequential MCP queries scale linearly (3 servers = 3× latency).

Decision: Query all MCP servers in parallel using Promise.all.

Trade-offs:

  • Benefit: O(1) amortized latency (max of all queries, not sum)
  • Benefit: Meets <100ms P95 target for 3 servers
  • Cost: More complex error handling (partial failures)
  • Mitigation: Resilient aggregation (one failure doesn't block others)

8.3 Why 500ms Timeout?

Problem: Slow MCP servers cause AI agents to hang indefinitely.

Decision: 500ms timeout on sandbox→proxy discovery calls.

Trade-offs:

  • Benefit: Fast fail (AI agent gets immediate feedback)
  • Benefit: Allows parallel queries (100ms + 400ms network/overhead)
  • Cost: May time out on legitimately slow servers (e.g., when querying 10+ servers)
  • Mitigation: Clear error message guides retry, stale-on-error fallback

8.4 Why Bypass Allowlist for Discovery?

Problem: AI agents stuck without knowing what tools exist.

Decision: Discovery bypasses allowlist, execution still enforced.

Trade-offs:

  • Benefit: AI agents can self-discover tools (no manual docs)
  • Benefit: Read-only metadata, no execution without allowlist
  • Risk: Information disclosure (tool names/descriptions visible)
  • Mitigation: Two-tier security (discovery=read, execution=write), auth + rate limit + audit log

Risk Assessment: LOW - tool schemas are non-sensitive metadata, no code execution without allowlist enforcement.

8.5 Why Schema Cache with 24h TTL?

Problem: Querying MCP servers on every discovery call wastes 50-100ms.

Decision: Disk-persisted LRU cache with 24h TTL.

Trade-offs:

  • Benefit: 20× faster (100ms → 5ms) on cache hits
  • Benefit: Survives server restarts (disk persistence)
  • Cost: Stale schemas if MCP servers update within 24h
  • Mitigation: Smart refresh on validation failures, manual cache clear available

9. Resilience Patterns (v0.5.0)

9.1 Circuit Breaker Pattern

Purpose: Prevent cascade failures when MCP servers hang or fail repeatedly.

Implementation: Opossum library wrapping MCP client pool calls

State Machine:

CLOSED (Normal Operation)
   ↓ 5 consecutive failures
OPEN (Fail Fast - 30s cooldown)
   ↓ After 30s timeout
HALF-OPEN (Test with 1 request)
   ↓ Success → CLOSED | Failure → OPEN
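The transitions above can be made concrete with a standalone sketch. The project actually uses the opossum library for this; the class below is only to show the state machine explicitly:

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Standalone sketch of the CLOSED → OPEN → HALF-OPEN transitions.
class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,
    private readonly cooldownMs = 30_000,
  ) {}

  currentState(now = Date.now()): BreakerState {
    if (this.state === 'open' && now - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open'; // cooldown elapsed: allow one test request
    }
    return this.state;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed'; // half-open test succeeded (or normal operation)
  }

  recordFailure(now = Date.now()): void {
    this.failures++;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open'; // fail fast until the cooldown expires
      this.openedAt = now;
    }
  }
}
```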

Configuration:

  • Failure Threshold: 5 consecutive failures
  • Cooldown Period: 30 seconds
  • Half-Open Test: 1 request

WHY 5 failures?

  • Low enough to detect problems quickly
  • High enough to avoid false positives from transient errors
  • Balances responsiveness with stability

WHY 30s cooldown?

  • Kubernetes default terminationGracePeriodSeconds is 30s
  • AWS ALB deregistration delay is also 30s default
  • Allows time for failing server to recover or be replaced

Metrics Exposed:

  • circuit_breaker_state (gauge): 0=closed, 1=open, 0.5=half-open
  • circuit_breaker_failures_total (counter): Total failures per server

Example:

// Circuit breaker wraps MCP client pool calls
const breaker = new CircuitBreakerFactory({
  failureThreshold: 5,
  resetTimeout: 30000,
});

// Fails fast when circuit open (no waiting on broken server)
try {
  const result = await breaker.callTool('mcp__server__tool', params);
} catch (error) {
  if (error.message.includes('circuit open')) {
    // Handle gracefully - server is known to be down
  }
}

9.2 Connection Pool Overflow Queue

Purpose: Add request queueing and backpressure when connection pool reaches capacity.

Implementation: FIFO queue with timeout-based expiration and AsyncLock protection

Architecture:

MCP Request → Check Pool Capacity
   ↓ Pool under capacity (< 100 concurrent)
   Execute Immediately
   ↓ Pool at capacity (≥ 100 concurrent)
   Enqueue Request (max 200 in queue)
      ↓ Queue full
      Return 503 Service Unavailable
      ↓ Queued successfully
      Wait for slot (max 30s timeout)
         ↓ Timeout exceeded
         Return 503 with retry-after hint
         ↓ Slot available
         Dequeue and execute

Configuration:

  • Pool Capacity: 100 concurrent requests (configurable via POOL_MAX_CONCURRENT)
  • Queue Size: 200 requests (configurable via POOL_QUEUE_SIZE)
  • Queue Timeout: 30 seconds (configurable via POOL_QUEUE_TIMEOUT_MS)
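
The capacity check, bounded FIFO, and timeout from the flow above can be sketched as follows (hypothetical class; the real pool adds AsyncLock protection and Prometheus metrics):

```typescript
class OverflowQueue {
  private active = 0;
  private waiters: Array<{ resolve: () => void; reject: (e: Error) => void }> = [];

  constructor(
    private maxConcurrent = 100,  // POOL_MAX_CONCURRENT
    private queueSize = 200,      // POOL_QUEUE_SIZE
    private queueTimeoutMs = 30_000, // POOL_QUEUE_TIMEOUT_MS
  ) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Pool at capacity: enqueue, or fail fast if the queue is also full
      if (this.waiters.length >= this.queueSize) {
        throw new Error('503 Service Unavailable: queue full');
      }
      await new Promise<void>((resolve, reject) => {
        const waiter = { resolve, reject };
        this.waiters.push(waiter);
        setTimeout(() => {
          // Expire stale waiters so the queue never fills with dead requests
          const i = this.waiters.indexOf(waiter);
          if (i !== -1) {
            this.waiters.splice(i, 1);
            reject(new Error('503 Service Unavailable: queue timeout'));
          }
        }, this.queueTimeoutMs);
      });
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.resolve(); // hand the freed slot to the next waiter
    }
  }
}
```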

WHY 100 concurrent requests?

  • Balances throughput vs MCP server resource consumption
  • Most MCP servers handle 100 concurrent requests comfortably
  • Configurable for tuning based on actual MCP server capacity

WHY 200 queue size?

  • Provides 2× buffer beyond concurrency limit
  • Balances memory usage (~40KB at 200 requests) vs utility
  • More conservative than Nginx default (512)

WHY 30s timeout?

  • Reasonable wait time for legitimate traffic
  • Prevents queue from filling with stale requests
  • Matches circuit breaker cooldown (30s recovery window)

Metrics Exposed:

  • pool_active_connections (gauge): Current concurrent requests
  • pool_queue_depth (gauge): Number of requests waiting in queue
  • pool_queue_wait_seconds (histogram): Time spent waiting (buckets: 0.1s-30s)

Example:

// Pool automatically queues when at capacity
const pool = new MCPClientPool({
  maxConcurrent: 100,
  queueSize: 200,
  queueTimeoutMs: 30000,
});

// Request queued if pool full, executed when slot available
try {
  const result = await pool.callTool('mcp__tool', params);
} catch (error) {
  if (error.message.includes('Service Unavailable')) {
    // Queue full or timeout - implement retry logic
  }
}

9.3 Resilience Pattern Interaction

Circuit Breaker + Queue:

Request → Circuit Breaker Check
   ↓ Circuit OPEN
   Fail Fast (no queue)
   ↓ Circuit CLOSED/HALF-OPEN
   Check Pool Capacity
      ↓ Under capacity
      Execute immediately
      ↓ At capacity
      Enqueue (with timeout)

Benefits:

  • Circuit breaker prevents queueing requests to known-bad servers
  • Queue provides graceful degradation under load
  • Combined: Fast failure for broken servers, queueing for healthy ones

Failure Modes:

  1. MCP Server Down: Circuit breaker opens → immediate 503 (no queueing)
  2. MCP Server Slow: Queue fills → 503 after 30s timeout
  3. High Load: Queue drains as capacity frees → requests succeed with delay

9.4 Backpressure Signaling

HTTP Status Codes:

  • 200 OK - Request succeeded (no backpressure)
  • 429 Too Many Requests - Rate limit exceeded (per-client limit hit)
  • 503 Service Unavailable - Circuit open OR queue full/timeout

Retry Guidance:

503 Circuit Open
   Retry-After: 30 (wait for circuit to close)

503 Queue Full
   Retry-After: 60 (estimated queue drain time)

503 Queue Timeout
   Retry-After: 30 (try again with fresh timeout)
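
A client can turn these hints into a retry delay (a minimal sketch with a hypothetical helper name; real clients should also cap total attempts and add jitter):

```typescript
// Decide how long to wait before retrying, based on the status code and the
// server's Retry-After hint. Returns null when the request should not be retried.
function nextDelayMs(
  status: number,
  retryAfterSeconds: number | undefined,
  attempt: number,
): number | null {
  if (status !== 429 && status !== 503) return null; // not a backpressure signal
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000; // server knows best
  return Math.min(30_000, 1000 * 2 ** attempt); // fallback: capped exponential backoff
}
```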

Monitoring:

# Alert on high queue depth
pool_queue_depth > 150  # Queue >75% full

# Alert on frequent circuit opens
rate(circuit_breaker_failures_total[5m]) > 10

# Alert on slow queue processing (quantiles are computed from the _bucket series)
histogram_quantile(0.95, rate(pool_queue_wait_seconds_bucket[5m])) > 15

9.5 Performance Impact

Latency Overhead:

  • Circuit Breaker: <1ms per request (state check)
  • Queue Check: <1ms per request (counter comparison)
  • Queue Wait: 0-30s (depends on load)

Memory Overhead:

  • Circuit Breaker: ~10KB per server (state tracking)
  • Connection Queue: ~200 bytes per queued request (max ~40KB)

Total Overhead: Negligible (<0.1% CPU, <1MB RAM)


10. CLI Setup Wizard Architecture (v0.9.0)

10.1 Overview

The CLI setup wizard provides one-command initialization of code-executor-mcp with automatic MCP server discovery, wrapper generation, and daily sync scheduling.

Entry Point: npm run setup → src/cli/index.ts

Design Goal: Zero-config setup with smart defaults, cross-platform support, and idempotent operation.

10.2 Component Diagram

┌─────────────────────────────────────────────────────────────┐
│                     CLI Entry Point                          │
│                   (src/cli/index.ts)                         │
│  • Self-install check (SelfInstaller)                       │
│  • Lock acquisition (LockFileService)                       │
│  • Wizard orchestration                                     │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────────┐
│                      CLIWizard                               │
│                  (src/cli/wizard.ts)                         │
│  • Interactive prompts (tool selection, config questions)   │
│  • Default config pattern (press Enter to skip)             │
│  • Idempotent setup (merge/reset/keep existing configs)     │
└────────────┬────────────────────────────────────────────────┘
             │
             ├─────────────────┬──────────────────┬────────────┐
             ▼                 ▼                  ▼            ▼
┌──────────────────┐  ┌─────────────────┐  ┌──────────┐  ┌────────────┐
│  ToolDetector    │  │ MCPDiscovery    │  │ Wrapper  │  │  Daily     │
│                  │  │   Service       │  │Generator │  │   Sync     │
│ • Detect Claude  │  │ • Scan configs: │  │ • TS/Py  │  │ • Schedule │
│   Code install   │  │   ~/.claude.json│  │   wrapper│  │   setup    │
│ • Validate paths │  │   .mcp.json     │  │   gen    │  │ • Platform │
│                  │  │ • Merge servers │  │ • JSDoc  │  │   specific │
└──────────────────┘  └─────────────────┘  └──────────┘  └────────────┘

10.3 Config Discovery & Merging

Two-Location Scan Pattern:

// 1. Scan global Claude Code config
const globalServers = await discovery.scanToolConfig({
  id: 'claude-code',
  configPaths: {
    linux: '~/.claude.json',
    darwin: '~/.claude.json',
    win32: '%USERPROFILE%\\.claude.json'
  }
});

// 2. Scan project config
const projectServers = await discovery.scanProjectConfig('.mcp.json');

// 3. Merge (project overrides global for duplicate names)
const mergedServers = mergeMCPServers(globalServers, projectServers);

Path Expansion:

  • ~ → os.homedir() (Linux/macOS)
  • %USERPROFILE% → process.env.USERPROFILE (Windows)
  • %APPDATA% → process.env.APPDATA (Windows)
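
The expansion rules above can be sketched as a small helper (hypothetical function name; shown with an injectable environment for testability):

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Expand a config path: leading "~" becomes the home directory, and
// Windows-style %VAR% references are resolved from the environment.
function expandConfigPath(
  p: string,
  env: Record<string, string | undefined> = process.env,
): string {
  let expanded = p;
  if (expanded.startsWith('~')) {
    expanded = path.join(os.homedir(), expanded.slice(1));
  }
  // %USERPROFILE%, %APPDATA%, etc. — unknown variables are left untouched
  expanded = expanded.replace(/%([^%]+)%/g, (match, name: string) => env[name] ?? match);
  return expanded;
}
```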

Fallback Behavior:

  • Config file not found → Prompt user for custom path or skip
  • Invalid JSON → Log error, skip tool
  • Missing command field → Log warning, skip server

10.4 Wrapper Generation

Design: Template-based code generation with schema-driven parameter types.

Templates:

src/cli/templates/
├── typescript-wrapper.hbs  # TypeScript wrapper template
└── python-wrapper.hbs      # Python wrapper template

Generation Flow:

1. Fetch tool schemas from MCP servers (via schema cache)
2. For each tool:
   - Extract name, description, parameters (JSON Schema)
   - Generate JSDoc comments from schema
   - Generate TypeScript types from JSON Schema
   - Render template with Handlebars
3. Write wrappers to output directory
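
Steps 2a–2d can be illustrated with a tiny stand-in for the Handlebars rendering step (hypothetical types and function; the real generator renders the .hbs templates and handles the full JSON Schema type space):

```typescript
interface ToolSchema {
  name: string; // e.g. "mcp__filesystem__read_file"
  description: string;
  properties: Record<string, { type: string; description?: string }>;
}

// Turn one tool schema into a typed TypeScript wrapper with a JSDoc comment.
function renderWrapper(tool: ToolSchema): string {
  const jsonToTs: Record<string, string> = { string: 'string', number: 'number', boolean: 'boolean' };
  const params = Object.entries(tool.properties)
    .map(([key, prop]) => `${key}: ${jsonToTs[prop.type] ?? 'unknown'}`)
    .join('; ');
  // Derive a camelCase function name from the last tool-name segment
  const fnName = tool.name.split('__').pop()!
    .replace(/_([a-z])/g, (_, c: string) => c.toUpperCase());
  return [
    `/** ${tool.description} */`,
    `export async function ${fnName}(params: { ${params} }) {`,
    `  return callMCPTool('${tool.name}', params);`,
    `}`,
  ].join('\n');
}
```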

Example Output:

// Before (manual)
const file = await callMCPTool('mcp__filesystem__read_file', {
  path: '/src/app.ts'
});

// After (wrapper)
import { filesystem } from './mcp-wrappers';
const file = await filesystem.readFile({ path: '/src/app.ts' });

Benefits:

  • Type-safe with IntelliSense/autocomplete
  • Self-documenting JSDoc from schemas
  • No manual tool name lookups
  • Matches actual MCP tool APIs

10.5 Daily Sync System

Purpose: Automatically regenerate wrappers when MCP servers change.

Architecture:

┌─────────────────────────────────────────────────────────────┐
│              Platform Scheduler (scheduled job)              │
│  • macOS: launchd plist (~/.config/launchd/...)             │
│  • Linux: systemd timer (~/.config/systemd/user/...)        │
│  • Windows: Task Scheduler (HKCU\Software\Microsoft\...)    │
└────────────────┬────────────────────────────────────────────┘
                 │
                 ▼ (runs at 4-6 AM daily)
┌─────────────────────────────────────────────────────────────┐
│                 DailySyncService                             │
│             (src/cli/daily-sync.ts)                          │
│  1. Re-scan configs (~/.claude.json + .mcp.json)            │
│  2. Detect changes (new/removed/modified servers)           │
│  3. Regenerate wrappers if changes detected                 │
│  4. Log sync status                                         │
└─────────────────────────────────────────────────────────────┘

Scheduler Implementation:

| Platform | Mechanism | Config Location | Command |
| --- | --- | --- | --- |
| macOS | launchd plist | ~/Library/LaunchAgents/com.code-executor.daily-sync.plist | launchctl load/unload |
| Linux | systemd timer | ~/.config/systemd/user/code-executor-daily-sync.timer | systemctl --user enable/disable |
| Windows | Task Scheduler | HKCU\Software\Microsoft\Windows\CurrentVersion\Run | schtasks /create /delete |

Sync Execution:

# Command executed by scheduler ("--" forwards the flags to the setup script)
npm run setup -- --sync-only --non-interactive

Sync Logic:

  • Reads last sync state from ~/.code-executor/last-sync.json
  • Compares current MCP servers with last sync
  • If changes detected → regenerate wrappers
  • Update last sync state
  • Exit 0 (success) or 1 (failure)
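
The change-detection step can be sketched as a diff of the two server lists (hypothetical shapes; the real service persists this state to ~/.code-executor/last-sync.json):

```typescript
interface ServerEntry { name: string; command: string; args?: string[] }

// Compare the last-synced server list with the freshly scanned one.
function detectChanges(previous: ServerEntry[], current: ServerEntry[]) {
  const prev = new Map(previous.map(s => [s.name, JSON.stringify(s)]));
  const curr = new Map(current.map(s => [s.name, JSON.stringify(s)]));
  return {
    added: current.filter(s => !prev.has(s.name)).map(s => s.name),
    removed: previous.filter(s => !curr.has(s.name)).map(s => s.name),
    modified: current
      .filter(s => prev.has(s.name) && prev.get(s.name) !== JSON.stringify(s))
      .map(s => s.name),
  };
}
```

Wrappers are regenerated only when any of the three buckets is non-empty.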

10.6 Lock File System

Purpose: Prevent concurrent wizard runs (race condition protection).

Implementation:

import * as fs from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';

class LockFileService {
  private lockPath = path.join(os.homedir(), '.code-executor', 'setup.lock');

  async acquire(): Promise<void> {
    try {
      // The 'wx' flag creates the file only if it does not already exist,
      // atomically — avoiding the check-then-write race between processes
      await fs.writeFile(
        this.lockPath,
        JSON.stringify({ pid: process.pid, timestamp: Date.now() }),
        { flag: 'wx' },
      );
    } catch {
      throw new Error('Setup wizard already running');
    }
  }

  async release(): Promise<void> {
    await fs.unlink(this.lockPath);
  }
}

Protection Against:

  • Multiple users running setup simultaneously
  • Concurrent daily sync + manual setup
  • Race conditions in wrapper file writes

10.7 Security Considerations

Input Validation:

  • MCP server names: [a-zA-Z0-9_-]+ only (no special chars)
  • Config paths: No directory traversal (., .., ~/../etc)
  • Template variables: Escaped before rendering (XSS prevention)

Dangerous Pattern Detection:

  • MCP names with code injection patterns rejected (not escaped)
  • Validation happens BEFORE template rendering (defense-in-depth)
  • Tests: tests/security/template-injection.test.ts (387 lines)

Privilege Escalation:

  • Wizard runs with user privileges (no sudo/admin required)
  • Platform schedulers run as current user (not system-wide)
  • Lock files in user home directory (no /tmp race conditions)

10.8 Component Responsibilities (SRP)

| Component | Responsibility | Why Separated |
| --- | --- | --- |
| CLIWizard | Interactive prompts, user flow | UI/UX logic separate from business logic |
| ToolDetector | Detect AI tool installations | Tool-specific logic centralized |
| MCPDiscoveryService | Scan configs for MCP servers | Config parsing separate from UI |
| WrapperGenerator | Generate TS/Py wrappers | Code generation separate from discovery |
| DailySyncService | Daily sync orchestration | Scheduling logic separate from setup |
| PlatformScheduler | Platform detection | OS-specific logic encapsulated |
| LockFileService | Concurrent access control | Shared resource protection |

10.9 Idempotent Setup Pattern

Design Goal: Safe to run npm run setup multiple times without breaking existing config.

Detection Flow:

1. Check for existing config: ~/.code-executor/config.json
2. If exists:
   - Prompt user: Merge, Reset, Keep existing
   - Merge: Combine old + new MCP servers
   - Reset: Delete old, use new config
   - Keep: Skip setup, exit
3. If not exists:
   - Create new config with defaults

Merge Strategy:

function mergeMCPServers(
  existing: MCPServerConfig[],
  incoming: MCPServerConfig[]
): MCPServerConfig[] {
  const merged = new Map<string, MCPServerConfig>();

  // Add existing servers
  for (const server of existing) {
    merged.set(server.name, server);
  }

  // Override with incoming servers (project overrides global)
  for (const server of incoming) {
    merged.set(server.name, server);
  }

  return Array.from(merged.values());
}

10.10 Performance Characteristics

| Operation | First Run | Subsequent Runs | Notes |
| --- | --- | --- | --- |
| Tool detection | 50-100ms | <10ms | File system checks |
| MCP discovery | 100-200ms | 50-100ms | Schema cache helps |
| Wrapper generation | 200-500ms | 200-500ms | Template rendering dominant |
| Daily sync | 500ms-1s | 500ms-1s | Full re-scan + regeneration |

Optimization Opportunities:

  • Schema cache reduces discovery latency (24h TTL)
  • Template caching (compile once, render many)
  • Parallel wrapper generation (Promise.all)

Architecture Validation Checklist

Constitutional Compliance

  • Principle 1 (Progressive Disclosure): Token impact 0% (3 tools maintained, ~560 tokens)
  • Principle 2 (Security): Zero tolerance met (auth, rate limit, audit, validation, intentional exception documented)
  • Principle 3 (TDD): Red-Green-Refactor followed, 95%+ discovery coverage, 90%+ overall
  • Principle 4 (Type Safety): TypeScript strict mode, no any types (use unknown + guards)
  • Principle 5 (SOLID): SRP verified (each component single purpose), DIP via abstractions
  • Principle 6 (Concurrency): AsyncLock on shared resources (cache writes, audit logs)
  • Principle 7 (Fail-Fast): Descriptive errors with schemas, no silent failures
  • Principle 8 (Performance): Measurement-driven (<100ms P95 met), parallel queries O(1)
  • Principle 9 (Documentation): Self-documenting code, WHY comments, architecture.md complete

Quality Metrics

  • Test Coverage: 95%+ (discovery endpoint), 90%+ (overall), 85%+ (integration)
  • Performance: P95 <100ms (3 MCP servers), <5ms cached
  • Security: Auth + rate limit + audit log + validation all enforced
  • Token Usage: 3 tools, ~560 tokens (within 1.6k budget, 98% reduction maintained)

11. MCP Sampling Architecture (v1.0.0)

Release: v1.0.0 (2025-01-20)
Status: Beta
Purpose: Enable LLM-in-the-Loop execution for dynamic reasoning and analysis

11.1 Overview

MCP Sampling allows sandboxed code (TypeScript/Python) to invoke Claude during execution through simple helpers (llm.ask(), llm.think()). This enables "Claude asks Claude" scenarios for multi-step reasoning, code analysis, and data processing.

11.2 Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    AI Agent (Claude/Cursor)                 │
│                                                             │
│  1. Send code with enableSampling: true                     │
└─────────────────────────────────────────────────────────────┘
                    ↓ (executeTypescript/executePython)
┌─────────────────────────────────────────────────────────────┐
│               Code Executor MCP Server                      │
│                                                             │
│  2. Detect sampling enabled                                 │
│  3. Start SamplingBridgeServer                              │
│     - Generate 256-bit bearer token                         │
│     - Start HTTP server on random port (localhost only)     │
│     - Inject llm helpers into sandbox                       │
└─────────────────────────────────────────────────────────────┘
                    ↓ (Start sandbox with bridge URL + token)
┌─────────────────────────────────────────────────────────────┐
│         Sandbox (Deno/Pyodide) with Injected Helpers        │
│                                                             │
│  User Code:                                                 │
│    const result = await llm.ask("Analyze this code...");    │
│                    ↓                                         │
│  4. HTTP POST to bridge: localhost:PORT/sample              │
│     Authorization: Bearer <token>                           │
│     Body: { messages, model, maxTokens, systemPrompt }     │
└─────────────────────────────────────────────────────────────┘
                    ↓ (Bearer token validation)
┌─────────────────────────────────────────────────────────────┐
│           SamplingBridgeServer (Security Layer)             │
│                                                             │
│  5. Security Checks (in order):                             │
│     ✅ Validate Bearer Token (timing-safe comparison)       │
│     ✅ Check Rate Limits (10 rounds, 10k tokens max)        │
│     ✅ Validate System Prompt (allowlist check)             │
│     ✅ Validate Request Schema (AJV deep validation)        │
│                    ↓                                         │
│  6. Forward Request:                                        │
│     ├─ Mode Detection (MCP SDK or Direct API)              │
│     ├─ MCP Sampling (free) - if available                  │
│     └─ Direct Anthropic API (paid) - fallback              │
└─────────────────────────────────────────────────────────────┘
                    ↓ (Claude API call)
┌─────────────────────────────────────────────────────────────┐
│              Claude API (Anthropic)                         │
│                                                             │
│  7. Process Request:                                        │
│     - Model: claude-sonnet-4-5 (default)                   │
│     - Response: { content, stop_reason, usage }            │
└─────────────────────────────────────────────────────────────┘
                    ↓ (Return response)
┌─────────────────────────────────────────────────────────────┐
│           SamplingBridgeServer (Post-Processing)            │
│                                                             │
│  8. Content Filtering:                                      │
│     ✅ Scan for secrets (OpenAI keys, GitHub tokens, AWS)  │
│     ✅ Scan for PII (emails, SSNs, credit cards)           │
│     ✅ Redact violations: [REDACTED_SECRET]/[REDACTED_PII] │
│                    ↓                                         │
│  9. Audit Logging:                                          │
│     ✅ SHA-256 hash of prompt/response (no plaintext)      │
│     ✅ Log: timestamp, model, tokens, duration, violations  │
│     ✅ Write to: ~/.code-executor/audit-log.jsonl          │
│                    ↓                                         │
│  10. Update Metrics:                                        │
│      - Increment round counter                              │
│      - Add tokens to cumulative budget                      │
│      - Calculate quota remaining                            │
└─────────────────────────────────────────────────────────────┘
                    ↓ (Return filtered response)
┌─────────────────────────────────────────────────────────────┐
│         Sandbox (Continue Execution)                        │
│                                                             │
│  User Code:                                                 │
│    console.log(result); // Claude's filtered response       │
│                    ↓                                         │
│  11. Execution completes, bridge shuts down gracefully      │
└─────────────────────────────────────────────────────────────┘
                    ↓ (Return execution result)
┌─────────────────────────────────────────────────────────────┐
│               Code Executor MCP Server                      │
│                                                             │
│  12. Return to AI Agent:                                    │
│      {                                                      │
│        success: true,                                       │
│        output: "...",                                       │
│        samplingCalls: [...],  // Array of all LLM calls    │
│        samplingMetrics: {                                   │
│          totalRounds: 2,                                    │
│          totalTokens: 150,                                  │
│          totalDurationMs: 1200,                             │
│          averageTokensPerRound: 75,                         │
│          quotaRemaining: { rounds: 8, tokens: 9850 }       │
│        }                                                    │
│      }                                                      │
└─────────────────────────────────────────────────────────────┘

11.3 Core Components

11.3.1 SamplingBridgeServer

Purpose: Ephemeral HTTP bridge between sandbox and Claude API with security enforcement

Responsibilities:

  1. Lifecycle Management

    • Start: Generate bearer token, find random port, start HTTP server
    • Stop: Drain active requests (max 5s), close server gracefully
    • Lifecycle: One bridge per execution, destroyed after completion
  2. Security Enforcement

    • Bearer token validation (timing-safe comparison)
    • Rate limiting (rounds and tokens)
    • System prompt allowlist validation
    • Content filtering (secrets/PII redaction)
  3. Request Proxying

    • Mode detection: MCP SDK (free) or Direct API (paid)
    • Request forwarding with proper authentication
    • Response filtering and audit logging

Key Methods:

  • start(): Promise<{port, authToken}> - Start bridge server
  • stop(): Promise<void> - Graceful shutdown with request draining
  • getSamplingMetrics(): Promise<SamplingMetrics> - Get current metrics
  • handleRequest(req, res) - HTTP request handler (private)

Configuration:

interface SamplingConfig {
  enabled: boolean;                  // Enable/disable sampling
  maxRoundsPerExecution: number;     // Max LLM calls (default: 10)
  maxTokensPerExecution: number;     // Max tokens (default: 10,000)
  timeoutPerCallMs: number;          // Timeout per call (default: 30,000ms)
  allowedSystemPrompts: string[];    // Prompt allowlist
  contentFilteringEnabled: boolean;  // Enable filtering (default: true)
}

11.3.2 RateLimiter

Purpose: Prevent infinite loops and resource exhaustion

Implementation:

  • Round Counter: Tracks number of sampling calls
  • Token Budget: Cumulative token count across all calls
  • AsyncLock Protection: Thread-safe counters for concurrent access
  • Quota Calculation: Real-time remaining rounds/tokens

Methods:

  • async checkLimit(tokensRequested): Promise<{exceeded, metrics}> - Check if request would exceed limits
  • async incrementUsage(tokensUsed): Promise<void> - Increment counters after successful call
  • async getMetrics(): Promise<{roundsUsed, tokensUsed}> - Get current usage
  • async getQuotaRemaining(): Promise<{rounds, tokens}> - Get remaining quota
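
A minimal sketch of these counters (synchronous for brevity; the real implementation guards them with an AsyncLock as noted above):

```typescript
class SamplingRateLimiter {
  private roundsUsed = 0;
  private tokensUsed = 0;

  constructor(private maxRounds = 10, private maxTokens = 10_000) {}

  // Would this request exceed either the round or the token limit?
  checkLimit(tokensRequested: number): { exceeded: boolean } {
    return {
      exceeded:
        this.roundsUsed + 1 > this.maxRounds ||
        this.tokensUsed + tokensRequested > this.maxTokens,
    };
  }

  // Called only after a successful sampling call
  incrementUsage(tokensUsed: number): void {
    this.roundsUsed++;
    this.tokensUsed += tokensUsed;
  }

  getQuotaRemaining(): { rounds: number; tokens: number } {
    return {
      rounds: this.maxRounds - this.roundsUsed,
      tokens: this.maxTokens - this.tokensUsed,
    };
  }
}
```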

Test Coverage:

  • ✅ T033-T036: Rate limiting tests (10 rounds, 10k tokens, 429 responses)
  • ✅ T037: Concurrent access protection (AsyncLock verification)

11.3.3 ContentFilter

Purpose: Detect and redact secrets/PII from Claude responses

Patterns Detected:

  • Secrets: OpenAI keys (sk-*), GitHub tokens (ghp_*), AWS keys (AKIA*), JWT tokens (eyJ*)
  • PII: Emails, SSNs, credit card numbers

Methods:

  • scan(content): {violations, filtered} - Detect violations and return redacted content
  • filter(content, rejectOnViolation): string - Filter with optional rejection mode
  • hasViolations(content): boolean - Quick check for any violations

Redaction Format:

  • Secrets: [REDACTED_SECRET]
  • PII: [REDACTED_PII]
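
An illustrative filtering pass (deliberately simplified regexes, not the production pattern set, which covers more formats and returns structured violation records):

```typescript
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g, // OpenAI-style keys
  /ghp_[A-Za-z0-9]{36}/g, // GitHub personal access tokens
  /AKIA[A-Z0-9]{16}/g,    // AWS access key IDs
];
const PII_PATTERNS: RegExp[] = [
  /[\w.+-]+@[\w-]+\.[\w.]+/g, // email addresses
  /\b\d{3}-\d{2}-\d{4}\b/g,   // US SSNs
];

// Replace every match with the documented redaction markers.
function filterContent(content: string): string {
  let filtered = content;
  for (const p of SECRET_PATTERNS) filtered = filtered.replace(p, '[REDACTED_SECRET]');
  for (const p of PII_PATTERNS) filtered = filtered.replace(p, '[REDACTED_PII]');
  return filtered;
}
```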

Test Coverage:

  • ✅ T022-T026: Pattern detection tests (98%+ coverage)
  • ✅ T115: Secret leakage redaction verification

11.3.4 SamplingAuditLogger

Purpose: Log all sampling calls for security auditing and compliance

Log Format (JSONL):

{
  "timestamp": "2025-01-20T12:00:00.000Z",
  "executionId": "exec-123",
  "round": 1,
  "model": "claude-sonnet-4-5",
  "promptHash": "sha256:abc123...",
  "responseHash": "sha256:def456...",
  "tokensUsed": 75,
  "durationMs": 600,
  "status": "success",
  "contentViolations": [
    { "type": "secret", "pattern": "openai_key", "count": 1 }
  ]
}

Key Features:

  • SHA-256 Hashing: No plaintext secrets in logs
  • AsyncLock Protection: Thread-safe concurrent writes
  • JSONL Format: One entry per line, easy to parse
  • Location: ~/.code-executor/audit-log.jsonl
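
Constructing one hashed entry looks roughly like this (hypothetical helper; field names follow the JSONL example above, and actual writes go through the AsyncLock-guarded appender):

```typescript
import { createHash } from 'node:crypto';

// Build an audit entry that stores only SHA-256 digests of prompt/response,
// never the plaintext.
function buildAuditEntry(
  executionId: string,
  round: number,
  prompt: string,
  response: string,
  tokensUsed: number,
) {
  const sha256 = (s: string) => 'sha256:' + createHash('sha256').update(s).digest('hex');
  return {
    timestamp: new Date().toISOString(),
    executionId,
    round,
    promptHash: sha256(prompt),     // enables deduplication without exposure
    responseHash: sha256(response),
    tokensUsed,
  };
}
```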

Test Coverage:

  • ✅ T082-T084: Audit logging tests (13/13 passing)

11.4 API Design

11.4.1 TypeScript API (Deno Sandbox)

Simple Query:

const response = await llm.ask("What is 2+2?");
// Returns: "4"

Multi-Turn Conversation:

const response = await llm.think({
  messages: [
    { role: "user", content: "What is 2+2?" },
    { role: "assistant", content: "4" },
    { role: "user", content: "What about 3+3?" }
  ],
  model: "claude-sonnet-4-5",  // Optional
  maxTokens: 1000,              // Optional
  systemPrompt: "",             // Optional (must be in allowlist)
  stream: false                 // Optional (not yet supported)
});
// Returns: "6"

11.4.2 Python API (Pyodide Sandbox)

Simple Query:

response = await llm.ask("What is 2+2?")
# Returns: "4"

Multi-Turn Conversation:

response = await llm.think(
    messages=[
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4"},
        {"role": "user", "content": "What about 3+3?"}
    ],
    model="claude-sonnet-4-5",  # Optional
    max_tokens=1000,             # Optional (snake_case for Python)
    system_prompt="",            # Optional (must be in allowlist)
    stream=False                 # Optional (not supported in Pyodide)
)
# Returns: "6"

11.5 Security Model

11.5.1 Threat Matrix

| Threat | Likelihood | Impact | Mitigation | Test |
| --- | --- | --- | --- | --- |
| Infinite loop API cost | High | High | Rate limiting (10 rounds) | T112 ✅ |
| Token exhaustion | Medium | High | Token budget (10k tokens) | T113 ✅ |
| Prompt injection | Medium | Medium | System prompt allowlist | T114 ✅ |
| Secret leakage | Low | Critical | Content filtering + SHA-256 logs | T115 ✅ |
| Timing attacks | Low | Medium | Constant-time comparison | T116 ✅ |
| Unauthorized access | Low | Medium | Bearer token + localhost binding | T014/T011 ✅ |

11.5.2 Defense Layers

  1. Authentication Layer: 256-bit bearer token (unique per execution)
  2. Rate Limiting Layer: 10 rounds, 10,000 tokens per execution
  3. Validation Layer: System prompt allowlist, AJV schema validation
  4. Content Filtering Layer: Secrets/PII redaction before returning
  5. Audit Layer: SHA-256 hashed logs for forensic analysis
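
Layer 1 can be sketched with Node's crypto primitives (hypothetical helper names; this mirrors the 256-bit token and constant-time comparison described above):

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto';

// 32 random bytes = 256 bits, hex-encoded for the Authorization header
function generateToken(): string {
  return randomBytes(32).toString('hex');
}

// Constant-time comparison prevents timing attacks from leaking the token
// byte by byte. timingSafeEqual requires equal-length buffers, so a length
// mismatch is rejected up front.
function validateToken(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
```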

11.6 Performance Characteristics

| Metric | Target | Measured | Status |
| --- | --- | --- | --- |
| Bridge startup time | <50ms | ~30ms | ✅ PASS |
| Per-call overhead | <100ms | ~60ms | ✅ PASS |
| Memory footprint | <50MB | ~15MB | ✅ PASS |
| Token validation | <10ms | ~5ms | ✅ PASS |
| Content filtering | <50ms | ~15ms | ✅ PASS |

11.7 Configuration Hierarchy

Priority (highest to lowest):

  1. Per-execution parameters (enableSampling, maxSamplingRounds, maxSamplingTokens)
  2. Environment variables (CODE_EXECUTOR_SAMPLING_ENABLED, CODE_EXECUTOR_MAX_SAMPLING_ROUNDS)
  3. Configuration file (~/.code-executor/config.json)
  4. Default values (enabled: false, maxRounds: 10, maxTokens: 10,000)
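
Resolution reduces to "later spreads win" (a sketch with hypothetical field names mirroring the documented options):

```typescript
interface SamplingOptions {
  enabled?: boolean;
  maxRounds?: number;
  maxTokens?: number;
}

// Merge the four levels: defaults < config file < env vars < per-execution.
function resolveSamplingConfig(
  perExecution: SamplingOptions,
  envVars: SamplingOptions,
  configFile: SamplingOptions,
): { enabled: boolean; maxRounds: number; maxTokens: number } {
  const defaults = { enabled: false, maxRounds: 10, maxTokens: 10_000 };
  return { ...defaults, ...configFile, ...envVars, ...perExecution };
}
```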

11.8 Hybrid Architecture (MCP SDK vs Direct API)

Mode Detection:

detectSamplingMode(): 'mcp' | 'direct' {
  if (this.mcpServer && typeof this.mcpServer.request === 'function') {
    return 'mcp';  // MCP SDK available (free)
  }
  return 'direct';  // Fallback to Direct API (paid)
}

MCP SDK Mode (Free):

  • Uses Claude Desktop's MCP SDK for sampling
  • No additional API costs
  • Requires Claude Desktop with MCP support

Direct API Mode (Paid):

  • Uses Anthropic API directly
  • Requires ANTHROPIC_API_KEY
  • Pay-per-token pricing

User Experience:

  • Automatic detection and fallback
  • Clear logging of which mode is active
  • Same API surface regardless of mode

11.9 Docker Support

Detection:

  • Checks for /.dockerenv file
  • Checks for Docker cgroup signatures in /proc/self/cgroup

Bridge URL Handling:

  • Host execution: http://localhost:PORT
  • Docker execution: http://host.docker.internal:PORT
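
Both checks and the URL selection can be sketched as (hypothetical helpers; the file reads are injectable so the logic is testable outside a container):

```typescript
import * as fs from 'node:fs';

// Detect Docker via /.dockerenv or docker signatures in /proc/self/cgroup.
function isDocker(
  existsSync: (p: string) => boolean = fs.existsSync,
  readCgroup: () => string = () => {
    try { return fs.readFileSync('/proc/self/cgroup', 'utf8'); } catch { return ''; }
  },
): boolean {
  return existsSync('/.dockerenv') || readCgroup().includes('docker');
}

// Inside a container, localhost is the container itself, so the sandbox must
// reach the bridge via host.docker.internal instead.
function bridgeUrl(port: number, inDocker: boolean): string {
  const host = inDocker ? 'host.docker.internal' : 'localhost';
  return `http://${host}:${port}`;
}
```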

Docker Compose Example:

services:
  code-executor:
    image: aberemia24/code-executor-mcp:1.0.0
    environment:
      - CODE_EXECUTOR_SAMPLING_ENABLED=true
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    extra_hosts:
      - "host.docker.internal:host-gateway"

11.10 Test Coverage

Total Sampling Tests: 74/74 passing (100%)

| Component | Tests | Status |
| --- | --- | --- |
| Bridge Server | 15/15 | ✅ PASS |
| Content Filter | 8/8 | ✅ PASS |
| TypeScript API | 4/4 | ✅ PASS |
| Python API | 3/3 | ✅ PASS |
| Config Schema | 23/23 | ✅ PASS |
| Audit Logging | 13/13 | ✅ PASS |
| Security Attacks | 8/8 | ✅ PASS |

Key Tests:

  • T010-T016: Bridge server lifecycle (startup, shutdown, token validation)
  • T022-T026: Content filtering (secrets, PII detection and redaction)
  • T033-T037: Rate limiting (rounds, tokens, concurrent access)
  • T044-T047: System prompt allowlist validation
  • T053-T056: TypeScript sampling API
  • T063-T066: Python sampling API
  • T082-T084: Audit logging with SHA-256 hashes
  • T112-T116: Security attack tests (infinite loop, token exhaustion, prompt injection, secret leakage, timing attacks)

11.11 Design Rationale

Why Ephemeral Bridge Server?

  • Security: Unique bearer token per execution prevents cross-execution attacks
  • Isolation: Localhost binding ensures no external access
  • Lifecycle: Bridge destroyed after execution, no lingering processes

Why Rate Limiting?

  • Cost Control: Prevent infinite loops from causing API cost explosions
  • Resource Management: Prevent token exhaustion from overwhelming Claude API
  • User Protection: Default limits protect users from accidental abuse

Why Content Filtering?

  • Secret Protection: Prevent API keys, tokens, credentials from leaking into logs
  • Compliance: PII redaction helps meet privacy regulations (GDPR, CCPA)
  • Defense-in-Depth: Even if Claude accidentally generates secrets, they're redacted

Why System Prompt Allowlist?

  • Prompt Injection Defense: Prevents attackers from bypassing security via custom system prompts
  • Controlled Behavior: Ensures Claude operates within intended parameters
  • Auditability: Limited set of prompts makes behavior predictable

Why SHA-256 Audit Logs?

  • Forensics: Enable investigation of security incidents without exposing secrets
  • Deduplication: Same prompt = same hash, enables pattern detection
  • Compliance: Meets audit requirements without storing plaintext data

Document Version: 1.2.0 (Added MCP Sampling Architecture for v1.0.0)
Contributors: Alexandru Eremia
Last Review: 2025-11-19