Experimental: MCP server for Genie Code integration #156

Open

datasciencemonkey wants to merge 40 commits into main from coda-mcp

Conversation

@datasciencemonkey (Owner)

Summary

  • Adds MCP server endpoint at /mcp for Databricks Genie Code integration
  • 5 MCP tools: create_session, run_task, get_status, get_result, close_session
  • File-based session/task state for stateless HTTP transport
  • 57 tests passing (37 unit + 15 server + 5 integration)

Status

Experimental — needs real-world testing with Genie Code on a deployed Databricks App.

Disk-based state manager for MCP sessions and tasks.
Pure Python module with no Flask dependency — just file I/O.

Manages session directories at ~/.coda/sessions/{session-id}/
with tasks as subdirectories containing prompt.txt, status.jsonl,
and result.json. Includes SessionBusyError/SessionNotFoundError
exceptions and the ---CODA-TASK--- prompt wrapping convention.

37 tests covering full session/task lifecycle, edge cases,
and error handling — all using tmp_path isolation.

Implements coda_create_session, coda_run_task, coda_get_status,
coda_get_result, and coda_close_session via FastMCP with ToolAnnotations.
Delegates disk state to task_manager.py; PTY ops via optional app hooks.
Background watcher thread polls for result.json with timeout support.
Includes 15 tests covering tool registration, disk-only mode, PTY hook
integration, busy-session errors, and all CRUD paths.
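
The watcher described above can be sketched like this, under the assumption that it simply polls the task directory for result.json until a deadline. `wait_for_result` is an illustrative name, not the server's actual function.

```python
import json
import time
from pathlib import Path

def wait_for_result(task_dir: Path, timeout: float = 30.0, interval: float = 0.05):
    """Return the parsed result.json, or None if the timeout expires."""
    deadline = time.monotonic() + timeout
    result_path = task_dir / "result.json"
    while time.monotonic() < deadline:
        if result_path.exists():
            return json.loads(result_path.read_text())
        time.sleep(interval)
    return None  # caller reports the task as still running
```
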

Exercises the full MCP flow with mocked PTY hooks:
- Happy-path: create session, run task, poll status, get result, close
- Busy session rejects second task
- context_hint=new_topic written to prompt.txt
- permissions=yolo produces --yolo flag
- Closing nonexistent session returns error
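
Two of the behaviors exercised above, sketched in isolation. `SessionBusyError` comes from task_manager.py; `build_cli_args` and the plain-dict session are hypothetical stand-ins for the real PTY plumbing.

```python
class SessionBusyError(Exception):
    """Raised when a session already has a running task."""

def build_cli_args(permissions: str = "default") -> list:
    # permissions=yolo adds the --yolo flag to the spawned CLI
    args = ["coda"]
    if permissions == "yolo":
        args.append("--yolo")
    return args

def run_task(session: dict, prompt: str) -> None:
    # A busy session rejects a second task rather than queueing it
    if session.get("active_task"):
        raise SessionBusyError("session already has a running task")
    session["active_task"] = prompt
```
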

Replace the 5-tool poll-heavy MCP API with a 3-tool fire-and-forget model:
- coda_run: auto-creates ephemeral session, returns immediately
- coda_inbox: dashboard of all background tasks (no polling needed)
- coda_get_result: pull full structured result for completed tasks

Key changes:
- Sessions are ephemeral (auto-close on task completion)
- Task chaining via previous_session_id (reads prior session results)
- meta.json tracks task metadata for inbox scanning
- Concurrency limit configurable via CODA_MAX_CONCURRENT env var
- 24h TTL cleanup for expired sessions
- Hermes instructions updated for ephemeral sessions + prior context
- 22 tests covering full flow, chaining, concurrency, auto-close, cleanup
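
A hedged sketch of the fire-and-forget model: coda_run creates an ephemeral session, writes meta.json, and returns immediately, while coda_inbox scans the meta.json files instead of polling individual tasks. Field names and the explicit `root` parameter are illustrative, not the server's schema.

```python
import json
import uuid
from pathlib import Path

def coda_run(root: Path, prompt: str) -> str:
    """Submit a task; the caller does not wait for completion."""
    session_id = uuid.uuid4().hex
    session_dir = root / session_id
    session_dir.mkdir(parents=True)
    (session_dir / "prompt.txt").write_text(prompt)
    # meta.json is what coda_inbox scans for the dashboard view
    (session_dir / "meta.json").write_text(
        json.dumps({"session_id": session_id, "status": "running"})
    )
    return session_id

def coda_inbox(root: Path) -> list:
    """Dashboard of all background tasks, no polling required."""
    return [json.loads(p.read_text()) for p in sorted(root.glob("*/meta.json"))]
```
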

Documents the 3-tool fire-and-forget + inbox pattern with sequence
diagram, data model, tool reference, migration guide, and limitations.

uvicorn + mcp_asgi.py wraps Flask in Starlette's WSGIMiddleware, which
asserts scope["type"] == "http" — WebSocket upgrades (scope type
"websocket") cause AssertionError, forcing Socket.IO to fall back to
HTTP polling with visible jank.

gunicorn + gthread + simple-websocket handles WebSocket natively.
MCP is already served via Flask Blueprint (mcp_endpoint.py) at /mcp —
no ASGI bridge needed.
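
A simplified reproduction of the failure mode above: WSGIMiddleware only translates HTTP scopes into WSGI environs, so a WebSocket upgrade (scope type "websocket") trips an assertion before the wrapped Flask app is ever called. This stub mimics that check rather than quoting Starlette's source.

```python
def wsgi_middleware_entry(scope: dict) -> str:
    # Mirrors the scope-type assertion at the entry of WSGIMiddleware
    assert scope["type"] == "http"
    return "handled by Flask via WSGI"

print(wsgi_middleware_entry({"type": "http"}))
try:
    wsgi_middleware_entry({"type": "websocket"})
except AssertionError:
    print("websocket upgrade rejected; Socket.IO falls back to polling")
```
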

Three tests assumed v1 behavior (long-lived, reusable sessions):
- test_marks_session_idle → test_marks_session_closed (sessions auto-close)
- test_can_create_new_task_after_complete → test_closed_session_rejects_new_task
- test_multiple_completed_tasks_accumulate → test_multiple_tasks_across_sessions
  (each task gets its own session, verified via list_all_tasks)

Gateway discovery (3): Added SKIP_CLAUDE_INSTALL env var to bypass
curl|bash in tests. Replaced vacuous `if settings_path.exists()` guards
with `assert` so missing files fail loudly instead of silently passing.

Session detach (3): Mocked subprocess.run (pgrep/ps) in process
detection tests — sandbox blocks sysmon access. Mocked pty.openpty in
EOF cleanup test — sandbox denies /dev/pty allocation.

npm version (1): Added functional npm probe to skip condition — npm
cache is root-owned on this machine, so npm commands fail with EPERM.

task_manager (3): Already fixed in prior commit — tests updated for v2
ephemeral session model.

Reduces root-level clutter by organizing 8 setup_*.py files into setup/
and 3 install_*.sh files into scripts/. Updated all subprocess paths in
app.py, added PYTHONPATH injection in _run_step() so setup scripts can
still import from utils.py at repo root, and updated test path references.

275 tests passing. Post-commit hook unchanged (references sync_to_workspace.py
at $APP_DIR root).

Moves mcp_server.py, mcp_endpoint.py, mcp_asgi.py, and task_manager.py
into a coda_mcp/ package. Uses coda_mcp (not mcp/) to avoid shadowing
the pip mcp package used by FastMCP imports. Updated all cross-imports
in source and test files.

275 tests passing.

- Updated project structure tree for setup/, scripts/, coda_mcp/ layout
- Added CoDA MCP server section with value proposition and usage examples
  for Genie Code, Claude Desktop, Cursor, and any MCP client
- Added /mcp to API endpoints table
- Fixed setup_mlflow.py path reference
- Updated CLAUDE.md with CoDA MCP server entry

- MLflow tracing: README said MLFLOW_CLAUDE_TRACING_ENABLED=true but
  code sets "false" (intentional per b8a06c9). Updated README to match.
- Parallel setup: README said "7" but code runs 6 parallel + 1 sequential.
  Fixed to "6".
- Skills count: README said 39 but directory has 43 (4 BDD skills were
  unlisted). Updated badge, heading, and added BDD skills table.
- CLAUDE.md: updated skills count to 43, MCP servers to 3.

Security audit findings:
- Removed _check_origin() from mcp_endpoint.py — was defined but never
  called, creating false confidence that origin validation existed.
  Removed unused os and ensure_https imports.
- Added os.chmod(path, 0o600) to all config file writes in cli_auth.py
  (settings.json, auth.json, .env, config.yaml) so tokens aren't
  world-readable. Matches pat_rotator.py's existing chmod on
  ~/.databrickscfg.
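
The tightening described above, in isolation: write the config file, then chmod it to 0o600 so tokens aren't world-readable. `write_private` is an illustrative helper, not cli_auth.py's actual API.

```python
import os
from pathlib import Path

def write_private(path: Path, content: str) -> None:
    path.write_text(content)
    os.chmod(path, 0o600)  # owner read/write only; no group/world access
```
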

…run_step

Closes critical and high test coverage gaps identified by audit:
- content_filter_proxy.py: 45 tests covering message sanitization,
  orphaned tool_result stripping, SSE streaming, tool name remapping,
  token caching
- sync_to_workspace.py: 11 tests covering path-escape guard, OAuth
  env stripping, config reading, error handling
- _run_step (app.py): 7 tests covering DATABRICKS_CLIENT_ID/SECRET
  stripping, PYTHONPATH injection, PATH setup

275 → 338 tests passing.

The PAT reconfiguration path (line 329) runs setup scripts via
subprocess.run but didn't inject PYTHONPATH like _run_step does.
After the Tier 1 move to setup/, the scripts couldn't resolve
`from utils import ...` during PAT rotation reconfiguration.

Covers the PAT reconfiguration subprocess path that was missing
PYTHONPATH injection — the exact bug caught in production.

Genie Code requires FastMCP's native transport (streamable_http_app)
per docs. The Flask Blueprint reimplementation at /mcp didn't satisfy
the MCP protocol expectations, causing "MCP server could not be added".

Switch app.yaml from gunicorn to uvicorn with mcp_asgi.py which mounts
FastMCP natively at /mcp and Flask via WSGIMiddleware for everything else.
WebSocket falls back to HTTP polling under ASGI (documented, works).

WSGIMiddleware cannot handle WebSocket upgrades, causing Socket.IO to
fall back to HTTP polling under uvicorn. Add a python-socketio
AsyncServer that intercepts /socket.io/ at the ASGI level before
WSGIMiddleware, enabling native WebSocket alongside MCP.

Architecture: socketio.ASGIApp → mcp_starlette(/mcp) → WSGI(Flask)
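
The routing order above can be sketched library-free, under the assumption that Socket.IO traffic is intercepted first, /mcp goes to the FastMCP Starlette app, and everything else falls through to Flask behind WSGIMiddleware. The string returns stand in for the real ASGI sub-applications.

```python
import asyncio

async def dispatch(scope: dict) -> str:
    path = scope.get("path", "")
    if path.startswith("/socket.io/"):
        return "socketio.ASGIApp (native WebSocket)"
    if path.startswith("/mcp"):
        return "FastMCP streamable_http_app"
    return "Flask via WSGIMiddleware (HTTP only)"
```
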

python-socketio 5.16.1 uses other_asgi_app, not other_app.

Databricks Apps proxy injects identity headers (X-Forwarded-Email) on
HTTP requests but not on WebSocket upgrade requests. Starting with
polling ensures auth succeeds during the HTTP handshake, then Socket.IO
transparently upgrades to WebSocket without re-triggering auth.

Also adds diagnostic logging to the ASGI connect handler to trace
proxy header presence on future connection issues.

The app's own URL (mcp-test-coda-*.databricksapps.com) differs from
DATABRICKS_HOST (workspace URL). Socket.IO was rejecting the app origin
as not in ALLOWED_ORIGINS. Since Databricks proxy handles authentication,
Socket.IO CORS can safely use '*'.

Make fire-and-forget pattern unmistakable in both server instructions
and coda_run docstring. Explicitly tell LLM clients: do NOT follow up
with coda_inbox after submitting — only check when user asks.

Databricks Apps proxy requires OAuth, not PATs. This bridge script
translates between Claude Code's stdio MCP transport and the app's
Streamable HTTP endpoint, injecting fresh OAuth tokens via
`databricks auth token` on each request.

Config via env vars (CODA_MCP_URL, DATABRICKS_PROFILE) in
Claude Code settings.json — no hardcoded values in the script.
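
The bridge's token refresh can be sketched as shelling out to the Databricks CLI and pulling the access token from its JSON output. `parse_token` and `fetch_token` are illustrative names; the real logic lives in tools/coda-bridge.py.

```python
import json
import subprocess

def parse_token(cli_output: str) -> str:
    return json.loads(cli_output)["access_token"]

def fetch_token(profile=None) -> str:
    # Fresh OAuth token on each request, per the commit message above
    cmd = ["databricks", "auth", "token"]
    if profile:
        cmd += ["--profile", profile]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_token(out.stdout)
```
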

Databricks Apps use OAuth, not PATs. Updated the MCP client section
to document the stdio bridge approach (tools/coda-bridge.py) and
added tools/ to the project structure.

Prevents Hermes from executing destructive operations (DROP, DELETE,
truncate, CLI deletes, permission changes) via prompt-level instructions.
Destructive ops require explicit approval via needs_approval status.