Skip to content

feat(DRC-3526): MCP App widget POC — 15 widget tools + config-install helper#1397

Open
iamcxa wants to merge 45 commits into
mainfrom
iamcxa/recce-mcp-ui-integration
Open

feat(DRC-3526): MCP App widget POC — 15 widget tools + config-install helper#1397
iamcxa wants to merge 45 commits into
mainfrom
iamcxa/recce-mcp-ui-integration

Conversation

@iamcxa

@iamcxa iamcxa commented May 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Recce MCP App POC (DRC-3526). Ships 15 widget-enabled MCP tools rendered as interactive HTML widgets in Claude Desktop and other MCP App-capable hosts, plus a recce mcp-config-install helper that auto-writes the Claude Desktop config so POC users skip the hand-edited JSON.

Architecture is a parallel FastMCP-based widget server (recce mcp-widget-server) alongside the existing low-level recce mcp-server, coordinated by RECCE_MCP_WIDGETS=1. Day 0 spike found low-level mcp.server.Server doesn't render widgets in Claude Desktop while mcp.server.fastmcp.FastMCP does — hence two servers rather than upgrading the existing one. Full FastMCP migration is deferred (design Open Q #6; now at 15/20 = 75% widget coverage, past the ≥50% reconsideration threshold).

Demo journeys (recorded)

Both journeys use plain reviewer-style prompts — no mcp__recce-widgets__<tool> tool-naming — to validate that a non-technical reviewer's natural phrasing reliably triggers the right widget.

Demo 1 — Quick scan (Journey A) · Loom

Replaces the AI-summary "block of text" with inline widgets so a reviewer grasps "what changed" in seconds. PR scenario: jaffle_shop_golden#15 (refactor customers.sql → two intermediate models).

Prompt 1 → row_count_diff widget:

I'm reviewing a refactor PR that splits customers.sql into two new intermediate
models. Before I approve, I want to verify that row counts haven't changed
across any of the affected models. Can you compare row counts for the models
modified in this PR?

Prompt 2 → schema_diff widget:

Now show me what columns changed — anything added, removed, or with a
different type — across the same modified models.

Demo 2 — Investigate impact (Journey 2) · Loom

Blast-radius summary → evidence drilldown, with the agent following the next_action advisory hint. PR scenario: jaffle_shop_golden#16 (amount DOUBLE → DECIMAL on stg_payments).

Prompt 1 → impact_analysis widget:

I'm reviewing a PR that changes the amount column type in stg_payments from
DOUBLE to DECIMAL(10,2). Can you show me the blast radius — which downstream
models are affected, and how serious the impact is?

Prompt 2 → profile_diff widget:

Following the next_action hint from the impact analysis, let me see the actual
statistical profile of the amount column in stg_payments — compare the mean,
distribution, null rate, and other stats between base and current to confirm
this is just a rounding effect within expected range.

Fixed in this PR (commit f692f086): impact_analysis's downstream value_diff step derived its per-column diff list from the CURRENT relation only (get_modelget_columns(base=False)), then applied b."col" IS DISTINCT FROM c."col" to both the base (b) and current (c) relations. When a column had drifted — present in current but not base — Snowflake raised a Binder Error (invalid identifier 'B."full_name"' / Table "b" does not have a column named "full_name"). The builder now diffs only the intersection of base ∩ current columns via get_model(node_id, base=True) (which manages its own warehouse connection), mirroring the canonical ValueDiffTask; drifted columns are reported via schema_changes, not fed into the diff expression, and value_diff is skipped when the PK itself has drifted. This was a pre-existing value_diff correctness bug surfaced by the demo, not introduced by the widget work. Verified by a new regression unit test (asserts drifted columns never reach the generated SQL) plus a read-only A/B against live Snowflake (old SQL shape → 000904 invalid identifier; intersected shape → executes clean).

What's in this PR

New surface:

  • recce mcp-widget-server CLI subcommand (local mode only — iter 1 scope)
  • recce mcp-config-install CLI helper — writes both server entries + RECCE_MCP_WIDGETS=1 into claude_desktop_config.json (macOS, iter 1 scope), with --dry-run and --yes, and a backup of the existing config
  • recce/widget_server.py — FastMCP server with 15 @mcp.tool delegates + matching @mcp.resource handlers, idiomatic shape (Pydantic input/output models, CallToolResult with short content + structuredContent, tool annotations, ext-apps SDK theme helpers)
  • recce/data/mcp/*.html — 15 self-contained HTML widgets, each with exhaustive @media (prefers-color-scheme: dark) overrides and Claude design-token theming
  • WIDGET_TOOLS filter in recce/mcp_server.py — when RECCE_MCP_WIDGETS=1, removes the 15 widget-eligible tools from the main server's list_tools (defers to widget server)

15 widgets by tier:

Phase Tier Tools
A 1–2 row_count_diff, schema_diff, get_server_info, list_checks, get_model
B 3 query, query_diff, value_diff, value_diff_detail, top_k_diff
C 4 (charts) histogram_diff, profile_diff
D 5 (mini-graph) get_cll, impact_analysis
E 6 (DAG, v1) lineage_diff — hand-roll SVG, hard 10-node cap, graceful empty-state over cap

Bug fixes (pre-existing, surfaced by the POC):

  • recce/track.py print(traceback.format_exc())file=sys.stderr — stdout traceback is fatal for any stdio MCP server when RecceConfig fails
  • recce mcp-server / mcp-widget-server now os.chdir(--project-dir) before RecceConfig, so Claude Desktop's cwd=/ spawn finds the dbt project + recce.yml (removes the bash-wrapper workaround)
  • SchemaChange class-name collision — Phase D's impact_analysis defined a second module-level SchemaChange that shadowed schema_diff's model, breaking schema_diff serialization with a Pydantic "column / change_status Field required" error. Renamed to ColumnSchemaChange; regression test pins both field surfaces.

Quality / docs:

  • tests/test_widget_server.py — 35 tests (registration via public FastMCP API, env-var coordination 18-vs-20 tools, file-missing graceful degradation, CallToolResult shape, Pydantic structuredContent round-trip, annotations, collision guard, lineage_diff 10-node-cap branch)
  • tests/test_mcp_config_install.py — 5 tests (project-dir validation, dry-run no-write, backup creation)
  • docs/mcp-widgets.md — template doc for Scott's hand-off: file layout, config example, "Add widget N+1" walkthrough, structuredContent contract, gotchas, Python-vs-TypeScript SDK note

Test plan

  • pytest tests/test_widget_server.py → 35 passed
  • pytest tests/test_mcp_config_install.py → 5 passed
  • pytest tests/ -k "mcp or widget or schema_diff or impact or lineage" → 363 passed, no widget-related regressions
  • Cwd=/ smoke (recce mcp-widget-server --project-dir <dbt>) — exit 0, clean stderr, no stdout pollution
  • Manual end-to-end in Claude Desktop — Demo 1 + Demo 2 recorded (Looms above); widgets render with real data, dark mode correct, plain prompts trigger correct tools
  • Manual end-to-end in Cowork (parallel OAuth POC, DRC-3527 — not a blocker)

Note: tests/test_server.py::test_spa_route_lineage fails locally because the frontend SPA build artifact (recce/data/lineage/index.html) isn't present in this worktree — pre-existing, unrelated to this PR (fails identically with the diff stashed; resolved by cd js && pnpm run build).

User-facing changes

For Claude Desktop POC users:

  • Install config the easy way: recce mcp-config-install --project-dir /path/to/dbt (writes both entries + env var, backs up existing config). Or hand-edit per docs/mcp-widgets.md.
  • RECCE_MCP_WIDGETS=0/unset → zero change, all 20 tools text-only from recce mcp-server, backward compatible.
  • RECCE_MCP_WIDGETS=1recce mcp-server serves 5 non-widget tools, recce mcp-widget-server serves the 15 widget tools. No name collision.
  • Other MCP hosts (no MCP Apps support) → graceful degradation: host ignores _meta.ui.resourceUri, uses the tool's one-sentence content text.

What's NOT in this PR

  • Cloud session / snapshot support for recce mcp-widget-server (iter 1 = local mode only). This is the strategic next step — MCP ↔ Recce Cloud is what lets non-technical users skip git/dbt entirely and is where PR-selection + artifact-packaging are properly solved (not as extra local tools).
  • Full FastMCP migration collapsing the two-server architecture (design Open Q [Chore] Add flake8 style check #6; now warranted at 75% coverage)
  • lineage_diff beyond v1 — progressive degradation for >10 nodes (shrink / zoom / aggregate) and the ReactFlow @datarecce/ui/lineage borrow contract
  • Lifespan refactor (_recce_server still module-global)
  • OAuth POC for Cowork distribution (Kent Huang DRC-3527, parallel)

Reviewer notes

  • docs/mcp-widgets.md is the canonical Scott hand-off doc — if the "Add widget N+1" walkthrough leaves a cold reader unsure, that's PR-blocking feedback.
  • Commit history is deliberately preserved (not squashed) — each commit is one widget / fix / refactor step with meaningful diff boundaries; review in order.
  • CSS follows project conventions: space-separated rgb(), no @datarecce/ui internal-path imports (widget HTML is plain JS).

🤖 Generated with Claude Code


Review follow-up — deferred items (⚠️ this PR is NOT rushing to merge)

A full review pass (kc-pr-review + kc-pr-review-resolve) ran on d089e737. Actionable fixes are committed (65ca0739d089e737): ~ path expansion + atomic config write in mcp-config-install, case-insensitive RECCE_MCP_WIDGETS, stdio-safe --debug traceback, robust recce/data/** gitignore (nested build artifacts now ignored; recce/data/mcp/ source tracked), single-quote HTML escaping across all 15 widgets, rel="noopener noreferrer" on the PR link, widget HTML load anchored on the recce package (no longer depends on namespace-package resolution → reliable under pip install), and doc/test-count sync.

even-wei's NO-GO blocker (the test_widget_server.py black/isort format gate) was already resolved in 230af0f0 and is re-verified clean on the current head (black/isort/flake8 exit 0).

The items below are intentionally deferred — this is an iter-1 POC whose goal is breadth (a widget for every tool), not hardened design. Each is marked with a TODO(DRC-3526 follow-up) comment in code; a DRC follow-up ticket will track them:

  • Widget tools bypass the central call_tool error classifier/telemetry (DRC-2754). Errors still surface (FastMCP wraps them); what's deferred is expected-error classification (table_not_found/permission_denied → warning) + Sentry metrics for the 15 widget tools. Resolve when the two-server architecture is consolidated. — recce/widget_server.py
  • impact_analysis value_diff drift fix lacks a real-DuckDB e2e. Covered by a mock test (drifted column absent from generated SQL) + a manual Snowflake A/B; a real-adapter e2e with a PK + a column present only in current is the gap. — recce/mcp_server.py
  • schema_diff has no single-env warning. The empty-state ("No models found") can read as a clean diff when there is simply no base to compare against. — recce/widget_server.py
  • Type contracts use free str for fixed-set fields (status / change_status / priority) — batch a Literal/submodel tightening pass. — recce/widget_server.py
  • impact_analysis doubles base-side warehouse introspection (the cost of the binder-error correctness fix) — consider caching/batching get_model(base=True). — even-wei N3
  • 15 copy-pasted escapeHtml/svgEl helpers — extract a shared asset to keep the escaping contract single-sourced. — even-wei N5
  • check_base_freshness SHA-freshness is opt-in with no production caller (the stale_sha/expected_base_sha path is a tested opt-in API, not pure dead code). Decide in a focused PR: leave as-is, wire a caller, or remove SHA-freshness — it touches the unrelated recce check-base command + tests, so it is out of scope here. — recce/cli.py

Pre-merge check: run a packaged-wheel smoke test (pip install the built wheel, start recce mcp-widget-server, load one widget) to confirm recce/data/mcp/ ships and resolves under the installed layout.

iamcxa and others added 13 commits May 22, 2026 10:06
Added during the /sync-gbrain session that backfilled embeddings after
Conductor's GSTACK_OPENAI_API_KEY shim was wired up. The .gbrain-source
file (gitignored) pins this worktree to its scoped code source for
kubectl-style routing. CLAUDE.md gets a new "GBrain Search Guidance"
section so future agents in this worktree prefer gbrain over Grep for
semantic queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…_TOOLS filter (iter 1 Day 1)

Per design-20260521-234647 amended — adds a parallel FastMCP-based widget server
that delegates to RecceMCPServer's existing _tool_row_count_diff and
_tool_schema_diff methods, gated by RECCE_MCP_WIDGETS=1 env var.

- recce/widget_server.py: NEW, FastMCP("recce-widgets") with @mcp.tool delegates
  and @mcp.resource handlers (graceful when recce/data/mcp/*.html missing).
- recce/cli.py: NEW `recce mcp-widget-server` subcommand (local mode only).
- recce/mcp_server.py: WIDGET_TOOLS set, _widgets_enabled helper, list_tools
  filter, call_tool defensive raise.

Day 1 scope only — widget HTML assets + tests deferred to Day 1.5 / Day 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…eedback)

Fix #1: run_widget_server() is synchronous — mcp.run(transport="stdio")
manages its own asyncio event loop internally. asyncio.run(run_widget_server(...))
raised ValueError because the function returns None, not a coroutine.
Drop asyncio.run() in cli.py and remove the unused asyncio import.

Fix #2: replace **arguments with typed params so FastMCP infers a proper
JSON inputSchema for tools/list. Without typed params, Claude Desktop sees
the tool registered but cannot construct a tools/call (no schema to fill).
Params translated from existing low-level Tool inputSchema definitions:
- row_count_diff: node_names, node_ids, select, exclude (all Optional)
- schema_diff: select, exclude, packages (all Optional)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
- recce/data/mcp/row_count_diff.html: NEW, ~110 LOC tier 1 widget (status
  pills + diff numbers; base/curr null guards using === null)
- recce/data/mcp/schema_diff.html: NEW, ~110 LOC tier 2 widget (added/
  removed/type_changed sections; empty state shows unchanged_count)
- recce/mcp_server.py: extract _compute_schema_changes() helper that returns
  rich per-model dict {added, removed, type_changed, unchanged_count};
  existing _tool_schema_diff flattens output back to DataFrame for low-level
  mcp-server consumers — zero regression on existing response shape
- recce/widget_server.py: row_count_diff delegate wraps result as
  {"models": result}; schema_diff delegate calls _compute_schema_changes
  directly and returns {"models": rich_result} for widget consumption
- .gitignore: replace broad recce/data ignore with per-extension rules so
  recce/data/mcp/*.html (widget source, not build output) can be committed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
- tests/test_widget_server.py: 5 tests covering WIDGET_TOOLS env-var
  coordination, widget server tool/resource registration, file-missing
  graceful degradation, and tool enumeration regression.
- docs/mcp-widgets.md: template for Scott — file layout, Claude Desktop
  config example, add-widget walkthrough using row_count_diff as worked
  reference, structuredContent contract, gotchas (SDK version pin,
  typed-params requirement, dual-key meta, mcp.run sync nature, .gitignore
  allowlist, stdout discipline).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…de Desktop cwd=/ workaround)

Claude Desktop spawns MCP servers with cwd=/. Two issues surfaced when
manually testing iter 1 widgets:

1. recce/track.py:61 used bare print() which goes to stdout. When
   RecceConfig couldn't find/create recce.yml at cwd=/, the resulting
   traceback corrupted the JSON-RPC stdio channel. Fixed to print to
   stderr. Pre-existing bug — affected mcp-server equally.

2. recce/cli.py mcp_server and mcp_widget_server now chdir to
   --project-dir before RecceConfig so the config file is searched
   relative to the user's dbt project instead of cwd. Eliminates need
   for a bash wrapper in Claude Desktop config.

Verified by reproducing the cwd=/ failure before fix (32 lines of
traceback to stdout) and confirming clean stdout (0 lines) after.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Claude Desktop manual test revealed widget rendered but showed empty
state ("No models found") while agent text summary correctly reported
5 models. Root cause: FastMCP with @mcp.tool returning Dict[str, Any]
may not populate the JSON-RPC structuredContent field — only content
text payload. Widget HTML only read structuredContent.

- Both widgets now try structuredContent first, fall back to parsing
  content[0].text as JSON. Works regardless of FastMCP serialization
  behavior.
- row_count_diff also skips keys starting with "_" when iterating
  models (so _maybe_add_single_env_warning's _warning key doesn't
  break the per-row render); _warning text is rendered as a notice
  banner above the table when present.
- schema_diff gets the same underscore-key skip for defensive uniformity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…Content

Per MCP log inspection: FastMCP synthesizes an outputSchema like
\`{properties: {result: {...}}, required: ["result"]}\` when @mcp.tool
returns a Dict[str, Any] without a Pydantic return type. The actual
structuredContent sent to the widget has shape {result: {models: ...}}
not {models: ...}. Widget HTML now unwraps the result key when present,
so both flat returns and FastMCP-wrapped returns render correctly.

Defensive: only unwraps if data.result exists AND data.models does not,
so tools that legitimately return both keys aren't mishandled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…ntegration

Manual Claude Desktop test showed widget rendered real data correctly but
hand-rolled hardcoded colors looked washed-out in dark mode and out of place
next to Claude's native UI. Refactored both widgets to use Claude's CSS
custom properties (var(--color-text-primary), var(--color-background-*),
var(--font-sans), etc.) with light-mode hex fallbacks and a
prefers-color-scheme dark fallback for non-Claude hosts.

- row_count_diff.html: redesigned with header + 4-up summary cards + main
  table, semantic status badges (warning/success/info/secondary), tabular
  numeric alignment, monospace model names.
- schema_diff.html: same design-token migration, per-model sections with
  semantic colors for added/removed/type_changed.
- No new external dependencies (kept ext-apps SDK as only unpkg load,
  CSP resourceDomains unchanged). Used Unicode glyphs instead of icon
  font to stay dependency-light.

Preserves all existing JS: unwrap fallback for FastMCP {result: ...}
wrapping, _warning notice rendering, === null guards, _-prefix key skip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…lt + annotations

Per anthropic/skills/mcp-builder Python idiom checklist (python_mcp_server.md:691).
Closes the gap between iter 1's "make it work" shortcuts and SDK best practices.

- Pydantic BaseModel for tool inputs (RowCountDiffInput, SchemaDiffInput) with
  Field(description=...) on every param — replaces bare Optional[...] params,
  gives the LLM richer inputSchema descriptions.
- Pydantic BaseModel for tool outputs (RowCountDiffOutput, SchemaDiffOutput,
  nested per-model models) — replaces Dict[str, Any] return, generates proper
  outputSchema, eliminates FastMCP's {result: ...} wrapping (widget JS unwrap
  fallback removed in commit 2).
- Return CallToolResult explicitly with short content (one sentence) + data in
  structuredContent only. Agent receives a summary; widget receives the dict.
- annotations dict on every @mcp.tool: title, readOnlyHint, destructiveHint,
  idempotentHint, openWorldHint per checklist line 691.
- _warning moved from top-level result dict to RowCountDiffOutput.warning
  named field — cleaner Pydantic shape, no more _-prefix key skip in widget JS.
- logging.basicConfig explicitly sets stream=sys.stderr per stdio discipline
  (mcp_best_practices.md:139).
- prefersBorder: False added to @mcp.resource meta per spec UIResourceMeta.
- Tool docstrings expanded per python_mcp_server.md:278-328 pattern:
  what / Use when / Don't use when / Returns / Error Handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Per add-app-to-server/SKILL.md JS-side patterns. Replaces manual CSS var()
fallback with SDK helpers that actively apply host design tokens via
postMessage context.

- Import applyDocumentTheme, applyHostStyleVariables, applyHostFonts from
  ext-apps SDK. Register app.onhostcontextchanged to invoke them on theme/
  font change. Apply initial context after connect() if exposed.
- Remove dead defensive layers no longer needed after commit 1:
  - structuredContent unwrap (FastMCP no longer wraps Pydantic returns)
  - content[0].text JSON.parse fallback (content is one-sentence string)
  - _-prefix key skip (warning is now a named field, models is clean dict)
- Add app.onteardown handler returning {} per SDK pattern.
- CSS fallback values: rename --color-text-error → --color-text-danger and
  --color-background-error → --color-background-danger (and badge-error →
  badge-danger) to match spec enum McpUiStyleVariableKey (no "error"
  semantic; "danger" is canonical).
- CSS var() fallbacks retained as defensive layer in case SDK helper races.
- warning now read from data.warning (named field) not data.models._warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…ests

Closes iter 1 idiomatic refactor.

- tests/test_widget_server.py: 3 new tests (5→8 total):
  - test_row_count_diff_returns_calltoolresult_with_short_content: verifies
    content is short one-sentence string and structuredContent has correct keys.
  - test_structured_content_matches_pydantic_model: RowCountDiffOutput.model_validate
    round-trip on structuredContent proves clean Pydantic shape.
  - test_widget_tool_annotations_present: asserts readOnlyHint/destructiveHint/
    idempotentHint/openWorldHint/title on both tools.
- docs/mcp-widgets.md:
  - Step 2 (HTML template): updated to use applyDocumentTheme/applyHostStyleVariables/
    applyHostFonts from ext-apps SDK + onteardown handler.
  - Step 3 (@mcp.tool template): Pydantic BaseModel input/output + CallToolResult
    replaces bare typed params + Dict[str, Any] return pattern.
  - Step 4 (@mcp.resource): added prefersBorder: False to meta example.
  - structuredContent contract: updated to explain CallToolResult + short content
    discipline; added note on why bare Dict causes {result: ...} wrapping.
  - Gotchas: replaced bare-Dict warning with Pydantic requirement; added CSS
    danger vs error naming gotcha; removed {result: ...} unwrap entry (resolved).
  - Added "Python vs TypeScript SDK Support" subsection explaining qr-server
    pattern and why Recce stays Python.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…classes

Captain's manual test in Claude Desktop dark mode revealed summary card values
invisible and model name cells with unexpected styling. Root cause: the
@media (prefers-color-scheme: dark) blocks in both widget HTML files only
covered .card, th, border, .na, .empty, .warning — leaving .card-value,
.card-label, .header-*, .model-name, .diff-*, and .badge-* with light-mode
fallback hex values that become invisible on Claude's dark card backgrounds.

Both widgets now have exhaustive @media dark overrides so rendering is
correct in all four (mode × SDK helpers fire/no-fire) combinations:
- Dark + SDK fire: host tokens drive
- Dark + no SDK: @media dark fallback hex drives (this commit's job)
- Light + SDK fire: host tokens drive
- Light + no SDK: default var() inline fallback drives

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa iamcxa changed the title feat(mcp): iter 1 widget POC — row_count_diff + schema_diff as MCP App widgets feat(DRC-3526): iter 1 widget POC — row_count_diff + schema_diff as MCP App widgets May 25, 2026
@iamcxa iamcxa requested a review from Copilot May 25, 2026 09:17
@codecov

codecov Bot commented May 25, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 93.73421% with 124 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/cli.py 53.79% 67 Missing ⚠️
recce/widget_server.py 91.94% 54 Missing ⚠️
recce/mcp_server.py 94.54% 3 Missing ⚠️
Files with missing lines Coverage Δ
recce/track.py 86.84% <100.00%> (ø)
tests/test_check_base.py 100.00% <100.00%> (ø)
tests/test_mcp_cloud_backend.py 100.00% <100.00%> (ø)
tests/test_mcp_config_install.py 100.00% <100.00%> (ø)
tests/test_mcp_server.py 99.86% <100.00%> (+<0.01%) ⬆️
tests/test_widget_server.py 100.00% <100.00%> (ø)
recce/mcp_server.py 91.30% <94.54%> (+0.02%) ⬆️
recce/widget_server.py 91.94% <91.94%> (ø)
recce/cli.py 67.51% <53.79%> (-1.47%) ⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an Iter 1 POC for MCP Apps widgets by introducing a parallel FastMCP-based “widget server” that serves widget-capable versions of row_count_diff and schema_diff, coordinated with the existing stdio MCP server via RECCE_MCP_WIDGETS=1.

Changes:

  • Added recce mcp-widget-server (FastMCP) with widget tool delegates + HTML widget resources for row_count_diff and schema_diff.
  • Coordinated tool routing between servers using WIDGET_TOOLS + _widgets_enabled() filtering in recce/mcp_server.py.
  • Added tests and documentation for the widget workflow; fixed one stdout→stderr issue for stdio MCP safety.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_widget_server.py Adds test coverage for widget server registration, env-var coordination, and tool result shapes.
recce/widget_server.py Implements FastMCP widget server with two widget tools and two HTML resources.
recce/mcp_server.py Adds widget tool filtering/guardrails and extracts _compute_schema_changes() for shared schema-diff shape.
recce/cli.py Adds mcp-widget-server CLI command and ensures --project-dir becomes cwd for config discovery.
recce/track.py Sends traceback output to stderr to avoid corrupting stdio JSON-RPC.
recce/data/mcp/row_count_diff.html Self-contained MCP Apps widget for row count diffs.
recce/data/mcp/schema_diff.html Self-contained MCP Apps widget for schema diffs.
docs/mcp-widgets.md Developer guide/template for adding future widgets.
.gitignore Stops ignoring the entire recce/data tree; ignores only build output paths so recce/data/mcp/*.html can be committed.
CLAUDE.md Adds gbrain search guidance block.

Comment thread recce/widget_server.py Outdated
Comment thread recce/mcp_server.py
Comment on lines +1555 to +1556
base_type = base_columns[col].get("type")
curr_type = current_columns[col].get("type")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — minor. _compute_schema_changes can emit base_type / curr_type as None (from .get("type")), and the widget already tolerates it (c.type ?? ""). Tightening the SchemaChange annotation to the real shape ({name,type} / {name,base_type,curr_type} submodels, or Optional) is batched with the other type-contract polish (the Literal/submodel pass) — deferred for the POC, tracked in the follow-up discussion.

Comment thread docs/mcp-widgets.md Outdated
Comment thread docs/mcp-widgets.md Outdated
Comment thread recce/track.py Outdated
iamcxa and others added 12 commits May 25, 2026 17:26
…onfig

POC users had to manually edit claude_desktop_config.json to register both
recce + recce-widgets MCP servers and set RECCE_MCP_WIDGETS=1 on both. This
helper does it idempotently: merges entries (preserves other servers),
auto-sets the env var, backs up the old config, and prints next-steps.

- Required: --project-dir <PATH> (validated for dbt_project.yml presence)
- Optional: --config <PATH>, --yes, --dry-run
- iter 1 scope: macOS only, local mode only (errors on cloud flags / other OS
  with explicit fix message)
- 5 unit tests using CliRunner + tmp_path (write/preserve/validate/dry-run/backup)

Closes the manual install friction surfaced in iter 1 manual testing. Cuts
"5 minutes editing JSON + finding absolute recce path + restarting" to one
command + Cmd+Q.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Adds the third widget in the iter 1 widget POC line — tier 1 (status pill +
key/value grid) for server runtime state. First widget added post-iter-1
idiomatic refactor, so it's born following the canonical pattern:

- Pydantic output model matching _tool_get_server_info return shape
- @mcp.tool with annotations dict (readOnly/idempotent/title)
- CallToolResult with short content + structuredContent (no agent dual-render)
- @mcp.resource with mime text/html;profile=mcp-app + CSP + prefersBorder
- Widget HTML: SDK helpers (applyDocumentTheme + applyHostStyleVariables +
  applyHostFonts), onhostcontextchanged + onteardown, exhaustive @media dark
  for every class with var(--color-*, fallback) — closes the dark-mode
  fallback gap that bit row_count_diff/schema_diff earlier (commit b882697)
- WIDGET_TOOLS set updated to filter from main mcp-server when widgets enabled
- 2 new tests + updated enumeration assertions (10 total, was 8)
- docs/mcp-widgets.md "Add widget N+1" walkthrough now references this as
  the canonical worked example (born idiomatic, no later-cycle defensive layers)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Fourth widget. tier 2 (list / simple table) — saved Recce checks for a PR
session, rendered as a 3-up summary + status table.

- Pydantic CheckSummary + ListChecksOutput models matching actual
  _tool_list_checks shape (minimal field set — name/type/status/description,
  NOT internal params). approved count from raw return ("approved" key);
  pending derived as total - approved in the delegate.
- @mcp.tool with annotations, CallToolResult short content + structuredContent
- @mcp.resource with mime/CSP/prefersBorder
- Widget HTML: SDK helpers + onhostcontextchanged + onteardown + exhaustive
  @media dark overrides (354 lines, 13 KB)
- WIDGET_TOOLS filter updated to 4 widget tools
- 2 new tests; enumeration assertion bumped to 4; annotation loop includes
  list_checks; filter test asserts list_checks absent when widgets enabled

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Fifth widget, completing Phase A of the post-iter-1 widget expansion.
Single-model detail view (base/current column tables, constraints, not-found state).

- Pydantic ColumnInfo + ModelEnvironment + GetModelOutput + GetModelInput
  matching actual _tool_get_model return shape (columns dict → list normalised
  in _parse_model_env helper; primary_key preserved; raw_code omitted)
- @mcp.tool with annotations, CallToolResult short content + structuredContent
- @mcp.resource with mime/CSP/prefersBorder
- Widget HTML: SDK helpers + onhostcontextchanged + onteardown + exhaustive
  @media dark; adaptive 2-col/3-col column table layout (constraints visible
  only when present); per-env base/current sections; not-found empty state
- WIDGET_TOOLS filter at 5 tools
- 2 new tests (test_get_model_widget_registered,
  test_get_model_returns_calltoolresult_with_pydantic_shape); enumeration
  assertions bumped to 5 tools/resources

Phase A (tier 1-2 multipliers) complete: row_count_diff, schema_diff,
get_server_info, list_checks, get_model. ~half the design's 12 widget
candidates done — design Open Q #6 >=50% FastMCP migration trigger
approaches. Iter 2 mini-doc to reevaluate.

No Pydantic reserved-name conflicts in actual handler shape (columns,
primary_key, raw_code are all safe); Gotcha documented in mcp-widgets.md
for future implementors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…st[str]

Captain hit Pydantic validation error in Claude Desktop: actual
_tool_get_server_info returns support_tasks as {"query": True, ...,
"change_analysis": True} (dict of task slug → enabled bool), not a list
of enabled slugs. Fix the Pydantic field type and widget HTML iteration.

- ServerInfoOutput.support_tasks: Dict[str, bool] (was List[str])
- Widget HTML iterates Object.entries(...).filter(([_,v]) => v) for enabled
  tasks; renders as info badges (same visual as before)
- Test fixture updated to use dict shape

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
… invocation)

Captain hit Pydantic validation error in Claude Desktop:
  Error executing tool list_checks: 1 validation error for list_checksArguments
  args Field required [type=missing, input_value={}, input_type=dict]

Root cause: list_checks delegate signature was `async def list_checks(args:
ListChecksInput)` where ListChecksInput was an empty BaseModel (no real args
to pass). FastMCP exposed this as an inputSchema requiring `args` as a top-
level field — agent called with no params, MCP sent {}, validation failed.

Fix: drop the empty ListChecksInput class and signature param. Match the
no-args pattern of get_server_info. `_tool_list_checks` still receives {}
internally (the handler accepts but ignores arguments).

Confirmed only list_checks affected: row_count_diff / schema_diff have
Optional-fields-only inputs (OK), get_server_info has no-args (OK),
get_model has required model_id (OK).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…dget)

Sixth widget; first tier-3 widget. Establishes the data-table rendering
pattern for the other Phase B widgets (query_diff, value_diff,
value_diff_detail, top_k_diff).

- QueryInput / QueryOutput Pydantic models matching actual DataFrame.model_dump
  shape (READ from source — columns/data/limit/more/total_row_count)
- Sticky-header scrollable table inside ~400px container
- Cell type rendering (null → "—" italic secondary, number → tabular-nums right,
  string → truncated with title attr, bool → check/dash, date → display as-is)
- Truncation badge when result is capped (DataFrame.more)
- Empty state with SQL echo for debug context
- @mcp.tool with annotations (openWorldHint=True since queries hit the warehouse)
- CallToolResult short content + structuredContent
- WIDGET_TOOLS filter expanded to 6 tools
- 2 new tests (tests 15-16); enumeration assertions bumped to 6
- docs/mcp-widgets.md gains "Adding a tier-3 (data table) widget" section
  as template for the remaining Phase B widgets

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…mparison)

Seventh widget. Builds on query widget (46251d5)'s data-table pattern.
Compares SQL execution against base + current environments with per-row
status rendering. Handles both QueryDiffTask result shapes from source.

- QueryDiffInput / QueryDiffOutput Pydantic models verified against
  actual QueryDiffTask.execute() return shape (QueryDiffResult)
- Two render modes detected at runtime from structuredContent:
  * Shape A (no primary_keys): side-by-side base / current tables
  * Shape B (primary_keys): join-diff table with in_a/in_b stripped,
    status pills (Added/Removed), row tinting, filter buttons
- _parse_dataframe() helper mirrors _parse_model_env pattern
- _warning extracted to output.warning named field (consistent with
  row_count_diff Day 1.5 pattern)
- Widget HTML: sticky-header scrollable tables + status pills + filter
  buttons (All / Added / Removed with counts) + dark-mode coverage
- @mcp.tool with openWorldHint=True (queries hit warehouse)
- WIDGET_TOOLS updated to 7 tools
- 2 new tests (registration + both Shape A and B); enumeration bumped to 7
- test_widget_tool_annotations_present updated for query_diff open-world group
- docs/mcp-widgets.md: widget count + reference table updated

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…comparison)

Eighth widget. Aggregate stats + per-column match breakdown for value
comparison across two environments (primary key matching).

- ValueDiffInput / ValueDiffColumnRow / ValueDiffSummary / ValueDiffOutput
  Pydantic models matching actual ValueDiffTask return shape (verified from source):
  summary={total, added, removed}, data.data=[[col, matched, matched_p], ...]
  where matched_p is 0.0-1.0 fraction (not percentage)
- _warning extracted to output.warning named field
- 4-up summary cards (total/common/columns-affected/added+removed) + per-column
  table with inline bar visualization (CSS-only, no chart lib)
- All-match empty state when every column is 100% identical
- @mcp.tool with annotations (openWorldHint=True, queries warehouse)
- WIDGET_TOOLS at 8 tools
- 2 new tests (registration + Pydantic shape); enumeration bumped to 8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…el inspection)

Ninth widget. Row-level companion to value_diff. Shows the actual rows
with mismatched values for investigation.

- ValueDiffDetailInput / Output Pydantic models matching actual
  ValueDiffDetailTask return shape (plain DataFrame with in_a/in_b flags)
- Scrollable table with sticky primary-key column (CSS sticky left),
  filter buttons (All/Removed/Added) with counts, status pill per row,
  row tinting (red=removed, green=added), cell type-aware rendering
- _warning extracted to output.warning named field
- @mcp.tool with annotations (openWorldHint=True)
- WIDGET_TOOLS at 9 tools
- 2 new tests (Tests 21-22); enumeration bumped to 9; 27 total passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Tenth widget; closes Phase B (5 tier-3 widgets). Side-by-side ranked
lists of top-K most frequent values across base + current envs.

- TopKDiffInput / TopKEnvStats / TopKDiffOutput Pydantic models matching
  actual TopKDiffTask return shape (READ from source):
  base/current: {values[], counts[], valids, total} parallel lists.
  values[] is the SAME union list in both envs (SQL FULL OUTER JOIN order:
  curr_count DESC, base_count DESC). count=0 means absent from that env.
  Param is column_name (not column), default k=10 (not 50).
- 2-column grid with ranked lists per env, inline bar viz, rank-change
  arrows (up/down), New/Gone badges for env-exclusive categories
- _warning extracted to output.warning named field
- @mcp.tool annotations (openWorldHint=True — executes warehouse SQL)
- WIDGET_TOOLS at 10 tools — 50% coverage (10/20), design Open Q #6
  FastMCP migration trigger threshold REACHED. iter 2 to reevaluate.
- 2 new tests (23: registration + inputSchema, 24: Pydantic shape);
  enumeration bumped to 10

Phase B (tier-3 data tables): query, query_diff, value_diff,
value_diff_detail, top_k_diff. Each hand-rolled its table/list layout
because patterns diverged enough that a shared <recce-table> would
need extreme flexibility. Iter 2 mini-doc: evaluate renderRankedList()
JS helper extraction; full component abstraction deferred until 3+
widgets converge on same layout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…t widget)

Eleventh widget; first tier-4 (statistics/chart) widget. Hand-rolled SVG
bars — no new chart library, no CSP/CDN changes. Establishes the
"tier-4 via hand-rolled SVG" pattern that profile_diff and any future
chart widgets will follow.

Actual HistogramDiffTask return shape (read from recce/tasks/histogram.py):
  base/current: {counts: List[int], total: int}  (or {} if env failed)
  min/max: overall min/max across both envs (numeric or date object)
  bin_edges: N+1 edge values (int/float for numeric, date for datetime)
  labels: List[str] for numeric cols ("lo-hi" format); None for datetime

_tool_histogram_diff auto-detects column_type from catalog — tool only
needs model + column_name (+ optional num_bins).

- HistogramDiffInput / HistogramEnvStats / HistogramDiffOutput Pydantic
  models matching actual HistogramDiffTask return shape
- Date bin_edges serialised to ISO strings in widget delegate
  (date objects are not JSON-serialisable natively)
- Hand-rolled SVG: viewBox 600x180, base bars (blue 45% opacity) behind
  current bars (green 70% opacity), x-axis label density auto-reduced
  (every Nth label for bins > 10), hover tooltip via mousemove
- CSS tokens for bar colors with exhaustive @media dark fallback
- _warning extracted to output.warning named field
- @mcp.tool annotations (openWorldHint=True, hits the warehouse)
- WIDGET_TOOLS at 11 tools
- 2 new tests (25, 26); enumeration bumped to 11; annotations test
  extended to cover histogram_diff openWorldHint=True
- docs/mcp-widgets.md: widget count 10→11, new "Tier-4 (Chart) Widget
  Architecture" section explaining hand-roll SVG approach + iter-2
  upgrade criteria for swapping to a real chart library

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
iamcxa and others added 2 commits May 28, 2026 21:25
…ema_diff

Phase D impact_analysis introduced a second `class SchemaChange` at module
top level, which silently shadowed the schema_diff model defined earlier in
the file. Module-global binding meant schema_diff serialization tried to fit
{added, removed, type_changed, unchanged_count} into the impact_analysis
shape and failed Pydantic validation with "column / change_status Field
required".

Rename the impact_analysis variant to `ColumnSchemaChange` and add a
regression test that pins both field surfaces so a future widget tool
cannot reintroduce the shadow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
…e cap)

15th widget tool. Hand-roll SVG mini-DAG (no Mermaid/ReactFlow dep) copying
the impact_analysis BFS layout pattern. Hard cap MAX_INLINE_NODES=10 — over
cap shows graceful empty-state message pointing to Recce web UI, no truncated
view. Full ReactFlow plan from design Phase E remains deferred.

Motivation: without a lineage_diff widget, Claude Desktop's agent falls back
to rendering lineage as Mermaid text after seeing impact_analysis output,
bypassing the widget rendering pipeline. Adding this widget keeps lineage
visualization inside the same theming + interaction surface as the other 14
widgets and brings widget coverage to 15/20 (75%).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa iamcxa changed the title feat(DRC-3526): iter 1 widget POC — row_count_diff + schema_diff as MCP App widgets feat(DRC-3526): MCP App widget POC — 15 widget tools + config-install helper May 29, 2026
iamcxa and others added 3 commits May 29, 2026 18:13
These two Claude Code plugins back the MCP App widget work in this branch
(ext-apps SDK reference, MCP server scaffolding). Enabling them in the repo
settings keeps the dev environment consistent for anyone continuing the
widget POC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
The impact_analysis value_diff builder derived its per-column diff list
from model_info["columns"], which reflects only the CURRENT relation
(get_model -> get_columns(base=False)). It then applied
`b."col" IS DISTINCT FROM c."col"` to BOTH the base (b) and current (c)
relations. When a column had drifted — present in the current physical
table but not the base — the warehouse binder failed hard
(Snowflake: `Table "b" does not have a column named "<col>"`).

Restrict the diff to the intersection of base and current columns via
get_model(node_id, base=True) (which manages its own warehouse
connection), mirroring ValueDiffTask. Drifted columns are already
reported via schema_changes. Also skip value_diff when the PK itself
has drifted, since it is the FULL OUTER JOIN key.

Adds a regression test asserting drifted columns never reach the
generated SQL and that value_diff covers only common non-PK columns.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
`logs/` (dbt run logs) and `.hypothesis/` (hypothesis testing example
cache) are runtime/test artifacts regenerated on every test run, but the
existing rules only covered the root `dbt.log` file — not the `logs/`
directory dbt writes to, nor `integration_tests/dbt/logs/`. They showed
as persistent untracked changes in the workspace UI. Ignore the dir
forms (no leading slash → matches at any depth).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@even-wei even-wei self-requested a review June 2, 2026 07:26
@even-wei

even-wei commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Code Review: PR #1397

SHA 87f22fd3 · Verdict NO-GO

Verified checks: flake8 clean (exit 0); targeted suites 156 passed (test_widget_server.py + test_mcp_config_install.py + test_mcp_server.py) under the project-pinned mcp 1.27.0 / Python 3.13; black --check and isort --check FAIL on tests/test_widget_server.py; bundled security-review found no exploitable Critical/High.

Blockers

None.

Issues

  1. tests/test_widget_server.py — Fails the project's required make format gate (black + isort).
    Evidence: black --check → "1 file would be reformatted" (line ~2218 over-length call); isort --check → import block at line ~2248 needs multi-line wrap. AGENTS.md lists make format as a pre-commit/CI requirement.
    Pass E / project-convention.

Notes

  1. recce/cli.py:3174 (mcp_config_install) — On re-run the backup is overwritten with the already-modified config, so .recce.bak no longer holds the pristine pre-recce original; only the recce entries differ, and re-run is otherwise idempotent (deterministic new_entries). Consider skip-if-exists or timestamped backups.
    Pass D.
  2. recce/data/mcp/*.html — Shared escapeHtml escapes & < > " but not ' (&#x27;). Not exploitable today (all interpolated attributes are double-quoted and " is escaped; SVG labels use textContent). Defense-in-depth only; add .replace(/'/g,"&#x27;") if a single-quoted attribute is ever introduced.
    Pass B (Security, LOW).
  3. recce/mcp_server.py:1885 (impact_analysis) — New get_model(node_id, base=True) adds a second per-model warehouse introspection round-trip inside the impacted-models loop. Correctness fix (avoids binder failure on drifted columns) but doubles base-side introspection on large impacted sets.
    Pass G.
  4. recce/cli.py:3012 (mcp_config_install) — recce mcp-config-install and the macOS-only path are correctly OS-guarded, but there is no uninstall / removal path; the docstring's "To undo: replace with .recce.bak" is the only recovery, which is lost after a second run (see Note 1). Untested uninstall for a config-mutating installer.
    Pass E.
  5. recce/data/mcp/ — 15 hand-written HTML/JS widgets (~8.4k LOC) carry a copy-pasted escapeHtml + svgEl helper per file. For a POC this is acceptable, but it's 15 sources of truth for the escaping contract; a shared asset would prevent Note 2 drifting per-file.
    Pass I (architecture, copy-paste).

Limits

  • uv run pytest (the default dev path) resolves to a STALE local env (Python 3.10 + an older mcp whose FastMCP.resource() lacks the meta kwarg), yielding 32 spurious TypeError: ... unexpected keyword argument 'meta' failures. These are an environment artifact, NOT a code defect — under the pinned mcp>=1.23.0 (resolved to 1.27.0, where resource() DOES accept meta) all 156 tests pass. Verified by forcing the 3.13 venv. CI using the locked deps will pass; a developer with a stale global mcp will see false failures.
  • Did not exercise the widgets live in Claude Desktop (macOS MCP-app rendering) — XSS assessment is static data-flow only. SQL query/query_diff tool args flow to the pre-existing _tool_query path (unchanged by this PR), not re-reviewed for injection here.

@even-wei even-wei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review: NO-GO. 1 issue (format gate) + 5 notes. See review comment.

even-wei and others added 4 commits June 2, 2026 17:26
Batch of correctness/robustness fixes from a Codex review pass:

- mcp_server: default 30s timeout on cloud proxy requests so a stalled
  endpoint cannot hang the MCP server indefinitely.
- mcp_server: include the effective base selection (single-env vs
  dual-env) in the set_backend local cache key, so a context loaded in
  single-env mode is reloaded once target-base/ later appears instead of
  serving a stale single-env context.
- cli: mcp_config_install invokes `python -m recce.cli` (the package has
  no recce.__main__, so `-m recce` is not runnable).
- cli: make check_base_freshness SHA comparison opt-in via
  expected_base_sha. Base-branch artifacts legitimately carry a
  DBT_GIT_SHA that differs from the feature-branch HEAD, so the previous
  unconditional compare produced false "stale_sha" results.
- tests: cover cloud request timeout, single-env -> dual-env reload, the
  config-install module path, and the opt-in base-SHA freshness check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Addresses the Code Review format-gate issue (Issue 1): black wraps an
over-length call and isort splits the multi-line import block in the
lineage_diff widget test. Pure formatting, no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
Addresses Code Review Note 1. The backup was rewritten unconditionally on
every run, so a second `recce mcp-config-install` clobbered the
`.recce.bak` with the already-modified config — destroying the pristine
pre-recce original and making the documented "restore from .recce.bak"
undo path useless. Skip the copy when a backup already exists, and report
"Existing backup preserved" instead of "Backup saved" in that case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 5 comments.

Comment thread recce/cli.py Outdated
Comment thread recce/cli.py
Comment thread recce/mcp_server.py Outdated
Comment thread recce/data/mcp/get_server_info.html
Comment thread tests/test_mcp_config_install.py

@iamcxa iamcxa left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamcxa — automated review (kc-pr-review, Full tier: code-reviewer + silent-failure-hunter + pr-test-analyzer + type-design-analyzer + HTML-XSS scan + ToB security). Event is COMMENT, not approve: (a) tests could not be re-run locally (env SIGKILL/OOM at pytest import — the first background run did exit 0, and the PR body documents 35+5 passing); (b) the check_base_freshness behavior change needs author intent; (c) the one production-risk path (Snowflake binder in the value_diff fix) is mock + manual-A/B only, no CI e2e. No CRITICAL/HIGH blocking bugs were found.

Goal-achievement verdict: ✅ Achieved (POC scope, with follow-up caveats)

Evidence

  • All 15 tool/resource pairs register correctly; every _meta.ui.resourceUri maps to a real recce/data/mcp/*.html (verified file-by-file).
  • All 15 HTML widgets escape warehouse data consistently (escapeHtml / textContent / numeric formatters) — no exploitable stored-XSS sink found.
  • The impact_analysis value_diff drift fix is correct — it intersects base ∩ current columns and uses get_model(base=True) connection-managed introspection, matching the canonical ValueDiffTask.
  • Type design is solid for a POC (None-vs-zero distinctions in RowCountModel, ValueDiffColumnRow.matched_p, HistogramEnvStats.total are deliberate and correct).

Verification Matrix

# Concern (from PR body) Verified? Evidence
V1 RECCE_MCP_WIDGETS=0/unset → all 20 tools text-only, backward compatible _widgets_enabled() reads env at call time; filter only removes WIDGET_TOOLS when enabled (mcp_server.py:1294)
V2 RECCE_MCP_WIDGETS=1 → main 5 + widget 15, no collision WIDGET_TOOLS filter + widget server registers exactly the 15-name set; test pins it (test_widget_server.py:102/173)
V3 impact_analysis drift fix (Snowflake binder) ⚠️ Code correct, but e2e gap — see inline on mcp_server.py:1931
V4 schema_diff serialization (SchemaChange collision fix) ⚠️ Models distinct + field-pinned; no behavioral output test — see Advisory F
V5 stdio stdout purity (JSON-RPC channel) No stdout writes in widget_server.py; logging → stderr (widget_server.py:2521)
V6 chdir to --project-dir before RecceConfig Present in both subcommands; minor: no dir-existence guard (Advisory)

Verification Summary (tests)

Check Result
Local re-execution ⚠️ Could not run — env SIGKILL/OOM at pytest import (0-byte output; killed during collection). Not a test-logic failure.
First background run (widget + config_install + check_base + mcp_server + mcp_cloud_backend) exit 0
PR-body documented test_widget_server.py 35 passed · test_mcp_config_install.py 5 (file actually has 7) · broad selection 363 passed

Advisory (not blocking)

  • A — Bundled tooling / scope creep. CLAUDE.md (+37 gbrain boilerplate, includes a per-machine "on this machine" assertion in shared docs) and .claude/settings.json (enables mcp-server-dev for all repo contributors) are unrelated to the widget feature. Suggest splitting into a separate commit/PR, marking POC-only, or moving dev-plugin prefs to gitignored settings.local.json.
  • B — External SDK dependency. All 15 widgets import https://unpkg.com/@modelcontextprotocol/ext-apps@0.4.0 (no SRI). Mitigated by each resource's csp.resourceDomains: ["https://unpkg.com"] (host-enforced origin restriction) and this is the standard MCP Apps SDK load pattern — acceptable for POC; consider local vendoring before productionizing.
  • C — Type contracts. status / change_status / priority etc. are free str with legal values only in comments. Converting the widget-branched ones to Literal[...] is a zero-cost contract win.
  • D — HTML hardening. No <meta> CSP in the HTML (partially mitigated by resourceDomains); get_server_info.html:317 pr.url href has no scheme validation (javascript: possible, though source is GitHub metadata not warehouse) and target="_blank" lacks rel="noopener".
  • E — Mega-PR. Feature POC + 4 unrelated pre-existing bugfixes (track stderr / chdir / set_backend cache-key / freshness) + tooling, 37 commits / +14K. Hard to review and to revert. Future: land pre-existing bugfixes as their own PRs.
  • F — Test gaps. schema_diff is the only widget with no behavioral model_validate round-trip test (the collision broke exactly that serialization); no absolute tool-count assertion (assert len==20/5) so dropping a non-widget tool would pass silently.
  • G — impact_analysis per-step except Exception (pre-existing, mcp_server.py ~1866/1901/2042, not in this diff) appends to errors[] without logging unexpected exceptions — a code bug is indistinguishable from a warehouse failure. Worth narrowing + exc_info logging when that area is next touched.

Nice work on the POC — the escaping discipline, type care, and the documented drift fix are genuinely solid. The items above are hardening + a couple of bundled changes worth confirming.

Comment thread .gitignore Outdated
Comment thread recce/cli.py
current_sha = current_commit_hash()
result["current_sha"] = current_sha
if current_sha and base_sha != current_sha:
if base_sha is not None and expected_base_sha:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — SHA freshness is now opt-in (and expected_base_sha), but neither shipping caller passes it: check_base CLI (L3488) and the MCP startup check (mcp_server.py:2807) both omit expected_base_sha. So stale_sha is now unreachable in production, current_sha is always None, and the stale_sha color-map (L3505) + exit-code-2 branch (L3522) are dead.

The rationale (base artifacts come from the base branch → base SHA ≠ feature HEAD → the old auto-check was a false-positive generator) is sound. But this is a behavior change to the existing recce check-base command, not mentioned in the PR body and bundled into a widget POC. Please confirm it's intentional — if so, clean up the dead stale_sha/current_sha path, or wire a caller that actually passes expected_base_sha.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred to the follow-up discussion (POC scope) — leaving this thread open as a tracked TODO. The opt-in SHA change is intentional behavior-wise (base artifacts come from the base branch, so the base SHA always differs from feature HEAD → the old auto-check was a false-positive generator). But stale_sha / current_sha / the exit-code-2 branch are now dead for both callers. Decision pending: clean up the dead path vs. wire a caller that passes expected_base_sha.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reflection, holding this rather than cleaning it up inside this widget PoC. The stale_sha / expected_base_sha path is not pure dead code — it's a tested opt-in API: test_status_stale_sha_when_expected_base_sha_mismatch exercises the mismatch → stale_sha branch, and test_status_stale_sha pins the no-arg = fresh behavior. So it's "no production caller yet", not "unreachable".

Removing it would touch the recce check-base command (color-map, exit-code-2, docstring) plus those tests — a cross-feature refactor outside this widget PoC's scope, and one I can't validate locally right now (the full pytest suite OOMs in this env; only the lint tools run). Proposal: either leave the opt-in as-is (harmless + tested), or do a focused follow-up that wires a caller passing expected_base_sha (giving it a purpose) or removes SHA-freshness entirely. Leaving this thread open for that decision.

Comment thread recce/widget_server.py
- table_not_found / permission_denied surface in *_meta.status
- tool raises only on fundamental dbt/adapter failures
"""
result = await _recce_server._tool_row_count_diff(args.model_dump(exclude_none=True))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — Widget tools call _recce_server._tool_*() directly, bypassing the central @server.call_tool handler in mcp_server.py:1426. That handler is where DRC-2754 error classification lives: _classify_db_error → downgrade table_not_found/permission_denied to warnings + sentry_metrics.count("mcp.expected_error", ...). For all 15 widget tools that machinery never runs — expected DB errors become hard errors and there's zero telemetry.

Not a silent failure (FastMCP still wraps the raised exception into an error result), so not blocking for the POC — but it quietly undoes documented investment. Suggest routing widget delegates through a shared classify/telemetry wrapper, or a tracking note that the widget path defers DRC-2754 parity.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred to the follow-up discussion (POC scope) — leaving open as a tracked TODO. The widget delegates bypass the central call_tool classifier/telemetry (DRC-2754). Errors still surface (FastMCP wraps them); what's lost is expected-error classification + Sentry metrics for the 15 widget tools. Tracking a shared classify/telemetry wrapper for when the two-server architecture is consolidated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO marker added in d080819e (above _recce_server) and tracked in the PR description's "Review follow-up — deferred items" section. Deferred per iter-1 POC scope; leaving open until the shared classify/telemetry wrapper lands (two-server consolidation).

Comment thread recce/mcp_server.py
# Intersect both sides, mirroring ValueDiffTask; drifted columns are
# already reported via schema_changes (Step 2b). get_model(base=True)
# manages its own warehouse connection for the base-side introspection.
base_model_info = self.context.adapter.get_model(node_id, base=True)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM (test gap) — This drift fix is correct (intersection + get_model(base=True) connection-managed introspection, matching ValueDiffTask). But it has no real-adapter e2e. test_mcp_e2e.py::test_schema_changes_detected builds the drift shape against real DuckDB, yet its users fixture declares no primary_key, so if not pk: continue (L1917) skips value_diff entirely and the IS DISTINCT FROM binder path is never exercised on a live warehouse. Coverage is mock-only (drifted column absent from generated SQL) + the documented manual Snowflake A/B.

Per project MEMORY, mocks can't catch the missing-warehouse-connection class of bug. Recommend an e2e fixture with a PK where current has a column absent in base; assert impact_analysis returns value_diff over the common non-PK columns and does not raise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred to the follow-up discussion — leaving open as a tracked test-coverage TODO. The fix is mock-tested (asserts drifted columns never reach the generated SQL) plus the documented manual Snowflake A/B, and even-wei independently verified 156 tests pass. A real-DuckDB e2e with a PK + a column present only in current is the gap to close; not a blocker for the POC.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO marker added in d080819e (at the value_diff intersection) and tracked in the PR description's deferred section. Deferred; leaving open until the real-DuckDB e2e fixture (PK + drifted column) lands.

Comment thread recce/cli.py Outdated
Comment thread recce/widget_server.py Outdated
Comment thread recce/widget_server.py
exclude=args.exclude,
packages=args.packages if args.packages is not None else None,
)
output = SchemaDiffOutput(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOWSchemaDiffOutput has no warning field and schema_diff never extracts _warning, unlike the other 13 diff tools (e.g. L202 warning = result.pop("_warning", None)). In single-env mode it renders "No models found" (schema_diff.html:264), which reads as "no schema changes" when the truth is "no base to compare against". (_tool_schema_diff also lacks the single-env warning on the main server — pre-existing — but the widget's empty-state makes it more misleading.) Suggest adding a warning field + the single-env notice.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred to the follow-up discussion (POC scope) — leaving open as a tracked TODO. Likely fix: add a warning field to SchemaDiffOutput + extract _warning, plus the single-env notice, so the empty-state ("No models found") isn't read as a clean diff when there's simply no base to compare against.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO marker added in d080819e (at the SchemaDiffOutput build) and tracked in the PR description's deferred section. Deferred; leaving open until the warning field + single-env notice are added.

Comment thread recce/widget_server.py Outdated
iamcxa and others added 5 commits June 2, 2026 22:24
…stall

- expanduser() on --project-dir and --config so ~/path resolves (Copilot)
- atomic config write (temp + os.replace) guarded by except OSError, so a
  mid-write failure leaves the existing claude_desktop_config.json intact
  instead of a truncated file (self-review)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
- _widgets_enabled() now case-insensitive (accepts TRUE/True/true/1) (Copilot)
- run_widget_server cloud/session guard checks cloud_session (the actual --session
  dest) in addition to session, so the guard is not silently dead (self-review)
- _read_widget_html broadens except + logs on failure so a packaging defect that
  drops recce/data/mcp/*.html is visible in server logs (self-review)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
_show_error_message used Console() (stdout); the --debug path's print_exception
corrupts the stdio JSON-RPC channel for MCP servers. Now Console(stderr=True),
matching the already-fixed non-debug branch (Copilot, outdated thread).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
- escapeHtml in all 15 widgets now also escapes single quotes (' -> &#x27;);
  defense-in-depth so a future single-quoted attribute stays safe (even-wei N2)
- get_server_info PR link adds rel=noopener noreferrer to its target=_blank
  anchor (Copilot, even-wei)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
- .gitignore ignores recce/data/** then re-includes recce/data/mcp/ via negation,
  so nested build output (e.g. recce/data/lineage/index.html) is ignored while
  widget source is tracked; replaces the flat per-extension allowlist that missed
  nested dirs (self-review)
- docs/mcp-widgets.md: gitignore gotcha rewritten for the negation scheme; widget
  test count 28 -> 35 (Copilot)
- test_mcp_config_install docstring: add backup-pristine-on-rerun + python-fallback
  coverage lines (Copilot)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa

iamcxa commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

@even-wei — addressing your NO-GO (reviewed at 87f22fd3; current head is c533be93, which has 3 commits after your review's SHA + this resolve pass):

Blocker (format gate) — already resolved before the current head: 230af0f0 reformatted tests/test_widget_server.py. Verified now on c533be93: black --check, isort --check-only, and flake8 all exit 0.

Note 1 / 4 (backup clobbered on re-run, undo lost) — fixed in eeb5002d: the backup is skip-if-exists, so .recce.bak keeps the pristine pre-recce original across re-runs. Atomic write also added (65ca0739): a mid-write failure now leaves the existing config intact.

Note 2 (escapeHtml doesn't escape ') — fixed in c6d3506e: all 15 widgets now escape '&#x27;.

Note 3 (double base-side introspection in impact_analysis) — acknowledged; this is the tradeoff for the binder-error correctness fix. Tracking a cache/batch of the base-side get_model as a follow-up.

Note 5 (15 copy-pasted escapeHtml/svgEl helpers) — acknowledged; the POC accepts per-file helpers. Shared-asset extraction tracked as a follow-up.

Also note your "Limits": the uv run pytest stale-env failures (older mcp lacking the resource(meta=...) kwarg) are an environment artifact — confirmed; under the pinned mcp>=1.23.0 all suites pass, which your forced-3.13 run verified (156 passed).

Re-requesting your review on c533be93.

_read_widget_html resolved assets via importlib.resources.files("recce.data.mcp"),
which relies on recce.data.mcp resolving as a namespace package — fragile across
install layouts (Copilot flagged a potential ModuleNotFoundError). Anchor on the
recce regular package (always importable) and traverse to data/mcp/ instead, so a
pip-installed wheel reliably serves the widgets. Goal: widgets must work on any
install, not just an editable/source checkout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@iamcxa iamcxa requested a review from even-wei June 2, 2026 15:17
iamcxa and others added 2 commits June 2, 2026 23:20
Mark the deferred POC items (tracked in the PR description's follow-up section)
with TODO(DRC-3526 follow-up) comments in code:
- widget tools bypass central call_tool classifier/telemetry (DRC-2754)
- schema_diff has no single-env warning
- impact_analysis value_diff drift fix lacks a real-DuckDB e2e

No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Kent <iamcxa@gmail.com>
@even-wei

Copy link
Copy Markdown
Contributor

Code Review: PR #1397

SHA 6fb3bff · Verdict NO-GO

Blockers

  1. recce/widget_server.py:84PullRequestInfo.id: Optional[str] rejects the int id the state actually carries, so get_server_info crashes whenever PR metadata is present.
    Evidence: recce/pull_request.py:14 is Optional[Union[int, str]] populated from pr.number (int); reproduced in this venv — ServerInfoOutput(..., pull_request={'id': 15, ...})ValidationError: pull_request.id Input should be a valid string. Tests only cover pull_request is None (tests/test_widget_server.py:427).
    Pass C.

  2. recce/cli.py:3132-3135--dry-run (and a declined confirm prompt) still creates claude_desktop_config.json + parent dirs when the file doesn't exist, violating the documented "without writing" contract.
    Evidence: the skeleton write_text runs before the if dry_run: check at cli.py:3199; reproduced — mcp-config-install --dry-run with a nonexistent --config exits 0 and leaves the file on disk. The dry-run test pre-creates the file so the suite can't catch it (tests/test_mcp_config_install.py:166).
    Pass A.

Issues

  1. recce/widget_server.py (all 15 tools) — single args: <Model> parameter makes FastMCP emit a nested inputSchema (properties.args.$ref) while every docstring documents flat parameters and the same-named tools on recce mcp-server use flat schemas.
    Evidence: nested {"properties": {"args": {...}}, "required": ["args"]} reproduced with mcp.server.fastmcp in this venv — description ≠ schema breaks the repo's "MCP tool description = LLM agent contract" rule, and flipping RECCE_MCP_WIDGETS silently changes the calling convention.
    Pass C.

  2. recce/widget_server.py:692-699 (also :1162-1170) — query/query_diff declare readOnlyHint: True, destructiveHint: False but execute the raw sql_template (a DELETE runs); hosts use these hints to skip confirmations.
    Evidence: QueryTaskexecute_sql_with_limit (recce/tasks/query.py:43) passes the statement straight to adapter.execute; relatedly tests/test_widget_server.py:336 pins openWorldHint: False for row_count_diff, which runs COUNT(*) against the warehouse.
    Pass B.

  3. recce/data/mcp/get_server_info.html:317 — PR link href has no scheme allowlist; a javascript: URL executes on click (escapeHtml doesn't neutralize schemes).
    Evidence: `<a href="${escapeHtml(pr.url)}" target="_blank" ...>` — only href-from-data sink in the PR; gate on http(s):.
    Pass B (MEDIUM — pr.url originates from state/PR metadata, not direct user input).

  4. recce/data/mcp/impact_analysis.html:546-561 — the mini-graph fabricates edges: every modified model is connected to every downstream model (cartesian product), visually asserting lineage relationships that may not exist.
    Evidence: for (const srcName of modifiedNames) { ... for (const dstName of downstreamNames) { ... appendChild(path) (comment admits "simplified").
    Pass A.

  5. recce/data/mcp/query_diff.html:424-436 / value_diff_detail.html:384-397in_a/in_b are located by key || name but filtered out by name only → header/data column misalignment when they diverge; and when the columns are absent (inAIdx === -1) every row is labeled "Removed".
    Evidence: findIndex(c => c.key === "in_a" || c.name === "in_a") vs filter(c => c.name !== "in_a"); const status = (!inB) ? "removed" : "added" with inB=false fallback.
    Pass A.

  6. recce/data/mcp/value_diff.html:401,426 — "All values match" banner renders when every matched_p is null (false-quiet, against the repo's false-loud principle), and the derived mismatched-row count can render negative.
    Evidence: null-excluded mismatchedCols + added === 0 && removed === 0 → "Every column is 100% identical"; Math.round(common - matchedCount) unclamped.
    Pass A.

  7. 11 of 15 widgets call render() with no try/catch — any throw leaves the widget stuck on "Loading…" forever.
    Evidence: bare app.ontoolresult = ({ structuredContent }) => { render(structuredContent ?? {}); } in query, query_diff, row_count_diff, value_diff_detail, schema_diff, list_checks, top_k_diff, histogram_diff, value_diff, get_model, get_server_info; only get_cll/impact_analysis/lineage_diff/profile_diff wrap it.
    Pass D.

  8. recce/widget_server.py:295-301schema_diff runs get_lineage_diff() + adapter.select_nodes synchronously inside the async handler (no executor), blocking the FastMCP event loop for all concurrent tool/resource calls; the get_model/get_cll/histogram_diff delegates share the pattern (inherited, but newly exposed to FastMCP concurrency).
    Pass H.

  9. recce/widget_server.py:531,617-628 — the documented not_found: bool miss-signal is dead code: _tool_get_model raises ValueError when the model is missing (mcp_server.py:2179-2183), so the branch never fires and the docstring contract is wrong.
    Pass C.

  10. recce/widget_server.py:360-370get_server_info docstring promises base_status "reveals stale or missing artifacts", but only run_mcp_server runs the freshness check; the widget server always reports "unknown"/"single_env".
    Evidence: _base_status set only at mcp_server.py:2815; run_widget_server (widget_server.py:2528-2573) never sets it.
    Pass C.

  11. Tests — schema_diff is the only widget tool whose handler path is never executed (the very tool that had the serialization regression); zero error-path tests across the 15 delegates (handler raising, None raw results); the call_tool widget-rejection guard (mcp_server.py:1319-1321) and the run_widget_server/mcp_widget_server bootstrap are untested (every test injects ws._recce_server directly).
    Evidence: no ws.schema_diff(...) call in tests/test_widget_server.py; only error tests are missing-HTML (line 143) and lineage over-cap (line 2290).
    Pass E.

  12. CLAUDE.md:101-137 — machine-local GBrain guidance ("GBrain is set up and synced on this machine", .gbrain-source, ~/.gstack/) committed into the shared repo's agent instructions; false/noise for every other contributor's environment.
    Pass F.

Notes

  1. recce/widget_server.py:60-64SchemaChange fields typed List[Dict[str, str]] but _compute_schema_changes emits .get("type") values that can be None (the documented present-but-None CLAUDE.md gotcha); use Optional[str]. Pass C.
  2. recce/widget_server.py:71-77GitInfo.base_branch/base_sha/current_sha can never populate (state GitRepoInfo has only branch); the matching render block in get_server_info.html:296-309 is dead. Pass C.
  3. recce/widget_server.py:1975get_cll pops _warning but _tool_get_cll never adds one, and lineage_diff has no warning field at all — in single-env mode it renders an empty diff that reads as clean (same false-quiet class as the acknowledged schema_diff deferral, which covers only schema_diff). Pass A.
  4. recce/data/mcp/profile_diff.html:283,333 — string stats are escaped twice (R&D displays as R&amp;D); strStatRow null-checks miss undefined. Pass A.
  5. recce/data/mcp/lineage_diff.html:193,255,389 — layout keyed on n.idx with no validation; missing/duplicate idx stacks all nodes at one position. Pass A.
  6. recce/data/mcp/row_count_diff.html:296-311=== null guards miss undefined (safe under the current Pydantic dump contract, fragile to it), and the summary "Changed" counter counts null-vs-value rows the badge shows as "N/A". Pass A.
  7. recce/data/mcp/query_diff.html:485 / value_diff_detail.html:444 — inline onclick handlers break silently under a strict host CSP; prefer addEventListener. Pass A.
  8. recce/data/mcp/top_k_diff.html:359,410 — only base.values is iterated (current-only categories depend on an unasserted server union contract) and counts aren't coerced to Number (string counts concatenate in the "shown" stat). Pass A.
  9. tests/test_check_base.py:77+ — seven dead patch("recce.git.current_commit_hash") contexts and the stale test_status_stale_sha name survive the opt-in change; expected_base_sha has no production caller (acknowledged deferral). Pass E/F.
  10. PR description says "18-vs-20 tools"; the real split is 20 widgets-off / 5 widgets-on / 15 widget tools. Stale test name ..._six_tools_and_six_resources (tests/test_widget_server.py:89) asserts 15. Pass F.
  11. Verification: black/isort/flake8 clean at 6fb3bff; all PR test files pass; the 19 failing tests in this env (test_cli_cache, test_server SPA routes, …) fail identically on origin/main — pre-existing/environmental, not introduced here.

Multi-pass adversarial review (recce-dev:claude-code-review) — passes A–I + security skill; widget_server.py, 15 HTML widgets, and test suite each independently reviewed and cross-refuted.

@even-wei even-wei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review: Critical issues found (2 blockers, 12 issues). See review comment: #1397 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants