Skip to content

Scaffold server query pipeline & streaming API #353

Open
sagrimson wants to merge 6 commits into
datacommonsorg:mainfrom
madebypxlp:scott/api-route
Open

Scaffold server query pipeline & streaming API #353
sagrimson wants to merge 6 commits into
datacommonsorg:mainfrom
madebypxlp:scott/api-route

Conversation

@sagrimson
Copy link
Copy Markdown

@sagrimson sagrimson commented Jun 3, 2026

Overview

Adds the streaming query API route and supporting server-side pipeline that powers Data Weaver's natural-language-to-chart workflow. A user's query flows through safety checks, analysis, Gemini tool-calling with the Data Commons MCP server, and streams structured chart specifications back to the client via SSE.

Changes Made

  • Streaming API route (/api/query) — POST endpoint that accepts a natural-language query and streams SSE events (status, analysis, tool_call, chart_spec, metadata, error, complete)
  • Server pipeline (src/server/)
    • safety.ts — layered prompt safety gate (regex patterns + LLM classification)
    • analyze/ — extracts places, topic, titles, and date ranges from the query via Gemini
    • query/ — Gemini tool-loop that iteratively calls MCP tools to resolve variables
    • observations/ — fetches variable metadata/facets from Data Commons
    • mcp.ts — JSON-RPC client for the Data Commons MCP server
    • gemini.ts — shared Gemini client factory
    • dc-api.ts — Data Commons REST API helpers
    • config.ts + config/ — service config and skill prompt loading
    • types.ts — shared types for the query pipeline and SSE events
  • Client streaming hook (use-streaming-query.ts) — SSE parser with abort support and typed event dispatch

@sagrimson sagrimson requested a review from a team as a code owner June 3, 2026 00:43
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the 'Atlas' scope component, integrating tldraw to render interactive text and chart cards on a canvas, alongside supporting elements like buttons, skeletons, and custom icons. On the backend, it implements a streaming query API powered by Gemini and Data Commons MCP tools, complete with query analysis, safety filtering, and metadata fetching. The review feedback focuses on aligning with repository conventions—such as renaming files to snake_case and applying category-first naming to components—and ensuring strict TypeScript compliance. Additionally, the reviewer recommends enhancing robustness by adding unmount cleanup hooks to prevent memory leaks, caching compiled regular expressions for better performance, and wrapping JSON parsing in try-catch blocks to prevent runtime TypeErrors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread dataweaver/apps/web/src/hooks/use_cached_resize_values.ts Outdated
Comment thread dataweaver/apps/web/src/hooks/use-streaming-query.ts Outdated
Comment thread dataweaver/apps/web/src/hooks/use_streaming_query.ts
Comment thread dataweaver/apps/web/src/components/scopes/atlas/components/grid.tsx Outdated
Comment thread dataweaver/apps/web/src/components/elements/card/chart/conditional_tabs.tsx Outdated
Comment thread dataweaver/apps/web/src/server/query/index.ts Outdated
Comment thread dataweaver/apps/web/src/hooks/use_streaming_query.ts
Comment thread dataweaver/apps/web/src/server/dc_api.ts
Comment thread dataweaver/apps/web/src/components/dev/test/test_input.tsx Outdated
Comment thread dataweaver/apps/web/src/components/dev/test/test_output.tsx Outdated
@sagrimson sagrimson force-pushed the scott/api-route branch 3 times, most recently from b80edeb to c108463 Compare June 3, 2026 17:16
@sagrimson sagrimson requested review from beets and nick-nlb June 3, 2026 17:46
Copy link
Copy Markdown
Contributor

@beets beets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! The scaffolding and overall flow for the data discovery query pipeline are well-designed. Since this is an initial commit, most of these comments are fine to be added as TODO's in the code for future refactor.

headers: { 'Content-Type': 'application/json' },
});
}
const { query, atlasContext } = body;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selectedEntityDcids is defined in the request body but not yet destructured here. For follow-up queries (where parsed.places is empty), falling back to selectedEntityDcids will ensure previous place context is preserved.

Also, adding a brief verification check ensuring query is a valid string before invoking safety filters will safeguard against runtime TypeErrors.

Comment on lines +155 to +156
const entityDcid =
parsedResponse.entityDcid || resolvedPlaceDcid || place;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a place cannot be resolved to a valid DCID, report a warning status or fail gracefully rather than calling the observations API with the raw name.

Comment on lines +93 to +94
? `\n\nDATE CONSTRAINT: The user wants data limited to ${parsed.dateRange.start && parsed.dateRange.end ? `years ${parsed.dateRange.start} through ${parsed.dateRange.end}` : parsed.dateRange.start ? `years from ${parsed.dateRange.start} onward` : `years up to ${parsed.dateRange.end}`}.`
: '';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Date constraints might not be limited to years. Could we pass the dates directly in the prompt without assumptions, perhaps as a structured input.

});

if (response.functionCalls && response.functionCalls.length > 0) {
const fc = response.functionCalls[0];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gemini models frequently output multiple tool calls concurrently. Currently, this extracts only response.functionCalls[0]. As we iterate, let's loop over all calls in response.functionCalls, execute each, and push their tool responses back in sequence to avoid API conversation history mismatch errors. Could add a TODO for now as a reminder.

Comment on lines +130 to +133
const cleaned = responseText
.replace(/```json/g, '')
.replace(/```/g, '')
.trim();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of stripping out the json markdown, consider searching for the json block instead to be resilient against unexpected model commentary:

.match(/\{[\s\S]*\}/)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consolidate all our agent instructions into the skills folder?

Comment on lines +32 to +34
.replace(/```json/g, '')
.replace(/```/g, '')
.trim();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to earlier comment on handling possible LLM preamble.

const jsonMatch = responseText.match(/\{[\s\S]*\}/);
const cleaned = jsonMatch ? jsonMatch[0] : responseText;
parsedResponse = JSON.parse(cleaned);


const response = await genAI.models.generateContent({
model: config.models.parseQuery,
contents: `${systemPrompt}\n\nQuery: "${query}"\nReturn ONLY its JSON object, no markdown, no other text or explanation.`,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the return formatting instruction is already part of the system prompt

@@ -0,0 +1,59 @@
import { getServiceConfig } from './config';

export interface ObservationResponse {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :) thanks for adding the type for this.

Comment on lines +100 to +105
| { type: 'status'; message: string }
| { type: 'parsed_query'; data: ParsedQuery }
| { type: 'tool_call'; tool: string; args: Record<string, unknown> }
| { type: 'query_result'; result: QueryResult; place: string }
| { type: 'complete' }
| { type: 'error'; message: string };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we lift out the types of these events to string constants?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants