feat: Add "Convert to SQL" option for streaming supervisors by beenhead · Pull Request #19547 · apache/druid

beenhead · 2026-06-03T12:46:07Z

Have you ever stood up a streaming supervisor, only to later wish you could replay that same data as a batch (SQL-based) ingestion? Well, be prepared to have your experience taken to the next level!

This PR adds a Convert supervisor to SQL flow to the web console query view. It complements the existing Convert ingestion spec to SQL tool: where that one migrates native batch and Hadoop specs, this one takes a streaming supervisor (Kafka/Kinesis) and generates an equivalent Multi-Stage Query INSERT statement that reads from files instead of a stream — handy for backfills, reprocessing, and one-off migrations.

Convert supervisor to SQL dialog

A new Convert supervisor to SQL item is available from the ... menu in the query view.

Clicking it opens a dialog that walks you through the conversion.

Pick your supervisor

You can either select an existing supervisor (the dialog fetches the list from /druid/indexer/v1/supervisor and loads the spec on selection) or paste a supervisor JSON spec directly. Pasting is validated as you type, so malformed JSON surfaces an inline error rather than failing silently.

Point it at your data

Because a streaming supervisor has no batch input source, the dialog asks where the equivalent files live. It pre-populates the file location from the supervisor's ioConfig.inputSource when one is present, and you can pick the file type (JSON, CSV, Parquet, or ORC). The location scheme is used to build the right input source:

s3://… → an s3 input source (with an objectGlob added automatically for directory locations)
gs://… → a google input source
http://… / https://… → an http input source
anything else (including file://…) → a local input source

Generate the SQL

Clicking Generate SQL converts the spec and drops the resulting query into a new tab for you to review and edit before running. The conversion:

Builds a SELECT … FROM TABLE(EXTERN(…)) over the chosen files
Maps the supervisor's metricsSpec aggregators to their SQL equivalents (longSum → SUM, thetaSketch → APPROX_COUNT_DISTINCT_DS_THETA, HLLSketchBuild → APPROX_COUNT_DISTINCT_DS_HLL, and so on), adding a GROUP BY when rollup aggregations are present
Parses the timestamp via TIME_PARSE using the supervisor's timestampSpec
Emits PARTITIONED BY DAY and CLUSTERED BY the leading dimensions

Tests

Added supervisor-conversion.spec.ts covering the conversion helper: rollup vs. non-rollup queries, the full set of supported metric aggregations and their non-default arguments, dropping of unsupported metrics, input-source detection for each scheme, timestamp handling, partitioning/clustering, and the error paths.
The existing supervisor-to-sql-dialog.spec.tsx covers basic rendering and the close action.

FrankChen021

Severity	Findings
P0	0
P1	2
P2	4
P3	0
Total	6

Reviewed 9 of 9 changed files.

This is an automated review by Codex GPT-5.5

FrankChen021 · 2026-06-04T13:03:22Z

+  }
+
+  // Build EXTERN expression with proper escaping
+  const columnSchema = uniqueExternColumns.map(col => ({name: col, type: 'string'}));


[P1] EXTERN declares numeric inputs as strings

The generated EXTERN signature marks every raw column as string, including metric fields used by SUM, MIN, MAX, sketches, and typed dimensions. A common longSum or doubleSum supervisor will therefore generate numeric aggregations over VARCHAR inputs, which can fail planning or ingest with wrong types. Preserve metric and dimension native types as the existing spec converter does.

Good catch. The EXTERN signature now declares each column with its native type instead of string. Dimensions map via dimensionTypeToSignatureType (long/float/double preserved, json → COMPLEX), and metric fieldNames map via metricFieldSignatureType, mirroring the ingestion-spec converter (longSum → LONG, doubleSum → DOUBLE, sketches read from a STRING input, etc.). A dimension's type wins when a name is both a dimension and a metric field, so SUM/MIN/MAX/sketches now aggregate over correctly-typed inputs. Added a test asserting the resulting signature types.

FrankChen021 · 2026-06-04T13:03:22Z

+  sqlString = `INSERT INTO ${C(datasource)}\n${sqlString}`;
+
+  // Append PARTITIONED BY
+  sqlString += `\nPARTITIONED BY DAY`;


[P1] Supervisor granularity is ignored

The conversion always appends PARTITIONED BY DAY and never applies dataSchema.granularitySpec.queryGranularity. Supervisors commonly use HOUR segment granularity or non-none query granularity, so the generated batch SQL can create different segment layout and different rollup grouping from the streaming supervisor.

Fixed in aaa3143. Both halves are now handled:

Segment granularity → PARTITIONED BY: instead of the hardcoded PARTITIONED BY DAY, we read dataSchema.granularitySpec.segmentGranularity — all → ALL TIME, otherwise uppercased (e.g. HOUR → PARTITIONED BY HOUR), falling back to DAY only when the spec omits it.

Query granularity → rollup grouping: the parsed timestamp is wrapped in TIME_FLOOR(..., '<period>') based on granularitySpec.queryGranularity (hour → PT1H, etc.; none/unset → no floor). The same floored expression is used in both SELECT ... AS "__time" and GROUP BY, so the rollup buckets match the streaming supervisor.

So a segmentGranularity: HOUR / queryGranularity: hour supervisor now generates TIME_FLOOR(TIME_PARSE("timestamp", 'iso'), 'PT1H') AS "__time" … GROUP BY TIME_FLOOR(...) … PARTITIONED BY HOUR. Added tests covering the HOUR/hour case and the queryGranularity: none` (no TIME_FLOOR) case.

Note: this currently maps the named granularities; a custom { "type": "period", "period": "..." } query granularity falls through to no floor. Happy to extend it to pass the period through if that's a case you'd like covered.

FrankChen021 · 2026-06-04T13:03:22Z

+  const columnSchema = uniqueExternColumns.map(col => ({name: col, type: 'string'}));
+  const columnSchemaJson = JSON.stringify(columnSchema);
+  const inputSourceJson = JSON.stringify(inputSource);
+  const inputFormatJson = JSON.stringify({type: inputFormatType});


[P2] Input format details are discarded

The input format is reduced to only {type: fileType}, dropping the supervisor's ioConfig.inputFormat settings such as CSV columns/header handling, JSON flattenSpec, Kafka metadata formats, Avro/Protobuf schemas, and other parser options. Those supervisors convert to SQL that reads different columns or cannot parse the backfill files.

Addressed. When the supervisor has an ioConfig.inputFormat, its settings (CSV columns/header, JSON flattenSpec, etc.) are now preserved and only type is overridden with the file type chosen in the dialog. When no inputFormat is present we still fall back to { type: fileType }. Added a test confirming flattenSpec is retained while the type is replaced.

FrankChen021 · 2026-06-04T13:03:22Z

+    return (
+      <SupervisorToSqlDialog
+        onConvert={(sql: string) => {
+          this.handleQueryStringChange(sql);


[P2] Conversion overwrites the active tab

Supervisor conversion writes the generated SQL into the current workbench tab, unlike the existing ingestion-spec conversion that opens a new tab. This can replace an unsaved query and also preserves the current tab's explicit engine/context, so an INSERT/EXTERN query can remain on sql-native or another unsuitable engine.

Fixed to match the ingestion-spec flow. Conversion now opens a new workbench tab via handleNewTab(WorkbenchQuery.blank()…) titled Convert <datasource>, rather than calling handleQueryStringChange on the current tab. Using a blank query also means the engine/context are derived fresh for the INSERT/EXTERN instead of inheriting the active tab's, so an unsaved query is no longer replaced.

FrankChen021 · 2026-06-04T13:03:22Z

+  }
+
+  function parsePastedSupervisor() {
+    if (!pastedSupervisor.trim()) {


[P2] Paste mode can submit stale specs

When the pasted JSON is empty, parsePastedSupervisor returns without clearing supervisorSpec. Switching from selected-supervisor mode to paste mode, or clearing a previously valid paste, can leave Generate SQL enabled and convert a hidden stale supervisor instead of the visible input.

Fixed. parsePastedSupervisor now clears supervisorSpec (and the error) when the textarea is empty, and the paste effect always reparses — so switching from select to paste mode, or clearing a valid paste, drops the previously loaded spec and disables Generate SQL. The select effect likewise clears the spec when nothing is selected.

FrankChen021 · 2026-06-04T13:03:22Z

+  const timeParseExpression =
+    timestampFormat === 'auto'
+      ? F('TIME_PARSE', C(timestampColumn))
+      : F('TIME_PARSE', C(timestampColumn), SqlExpression.parse(`'${timestampFormat}'`));


[P2] Timestamp formats are not SQL-escaped

Custom timestamp formats are interpolated directly into a quoted SQL literal. Valid Druid/Joda formats often contain single quotes, for example to quote a literal T, and those produce invalid SQL here. Build this argument with the query toolkit literal helper or otherwise escape single quotes.

Fixed. Custom timestamp formats are now emitted through the query-toolkit L() literal helper, which doubles embedded single quotes — so a format like yyyy-MM-dd'T'HH:mm:ss produces valid SQL. (I also switched the EXTERN JSON arguments to L() for the same escaping guarantee.) Added a test covering the quoted-T case.

- Preserve native column types in the EXTERN signature so numeric metric and typed dimension fields are not declared as strings - Apply the supervisor's segment granularity to PARTITIONED BY and query granularity via TIME_FLOOR (also used in GROUP BY) - Preserve the supervisor's inputFormat settings, overriding only the type - Escape custom timestamp formats with the query-toolkit literal helper - Clear stale specs in the paste-mode dialog so Generate SQL can't submit a hidden supervisor - Open the converted query in a new workbench tab instead of overwriting the active tab - Add tests for the new behaviors and update snapshots Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vogievetsky · 2026-06-10T22:29:52Z

+
+import { C, F, L, SqlExpression, SqlQuery } from 'druid-query-toolkit';
+
+interface MetricSpec {


This interface already exists in src/druid-models/metric-spec/metric-spec.tsx it should be used here (and extended if needed)

Resolved by deleting it entirely. Rather than reuse just the interface, the converter now rewrites the supervisor into an index_parallel spec and delegates to the existing convertSpecToSql, so there's no metric handling left in this file at all.

vogievetsky · 2026-06-10T22:30:40Z

+  );
+}
+
+function metricSpecToSqlExpression(metricSpec: MetricSpec): SqlExpression | undefined {


This function is a metric spec utility and should live in src/druid-models/metric-spec/metric-spec.tsx

Also gone — metricSpecToSqlExpression/extraArgs were duplicates of what convertSpecToSql already does. Delegating to it removed ~400 lines of parallel logic, so there's nothing left to relocate.

vogievetsky · 2026-06-10T22:31:46Z

+  }
+}
+
+export interface SupervisorSpec {


Can we reuse the IngestionSpec interface here?

Done. convertSupervisorToSql now takes IngestionSpec (a kafka/kinesis supervisor already fits it), and the dialog uses IngestionSpec too. The bespoke SupervisorSpec interface is deleted. The only supervisor-specific work left is swapping the streaming ioConfig for a file inputSource/inputFormat before handing off to convertSpecToSql.

vogievetsky · 2026-06-10T22:35:05Z

+    color: #ff7373;
+  }
+
+  .bp4-form-group {


are these selectors doing anything? We use blueprint 5 not 4 so at the very least they should be .bp5-... really they should use .#{$bp-ns} (search for that usage in the code to see some examples) or possibly assign actual classes to these things

Fixed — the scss now @imports ../../variables and uses .#{$bp-ns}-*. I also removed the raw bp4-select/bp4-fill markup in the TSX, replacing the native <select> with a Blueprint Button + Menu dropdown that matches the engine-selector styling.

vogievetsky · 2026-06-10T22:35:49Z

+ * limitations under the License.
+ */
+
+export * from './supervisor-to-sql-dialog';


In other components we avoid adding an index file with a single export

Removed the folder index.ts; dialogs/index.ts now exports ./supervisor-to-sql-dialog/supervisor-to-sql-dialog directly, matching spec-dialog and the others.

vogievetsky · 2026-06-10T22:37:18Z

+      <div className={Classes.DIALOG_BODY}>
+        <p>
+          Convert a streaming supervisor specification to an MSQ (Multi-Stage Query) ingestion SQL
+          statement.{' '}


I feel like this text should make it very clear that the resulting SQL statement is a one shot thing and will not be streaming.

Reworded the intro: it now states this "generates a one-time batch ingestion that reads the supplied files — it does not start a streaming ingestion and will not continuously ingest new data."

vogievetsky · 2026-06-10T22:40:07Z

+        fileType: 'json',
+      });
+
+      expect(converted.queryString).toMatchSnapshot();


I think it would be cleaner to use toMatchInlineSnapshot in these files

Done — the full-output cases use toMatchInlineSnapshot and the external .snap file is deleted.

Reuse the existing ingestion-spec converter rather than reimplementing it: - convertSupervisorToSql now rewrites the supervisor into an index_parallel spec (file inputSource/inputFormat, default segment granularity, leading dimension clustering) and delegates to convertSpecToSql, so column types, granularity, timestamp parsing, and metric aggregation are all shared - Reuse the IngestionSpec interface instead of a bespoke SupervisorSpec; drop the duplicated MetricSpec interface and metric-to-SQL helpers (~400 lines) - Convert the conversion tests to inline snapshots - Dialog: use IngestionSpec, fix Blueprint 5 scss namespace (.#{$bp-ns}), replace the native select with a Button + Menu dropdown matching the console style, and clarify the SQL is a one-time batch (not streaming) - Don't auto-select the first supervisor; the button shows "Select supervisor" until one is chosen - Remove the single-export index.ts barrel Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

beenhead · 2026-06-11T09:27:20Z

Thanks @vogievetsky — addressed all of these in e8db8e9f55. The main change: instead of paralleling the existing converter, convertSupervisorToSql now rewrites the supervisor spec into an index_parallel spec and delegates to convertSpecToSql. That deletes ~400 lines and resolves the MetricSpec/util/IngestionSpec reuse comments at the root — and it inherits the existing converter's column-type, granularity, and timestamp handling (which also covers the earlier P1/P2 feedback). Remaining comments (Blueprint 5 scss + dropdown, index.ts, one-shot wording, inline snapshots) are all done.

FrankChen021

Reviewed 7 of 7 changed files for this reply follow-up. The latest revision resolves the previously raised issues: supervisor conversion now delegates to the ingestion-spec converter for shared type, timestamp, granularity, input-format, and rollup behavior; workbench conversion opens a new tab; and paste/select mode no longer keeps stale specs. No further inline reply is useful.

This is an automated review by Codex GPT-5.5

beenhead added 2 commits June 3, 2026 10:43

Add supervisor-to-SQL dialog

a67d8e1

Added tests

5347e6b

github-actions Bot added the Area - Web Console label Jun 3, 2026

FrankChen021 reviewed Jun 4, 2026

View reviewed changes

beenhead changed the title ~~Add "Convert to SQL" option for streaming supervisors~~ feat: Add "Convert to SQL" option for streaming supervisors Jun 10, 2026

vogievetsky requested changes Jun 10, 2026

View reviewed changes

FrankChen021 reviewed Jun 11, 2026

View reviewed changes


		import { C, F, L, SqlExpression, SqlQuery } from 'druid-query-toolkit';

		interface MetricSpec {

Conversation

beenhead commented Jun 3, 2026

Convert supervisor to SQL dialog

Pick your supervisor

Point it at your data

Generate the SQL

Tests

Uh oh!

FrankChen021 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

beenhead commented Jun 11, 2026

Uh oh!

FrankChen021 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants