Check in oxql benchmark. #10124

Merged
jmcarp merged 1 commit into main from jmcarp/oxql-field-bench on Mar 24, 2026

Conversation

@jmcarp (Contributor) commented Mar 23, 2026

Add oxql field lookup benchmark. To avoid the complication of generating realistic synthetic data, we provide scripts for fetching and loading fields from a running ClickHouse instance.

Checking in to benchmark changes in #10110.

@jmcarp jmcarp requested a review from bnaecker March 23, 2026 17:44
@bnaecker (Collaborator) left a comment

This is awesome, thanks! I have a few small suggestions, but they're just that. Take them or leave them :)

Comment on lines +30 to +33
echo "Backing up $DATABASE.$table ($count rows) to $output"
clickhouse client --port "$PORT" \
--query "SELECT * FROM $DATABASE.$table FORMAT Native" \
| gzip > "$output"
@bnaecker (Collaborator)

ClickHouse has specific syntax for creating a backup, including putting compression on it. If that doesn't work, I'd also recommend using SELECT ... INTO OUTFILE like this link. It's not a huge deal, but it does avoid the pipeline and redirect.
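For reference, a minimal sketch of the two alternatives being suggested (assuming a backup destination named `backups` is configured on the server; the database, table, and file names here are illustrative, not taken from the PR):

```sql
-- Built-in backup: writes a compressed archive to a configured backup disk.
BACKUP TABLE oximeter.fields_string TO Disk('backups', 'fields_string.zip');

-- Without the backup feature: write directly to a local file. ClickHouse
-- infers gzip compression from the .gz extension, avoiding the pipe/redirect.
SELECT * FROM oximeter.fields_string
INTO OUTFILE 'fields_string.native.gz'
FORMAT Native;
```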

@jmcarp (Contributor, Author)

I wound up doing this because (1) ClickHouse on the rack doesn't enable backups (though I'm going to fix that in a minute), and (2) when I'm copying data from a real ClickHouse to a benchmark instance, I don't have easy access to the source ClickHouse's local disk. Added comments about this.

fi
echo "Loading $table"
gunzip -c "$input" | clickhouse client --port "$PORT" \
--query "INSERT INTO $DATABASE.$table FORMAT Native"
@bnaecker (Collaborator)

If we use the BACKUP statement above, we can use RESTORE here as well. See https://clickhouse.com/docs/operations/backup/disk#backup-and-restore-a-table.
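A minimal sketch of the matching statement, under the same assumptions as the `BACKUP` statement mentioned above (a configured `backups` destination; illustrative names):

```sql
-- Restore the table from the archive produced by the corresponding BACKUP.
RESTORE TABLE oximeter.fields_string FROM Disk('backups', 'fields_string.zip');
```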


const DEFAULT_CLICKHOUSE_PORT: u16 = 9000;

/// Timeseries for field lookup benchmarks, with their field table counts.
@bnaecker (Collaborator)

It'd be nice to avoid hardcoding these. We could use the use_timeseries! macro directly and then count the fields? That kind of sucks too. At least make a note that we need to change these if the schemas change.

@jmcarp (Contributor, Author)

I made this a little less bad. I'm still hard-coding series names, but I changed to having the benchmark script look up the number of field tables, plus cardinality, of each of those metrics.

const DEFAULT_CLICKHOUSE_PORT: u16 = 9000;

/// Timeseries for field lookup benchmarks, with their field table counts.
const FIELD_TIMESERIES: &[(&str, u8)] = &[
@bnaecker (Collaborator)

Do we want the backup / restore above to include only this data? Or is it intentionally selecting everything so that there's more filtering work for the database?

@jmcarp (Contributor, Author)

Post-hoc rationalization: yes, it's arguably a fairer benchmark if we have data from other series in there. Real reason: backing up and restoring the fields was quick enough that I didn't think too hard about running time.

Comment on lines +79 to +80
// Use a far-future timestamp to benchmark field lookup only, with no
// measurements.
@bnaecker (Collaborator)

This is fine, I think, but we don't have to do it. The OxqlQueryResult type returns a Vec<OxqlQuerySummary>. Each of those corresponds to a single SQL query we run in the database. The first of those is the field query today. I don't really know that that's better, just an idea.

Also, looking at that would allow us to track the I/O usage (bytes and rows read), which would be useful to know. We don't track memory consumption now, but there's nothing preventing us from adding that functionality to the OxQL query engine.

@jmcarp (Contributor, Author)

I'm going to leave this for now, but I like the idea of adding more information to the benchmark. I don't see a way for criterion to attach multiple custom metrics to a benchmark just now, but it would be interesting to record latency, I/O, CPU, etc., in the same benchmark.

@jmcarp jmcarp force-pushed the jmcarp/oxql-field-bench branch 2 times, most recently from 5eadce7 to e333c23 on March 24, 2026 13:42
Add oxql field lookup benchmark. To avoid the complication of generating
realistic synthetic data, we provide scripts for fetching and loading
fields from a running ClickHouse instance.
@jmcarp jmcarp force-pushed the jmcarp/oxql-field-bench branch from e333c23 to c1dadec on March 24, 2026 14:23
@jmcarp jmcarp enabled auto-merge (squash) March 24, 2026 14:52
@jmcarp jmcarp merged commit 210bdf9 into main Mar 24, 2026
17 checks passed
@jmcarp jmcarp deleted the jmcarp/oxql-field-bench branch March 24, 2026 15:14