Conversation
bnaecker
left a comment
There was a problem hiding this comment.
This is awesome, thanks! I have a few small suggestions, but they're just that. Take them or leave them :)
| echo "Backing up $DATABASE.$table ($count rows) to $output" | ||
| clickhouse client --port "$PORT" \ | ||
| --query "SELECT * FROM $DATABASE.$table FORMAT Native" \ | ||
| | gzip > "$output" |
There was a problem hiding this comment.
ClickHouse has specific syntax for creating a backup, including putting compression on it. If that doesn't work, I'd also recommend using SELECT ... INTO OUTFILE like this link. It's not a huge deal, but it does avoid the pipeline and redirect.
There was a problem hiding this comment.
I wound up doing this because (1) clickhouse on the rack doesn't enable backups (although I'm going to fix that in a minute) and (2) when I'm copying data from a real clickhouse to a benchmark instance, I don't have easy access to the local disk of the source clickhouse. Added comments about this.
| fi | ||
| echo "Loading $table" | ||
| gunzip -c "$input" | clickhouse client --port "$PORT" \ | ||
| --query "INSERT INTO $DATABASE.$table FORMAT Native" |
There was a problem hiding this comment.
If we use the BACKUP statement above, we can use RESTORE here as well. See https://clickhouse.com/docs/operations/backup/disk#backup-and-restore-a-table.
oximeter/db/benches/oxql.rs
Outdated
|
|
||
| const DEFAULT_CLICKHOUSE_PORT: u16 = 9000; | ||
|
|
||
| /// Timeseries for field lookup benchmarks, with their field table counts. |
There was a problem hiding this comment.
It'd be nice to avoid hardcoding these. We could use the use_timeseries! macro directly and then count the fields? Thank kind of sucks too. At least make a note that we need to change these if the schema change.
There was a problem hiding this comment.
I made this a little less bad. I'm still hard-coding series names, but I changed to having the benchmark script look up the number of field tables, plus cardinality, of each of those metrics.
oximeter/db/benches/oxql.rs
Outdated
| const DEFAULT_CLICKHOUSE_PORT: u16 = 9000; | ||
|
|
||
| /// Timeseries for field lookup benchmarks, with their field table counts. | ||
| const FIELD_TIMESERIES: &[(&str, u8)] = &[ |
There was a problem hiding this comment.
Do we want the backup / restore above to include only this data? Or is it intentionally selecting everything so that there's more filtering work for the database?
There was a problem hiding this comment.
Post-hoc rationalization: yes, it's arguably a fairer benchmark if we have data from other series in there. Real reason: backing up and restoring the fields was quick enough that I didn't think too hard about running time.
| // Use a far-future timestamp to benchmark field lookup only, with no | ||
| // measurements. |
There was a problem hiding this comment.
This is fine, I think, but we don't have to do it. the OxqlQueryResult type returns a Vec<OxqlQuerySummary>. Each of those corresponds to a single SQL query we run in the database. The first of those is the field query today. I don't really know that that's better, just an idea.
Also, looking at that would allow us to track the I/O usage (bytes and rows read), which would be useful to know. We don't take memory consumption now, but there's nothing preventing us from adding that functionality into the OxQL query engine.
There was a problem hiding this comment.
I'm going to leave this for now, but I like the idea of adding more information to the benchmark. I don't see that criterion lets us add multiple custom metrics to a benchmark just now, but it would be interesting to record latency, io, cpu, etc., in the same benchmark.
5eadce7 to
e333c23
Compare
Add oxql field lookup benchmark. To avoid the complication of generating realistic synthetic data, we provide scripts for fetching and loading fields from a running ClickHouse instance.
e333c23 to
c1dadec
Compare
Add oxql field lookup benchmark. To avoid the complication of generating realistic synthetic data, we provide scripts for fetching and loading fields from a running ClickHouse instance.
Checking in to benchmark changes in #10110.