feat: add clickhouse-bench with auto-downloaded ClickHouse binary #6736

Open
fastio wants to merge 7 commits into vortex-data:develop from fastio:integration-clickhouse-benchmark-baseline


@fastio fastio commented Mar 2, 2026

Introduce a new clickhouse-bench benchmark crate that runs ClickBench queries against Parquet data via clickhouse-local, providing a baseline for comparing Vortex performance against ClickHouse.

Key design decisions:

  • build.rs auto-downloads the full ClickHouse binary (with Parquet support) into target/clickhouse-local/, similar to how vortex-duckdb downloads the DuckDB library. This eliminates manual install steps and avoids issues with slim/homebrew builds lacking Parquet support.
  • The binary path is baked in via CLICKHOUSE_BINARY env at compile time; CLICKHOUSE_LOCAL env var allows runtime override.
  • ClickHouse-dialect SQL queries are maintained in a separate clickbench_clickhouse_queries.sql file (43 queries).
  • CI workflows updated to include clickhouse:parquet target in ClickBench benchmarks and conditionally build clickhouse-bench.
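As a rough illustration of what "running a query against Parquet data via clickhouse-local" involves, the sketch below builds a `clickhouse local --query ...` invocation that reads a Parquet file through ClickHouse's `file` table function. The helper name `count_rows_command`, the row-count query, and the file name are hypothetical and are not taken from this PR; the actual crate may structure this differently.

```rust
use std::process::Command;

/// Hypothetical helper (not from the PR): build a `clickhouse local`
/// invocation that counts the rows of a Parquet file.
fn count_rows_command(binary: &str, parquet_path: &str) -> Command {
    // Escape backslashes and single quotes so the path is a valid SQL string literal.
    let escaped = parquet_path.replace('\\', "\\\\").replace('\'', "\\'");
    // `file(path, Parquet)` exposes the file as a table for the duration of the query.
    let sql = format!("SELECT count() FROM file('{escaped}', Parquet)");

    let mut cmd = Command::new(binary);
    cmd.arg("local").arg("--query").arg(sql);
    cmd
}
```

Calling `.output()` on the returned command would run the query, assuming a ClickHouse build with Parquet support is installed; a real ClickBench query would replace the illustrative `SELECT count()`.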

#6425

@myrrc myrrc self-requested a review March 2, 2026 10:32
Contributor

Why do we need this file? Is it different from the one already included?

Author

@fastio fastio Mar 3, 2026

Good catch! I have removed the duplicate clickbench_clickhouse_queries.sql and validated with cargo check -p vortex-bench.

Contributor

@myrrc myrrc left a comment

I don't think downloading untrusted binaries from the internet via a build script is a good idea. We want first-class integration with DuckDB, so we need to download its sources (although I wouldn't do that in a build script either), but we don't need such integration with ClickHouse yet.

My idea is to use the ClickHouse binary in CI (as it runs on Linux only) and require users to download it by hand if they want a local run. Benchmarking on macOS doesn't make much sense anyway, as the vectorized instruction set is different.

Author

fastio commented Mar 3, 2026

> I don't think downloading untrusted binaries from the internet via a build script is a good idea. We want first-class integration with DuckDB, so we need to download its sources (although I wouldn't do that in a build script either), but we don't need such integration with ClickHouse yet.
>
> My idea is to use the ClickHouse binary in CI (as it runs on Linux only) and require users to download it by hand if they want a local run. Benchmarking on macOS doesn't make much sense anyway, as the vectorized instruction set is different.

Agreed — removed the binary download from build.rs entirely. The clickhouse binary is now resolved at runtime: via CLICKHOUSE_BINARY env var or from $PATH. CI installs it via the official installer before building. Local users need to install it manually. No more untrusted binary downloads in the build script.
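The runtime lookup described here (explicit `CLICKHOUSE_BINARY` override first, then `$PATH`) could look roughly like the sketch below. This is an assumption-laden illustration, not the PR's actual code: the function name `resolve_clickhouse`, the error text, and the choice to look for a file literally named `clickhouse` are all hypothetical.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Hypothetical helper (not from the PR): pick the ClickHouse binary from an
/// explicit override or, failing that, from a PATH-style list of directories.
fn resolve_clickhouse(
    override_path: Option<&OsStr>,
    path_var: Option<&OsStr>,
) -> Result<PathBuf, String> {
    // A non-empty override wins and is returned without further checks.
    if let Some(p) = override_path.filter(|p| !p.is_empty()) {
        return Ok(PathBuf::from(p));
    }
    // Otherwise take the first directory that contains a file named `clickhouse`.
    // Note: this only checks that the file exists, not that it is executable.
    if let Some(paths) = path_var {
        for dir in env::split_paths(paths) {
            let candidate = dir.join("clickhouse");
            if candidate.is_file() {
                return Ok(candidate);
            }
        }
    }
    Err("clickhouse binary not found: set CLICKHOUSE_BINARY or add `clickhouse` to PATH"
        .to_string())
}
```

A caller would typically pass `env::var_os("CLICKHOUSE_BINARY").as_deref()` and `env::var_os("PATH").as_deref()`; taking both as parameters keeps the lookup easy to exercise without mutating the process environment.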

- name: Install ClickHouse
  if: contains(matrix.targets, 'clickhouse:')
  run: |
    curl https://clickhouse.com/ | sh
Contributor

Why not download the latest release file for our architecture from Github releases? We then don't need any installation and curl in general.

Author

Good call: updated CI to download the static binary directly from GitHub Releases, pinned to the ClickHouse LTS release v25.8.18.1. No curl | sh or installation step is needed.

@fastio fastio requested review from joseph-isaacs and myrrc March 9, 2026 02:22
return query.to_string();
}

strip_simple_identifier_quotes(query)
Contributor

ClickHouse handles quoted identifiers correctly, so I think we can pass them through unchanged to reduce this PR's diff.

Contributor

myrrc commented Mar 9, 2026

The changes look good to me conceptually, let's see what the CI run says.

@myrrc myrrc self-assigned this Mar 9, 2026
@fastio fastio force-pushed the integration-clickhouse-benchmark-baseline branch from 82c11d1 to 60aa2d2 Compare March 10, 2026 13:13
fastio added 6 commits March 10, 2026 21:27
Introduce a new clickhouse-bench benchmark crate that runs ClickBench
queries against Parquet data via clickhouse-local, providing a baseline
for comparing Vortex performance against ClickHouse.

Key design decisions:
- build.rs auto-downloads the full ClickHouse binary (with Parquet
  support) into target/clickhouse-local/, similar to how vortex-duckdb
  downloads the DuckDB library. This eliminates manual install steps
  and avoids issues with slim/homebrew builds lacking Parquet support.
- The binary path is baked in via CLICKHOUSE_BINARY env at compile time;
  CLICKHOUSE_LOCAL env var allows runtime override.
- ClickHouse-dialect SQL queries are maintained in a separate
  clickbench_clickhouse_queries.sql file (43 queries).
- CI workflows updated to include clickhouse:parquet target in
  ClickBench benchmarks and conditionally build clickhouse-bench.

Signed-off-by: fastio <pengjian.uestc@gmail.com>
…dling

Signed-off-by: fastio <pengjian.uestc@gmail.com>
…use from PATH

- Remove reqwest-based binary download from build.rs
- Resolve clickhouse binary via CLICKHOUSE_BINARY env var or $PATH at runtime
- Add CI step to install clickhouse before building when needed
- Fail with clear error message if binary is not found locally

Signed-off-by: fastio <pengjian.uestc@gmail.com>
- Pass subcommand arg to clickhouse-bench in run-sql-bench.sh for consistency
- Use BenchmarkArg + create_benchmark() in main.rs like other engines
- Replace `which` with `clickhouse local --version` for binary verification
- Pin ClickHouse to LTS release v25.8.18.1 from GitHub Releases

Signed-off-by: fastio <pengjian.uestc@gmail.com>
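The switch from `which` to `clickhouse local --version` means the binary is verified by actually running it rather than by only checking that it exists. A minimal sketch of such a check follows; the function name `verify_clickhouse` and the error messages are hypothetical, and the PR's real implementation may differ.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper (not from the PR): confirm the binary can be executed
/// by running `<binary> local --version` and returning what it prints.
fn verify_clickhouse(binary: &Path) -> Result<String, String> {
    let output = Command::new(binary)
        .args(["local", "--version"])
        .output()
        // Spawning fails here if the file is missing or not executable.
        .map_err(|e| format!("failed to run {}: {e}", binary.display()))?;

    if !output.status.success() {
        return Err(format!(
            "{} local --version exited with {}",
            binary.display(),
            output.status
        ));
    }
    // Return the trimmed version banner so callers can log it.
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

Compared with a plain existence check, this also catches a binary that is present but cannot start, for example one built for a different architecture.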
…identifier handling.

Queries are now returned as-is without dialect-specific transformation.

Signed-off-by: fastio <pengjian.uestc@gmail.com>
Signed-off-by: fastio <pengjian.uestc@gmail.com>
@fastio fastio force-pushed the integration-clickhouse-benchmark-baseline branch from 60aa2d2 to 5aa201a Compare March 10, 2026 13:28
Signed-off-by: Peng Jian <pengjian.uestc@gmail.com>

codspeed-hq bot commented Mar 10, 2026

Merging this PR will degrade performance by 20.89%

❌ 4 regressed benchmarks
✅ 996 untouched benchmarks
⏩ 1466 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | patched_take_200k_dispersed | 4.7 ms | 5.6 ms | -16.59% |
| Simulation | take_200k_first_chunk_only | 3.3 ms | 4.2 ms | -20.89% |
| Simulation | patched_take_200k_first_chunk_only | 4.8 ms | 5.4 ms | -10.69% |
| Simulation | take_200k_dispersed | 3.6 ms | 4.5 ms | -19.61% |

Comparing fastio:integration-clickhouse-benchmark-baseline (fb28623) with develop (e477fa5)

Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Contributor

connortsui20 commented Mar 10, 2026

@fastio Feel free to ping us in the public slack channel if you want us to run CI for you! (Feel free to ping me here as well)

Edit: To fix the CI issues right now, could you update the lockfile with cargo check?
