This Epic is for tracking ongoing efforts, initiatives, ideas, and issues related to our benchmarking tooling.
As of this writing, our setup includes multiple scripts, GitHub Actions workflows, a Python tool (used both locally and in CI), and multiple Rust binaries.
Status
Proposed.
SQL benchmarks are the most interesting to expand, so we're going to focus on them.
The ideal flow as we see it:
- Most orchestration should live in the Python tool, written once and shared across every engine/format. Each benchmark should be just a "config" (or similar) that the tool can interpret; a minimal sketch follows this list.
- Rust binaries should do the minimum amount of work possible: run queries on existing data. This improves isolation and reduces in-process noise like memory fragmentation.
- We want to be able to purge OS-level caches, and do any other cleanup we might want, between invocations of the Rust binaries.
- The underlying data should have a better schema: easier to use, less bespoke parsing, more structure. This should also enable more exploratory and structured analysis of the data (for example, estimating noise over time seems very useful and requires a bunch of work today); see the second sketch below.
- Data generation should be explicit, making it clear what data we benchmark on and how it is created.
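As a concrete illustration of the first three points, here is a minimal sketch of what the config-driven orchestration could look like. Everything in it is hypothetical: the `sql-bench-runner` binary, the TOML config layout, and the field names are placeholders, and the cache purge uses the Linux-only `/proc/sys/vm/drop_caches` interface (requires root).

```python
import subprocess
import tomllib  # stdlib TOML parser, Python 3.11+
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BenchmarkConfig:
    """One benchmark == one config file that the Python tool interprets."""
    name: str
    engine: str          # e.g. "datafusion"; placeholder engine name
    dataset: Path        # pre-generated data the Rust binary reads
    queries: list[str]
    iterations: int = 5


def load_config(path: Path) -> BenchmarkConfig:
    with path.open("rb") as f:
        raw = tomllib.load(f)
    raw["dataset"] = Path(raw["dataset"])
    return BenchmarkConfig(**raw)


def drop_os_caches() -> None:
    """Purge the Linux page cache between invocations (root only).
    `sync` first, so dirty pages are flushed before the drop."""
    subprocess.run(["sync"], check=True)
    Path("/proc/sys/vm/drop_caches").write_text("3\n")


def run_benchmark(cfg: BenchmarkConfig) -> None:
    for query in cfg.queries:
        for _ in range(cfg.iterations):
            drop_os_caches()
            # A fresh process per invocation keeps the Rust binary's job
            # minimal (run one query on existing data) and avoids carrying
            # in-process state like memory fragmentation across runs.
            subprocess.run(
                ["sql-bench-runner",
                 "--engine", cfg.engine,
                 "--data", str(cfg.dataset),
                 "--query", query],
                check=True,
            )


if __name__ == "__main__":
    run_benchmark(load_config(Path("benchmarks/example.toml")))
```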
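On the schema point, one option (again just a sketch; the field names are made up, not an agreed schema) is an append-only JSON Lines file where every row is one timed invocation. That removes bespoke parsing and turns noise-over-time analysis into a simple group-by:

```python
import json
import time
from pathlib import Path


def record_result(path: Path, *, commit: str, benchmark: str,
                  query: str, iteration: int, wall_time_s: float) -> None:
    """Append one timed invocation as a self-describing JSON Lines row."""
    row = {
        "timestamp": time.time(),  # when it ran, for noise-over-time plots
        "commit": commit,          # what was benchmarked
        "benchmark": benchmark,
        "query": query,
        "iteration": iteration,
        "wall_time_s": wall_time_s,
    }
    with path.open("a") as f:
        f.write(json.dumps(row) + "\n")
```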
Goal
Motivation
Unresolved questions
- Where does noise come from, and how has each mitigation we tried changed it? A sketch of one way to measure this follows.
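To show what the structured schema above would buy us here: with one JSON Lines row per invocation (as in the hypothetical layout sketched earlier), a per-query noise estimate such as the coefficient of variation of wall times becomes a small group-by, and comparing it before and after a mitigation is just running this on the two slices of the data.

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path


def noise_by_query(results: Path) -> dict[str, float]:
    """Coefficient of variation (stdev / mean) of wall time per query,
    a crude but comparable noise estimate across runs."""
    samples: dict[str, list[float]] = defaultdict(list)
    for line in results.read_text().splitlines():
        row = json.loads(line)
        samples[row["query"]].append(row["wall_time_s"])
    return {
        query: statistics.stdev(times) / statistics.mean(times)
        for query, times in samples.items()
        if len(times) >= 2  # stdev needs at least two samples
    }
```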