This Epic is for tracking ongoing efforts, initiatives, ideas, and issues related to our benchmarking tooling.
As of this writing, our setup includes multiple scripts, GitHub Actions workflows, a Python tool (used both locally and in CI), and multiple Rust binaries.
Status
Proposed.
SQL benchmarks are the most interesting to expand, so we're going to focus on them.
The ideal flow as we see it:
- Most orchestration should live in the Python tool, written once and shared across every engine/format. Each benchmark should be just a "config" (or similar) that the tool can interpret; a minimal sketch follows this list.
- Rust binaries should do the minimum amount of work possible: run queries on existing data. This improves isolation and reduces in-process noise like memory fragmentation.
- We want to be able to purge OS-level caches, and do any other cleanup we might want, between invocations of the Rust binaries.
- The underlying data should have a better schema: easier to use, less bespoke parsing, more structure. This should also enable more exploratory and structured analysis of the data (for example, estimating noise over time seems very useful and requires a bunch of work today); see the second sketch below.
- Data generation should be explicit, making it clear what data we benchmark on and how it is created.
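As a concrete illustration of the first three points, here is a minimal sketch of what the config-driven orchestration could look like. Everything in it is hypothetical: the `sql-bench-runner` binary, the TOML config layout, and the field names are placeholders, and the cache purge uses the Linux-only `/proc/sys/vm/drop_caches` interface (requires root).

```python
import subprocess
import tomllib  # stdlib TOML parser, Python 3.11+
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BenchmarkConfig:
    """One benchmark == one config file that the Python tool interprets."""
    name: str
    engine: str          # e.g. "datafusion"; placeholder engine name
    dataset: Path        # pre-generated data the Rust binary reads
    queries: list[str]
    iterations: int = 5


def load_config(path: Path) -> BenchmarkConfig:
    with path.open("rb") as f:
        raw = tomllib.load(f)
    raw["dataset"] = Path(raw["dataset"])
    return BenchmarkConfig(**raw)


def drop_os_caches() -> None:
    """Purge the Linux page cache between invocations (root only).
    `sync` first, so dirty pages are flushed before the drop."""
    subprocess.run(["sync"], check=True)
    Path("/proc/sys/vm/drop_caches").write_text("3\n")


def run_benchmark(cfg: BenchmarkConfig) -> None:
    for query in cfg.queries:
        for _ in range(cfg.iterations):
            drop_os_caches()
            # A fresh process per invocation keeps the Rust binary's job
            # minimal (run one query on existing data) and avoids carrying
            # in-process state like memory fragmentation across runs.
            subprocess.run(
                ["sql-bench-runner",
                 "--engine", cfg.engine,
                 "--data", str(cfg.dataset),
                 "--query", query],
                check=True,
            )


if __name__ == "__main__":
    run_benchmark(load_config(Path("benchmarks/example.toml")))
```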
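On the schema point, one option (again just a sketch; the field names are made up, not an agreed schema) is an append-only JSON Lines file where every row is one timed invocation. That removes bespoke parsing and turns noise-over-time analysis into a simple group-by:

```python
import json
import time
from pathlib import Path


def record_result(path: Path, *, commit: str, benchmark: str,
                  query: str, iteration: int, wall_time_s: float) -> None:
    """Append one timed invocation as a self-describing JSON Lines row."""
    row = {
        "timestamp": time.time(),  # when it ran, for noise-over-time plots
        "commit": commit,          # what was benchmarked
        "benchmark": benchmark,
        "query": query,
        "iteration": iteration,
        "wall_time_s": wall_time_s,
    }
    with path.open("a") as f:
        f.write(json.dumps(row) + "\n")
```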
Goal
Motivation
Unresolved questions
- Where does noise come from, and how has each mitigation we tried changed it? A sketch of one way to measure this follows.
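To show what the structured schema above would buy us here: with one JSON Lines row per invocation (as in the hypothetical layout sketched earlier), a per-query noise estimate such as the coefficient of variation of wall times becomes a small group-by, and comparing it before and after a mitigation is just running this on the two slices of the data.

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path


def noise_by_query(results: Path) -> dict[str, float]:
    """Coefficient of variation (stdev / mean) of wall time per query,
    a crude but comparable noise estimate across runs."""
    samples: dict[str, list[float]] = defaultdict(list)
    for line in results.read_text().splitlines():
        row = json.loads(line)
        samples[row["query"]].append(row["wall_time_s"])
    return {
        query: statistics.stdev(times) / statistics.mean(times)
        for query, times in samples.items()
        if len(times) >= 2  # stdev needs at least two samples
    }
```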