Merging this PR will improve performance by 18.36%
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.025x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.024x ➖, 0↑ 0↓)
datafusion / parquet (1.023x ➖, 0↑ 1↓)
datafusion / arrow (1.032x ➖, 0↑ 3↓)
duckdb / vortex-file-compressed (1.021x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.024x ➖, 0↑ 0↓)
duckdb / parquet (1.033x ➖, 2↑ 5↓)
duckdb / duckdb (1.032x ➖, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.881x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.001x ➖, 2↑ 2↓)
datafusion / parquet (0.974x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (0.996x ➖, 1↑ 0↓)
duckdb / vortex-compact (1.065x ➖, 0↑ 2↓)
duckdb / parquet (0.969x ➖, 0↑ 0↓)
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.954x ➖, 7↑ 0↓)
datafusion / vortex-compact (0.963x ➖, 1↑ 1↓)
datafusion / parquet (0.964x ➖, 4↑ 1↓)
duckdb / vortex-file-compressed (0.953x ➖, 10↑ 1↓)
duckdb / vortex-compact (0.958x ➖, 5↑ 2↓)
duckdb / parquet (0.973x ➖, 2↑ 1↓)
duckdb / duckdb (0.957x ➖, 8↑ 0↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.017x ➖, 0↑ 1↓)
datafusion / parquet (1.017x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (0.978x ➖, 4↑ 4↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
duckdb / duckdb (1.029x ➖, 0↑ 2↓)
Polar Signals Profiling Results
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.007x ➖
datafusion / vortex-file-compressed (1.007x ➖, 0↑ 0↓)
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.026x ➖, 0↑ 2↓)
datafusion / vortex-compact (1.026x ➖, 0↑ 1↓)
datafusion / parquet (1.001x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.998x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.011x ➖, 0↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.977x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.982x ➖, 1↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.006x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.006x ➖, 0↑ 0↓)
datafusion / parquet (1.012x ➖, 0↑ 0↓)
datafusion / arrow (1.006x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.008x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.008x ➖, 0↑ 0↓)
duckdb / parquet (1.018x ➖, 0↑ 2↓)
duckdb / duckdb (1.008x ➖, 0↑ 0↓)
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
eeaea6d to 355df2d
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.100x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.848x ➖, 3↑ 0↓)
datafusion / parquet (0.974x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.018x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.972x ➖, 0↑ 0↓)
duckdb / parquet (0.996x ➖, 0↑ 0↓)
// actually compressing data.
let mut codes_excludes = vec![IntCode::Dict, IntCode::Sequence];
codes_excludes.extend_from_slice(excludes);
what goes wrong with sequence array?
we want duplicates in the dictionary codes by definition, which means the codes can never be a sequence array
that makes sense, is that documented?
the comment right above?
Note that I'm working on something that will make this kind of reasoning a lot more obvious, if you are interested: #7018
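To make the reasoning in this thread concrete, here is a minimal Rust sketch. It is not Vortex code: `dict_encode` and `is_sequence` are hypothetical helpers, and it assumes a sequence array stores values of the form `base + i * step` with a nonzero step. Dictionary encoding only pays off when values repeat, so the codes repeat too and cannot take that form.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not a Vortex API): dictionary-encode `values`
/// into the list of unique values plus one code per input element.
fn dict_encode(values: &[i64]) -> (Vec<i64>, Vec<u32>) {
    let mut index: HashMap<i64, u32> = HashMap::new();
    let mut uniques: Vec<i64> = Vec::new();
    let mut codes: Vec<u32> = Vec::with_capacity(values.len());
    for &v in values {
        // Assign the next free code the first time a value is seen.
        let code = *index.entry(v).or_insert_with(|| {
            uniques.push(v);
            (uniques.len() - 1) as u32
        });
        codes.push(code);
    }
    (uniques, codes)
}

/// Hypothetical check (assumption: a sequence is `base + i * step`
/// with a nonzero step). Any repeated code breaks this property.
fn is_sequence(codes: &[u32]) -> bool {
    if codes.len() < 2 {
        return false;
    }
    let step = codes[1] as i64 - codes[0] as i64;
    step != 0
        && codes
            .windows(2)
            .all(|w| w[1] as i64 - w[0] as i64 == step)
}

fn main() {
    // Repeated values are the case dictionary encoding targets.
    let values: [i64; 6] = [10, 20, 10, 30, 20, 10];
    let (uniques, codes) = dict_encode(&values);
    println!("uniques = {:?}", uniques);
    println!("codes = {:?}", codes);
    println!("codes form a sequence: {}", is_sequence(&codes));
}
```

Under these assumptions, trying a sequence scheme on the codes is wasted work, which is consistent with `IntCode::Sequence` being added to `codes_excludes` alongside `IntCode::Dict`.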
Benchmarks: Random Access
Vortex (geomean): 0.894x ✅
unknown / unknown (0.980x ➖, 7↑ 1↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.992x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.080x ➖, 1↑ 4↓)
datafusion / parquet (0.989x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.117x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.083x ➖, 0↑ 1↓)
duckdb / parquet (1.063x ➖, 0↑ 0↓)
Benchmarks: Compression
Vortex (geomean): 1.014x ➖
unknown / unknown (1.022x ➖, 0↑ 11↓)
Summary
Compress dictionary-encoded integer array values.
Note that we already do this for dictionary-encoded float array values.
Testing
N/A