Results

Tabular Datasets

Below we compare speeds and compressed sizes on 3 real-world datasets. All of these results, as well as those from the paper, are available in the results CSVs, e.g. the results for columnar datasets on a MacBook Pro. All benchmarks reported here and in the paper can be run via the CLI.

The 3 datasets we display here are:

| dataset     | uncompressed size | numeric data types |
|-------------|-------------------|--------------------|
| air quality | 59.7MB            | i16, i32, i64      |
| taxi        | 2.14GB            | f64, i32, i64      |
| r/place     | 4.19GB            | i32, i64           |

[Figures: bar charts showing better compression for Pco than zstd.parquet; bar charts showing similar compression speed for Pco and zstd.parquet; bar charts showing faster decompression speed for Pco than zstd.parquet]

For these results, we used a single performance core of a MacBook Pro M3 Max. Only numerical columns were used. For Blosc, the SHUFFLE filter and the Zstd default of level 3 were used. For Parquet, the Parquet default of Zstd level 1 was used.

Even at max compression levels, Zstd-based codecs don't compress much better. For example, on the Taxi dataset, Parquet+Zstd at the max Zstd level of 22 and Blosc+Zstd at the max Blosc level of 9 get compression ratios of 5.32 and 2.85, respectively. In contrast, Pco gets 6.89 at level 8 and 6.98 at level 12.
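To put those ratios in terms of output size, here is a small sketch of the arithmetic. It assumes the usual definition of compression ratio (uncompressed size divided by compressed size) and that all four ratios were measured on the same input. The helper name `relative_size` is ours, for illustration only; it is not part of Pco or its CLI.

```python
# Compression ratios on the Taxi dataset, copied from the paragraph above.
ratios = {
    "Parquet+Zstd (level 22)": 5.32,
    "Blosc+Zstd (level 9)": 2.85,
    "Pco (level 8)": 6.89,
    "Pco (level 12)": 6.98,
}


def relative_size(baseline_ratio: float, other_ratio: float) -> float:
    """Return the `other` codec's output size as a fraction of the baseline's.

    Assumes ratio = uncompressed / compressed, so for the same input the
    compressed sizes are inversely proportional to the ratios.
    """
    return baseline_ratio / other_ratio


pco = ratios["Pco (level 12)"]
for name, ratio in ratios.items():
    if name.startswith("Pco"):
        continue  # only compare Pco against the Zstd-based codecs
    saving = 1.0 - relative_size(ratio, pco)
    print(f"Pco level 12 output is {saving:.1%} smaller than {name}")
```

Under those assumptions, Pco at level 12 produces output roughly 24% smaller than Parquet+Zstd at level 22 and roughly 59% smaller than Blosc+Zstd at level 9 on this dataset.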