High-performance TPC-H and TPC-DS data generators with multiple output format support (Parquet, ORC, CSV, Paimon, Iceberg, Lance) and optional asynchronous I/O via Linux io_uring.
- Multiple Output Formats: Parquet, ORC, CSV, Apache Paimon, Apache Iceberg, Lance
- Apache Arrow Integration: central in-memory columnar representation, unified API across all formats
- Zero-Copy Streaming Writes: O(batch) peak RAM regardless of scale factor — essential at SF≥5
- Parallel Generation: fork-after-init with rolling N-slot window — all tables concurrently, one init cost
- io_uring Support: kernel async I/O for Parquet (IoUringOutputStream) and Lance (Rust runtime)
- Both TPC-H and TPC-DS: all 8 TPC-H tables and 24 TPC-DS tables implemented
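The fork-after-init and rolling-window ideas from the feature list can be illustrated in a few lines of bash. This is a sketch of the scheduling pattern only, not the C++ implementation inside tpch_benchmark. The names `run_window`, `generate_one`, and `SHARED_INIT` are hypothetical, and the subshell `( ... ) &` stands in for the fork() that follows the shared dbgen initialization. It assumes bash 4.3 or newer for `wait -n`.

```shell
#!/usr/bin/env bash
# Sketch of "fork after init" with a rolling N-slot window.
# All names are hypothetical; requires bash >= 4.3 for `wait -n`.

# One-time setup done in the parent. Every child forked later sees this
# value without recomputing it (stand-in for the shared dbgen init).
SHARED_INIT="initialized-once"

# Hypothetical per-table worker; a real one would generate and write a table.
generate_one() {
    echo "table=$1 init=$SHARED_INIT"
}

# run_window SLOTS ITEM...: run generate_one for each item, keeping at most
# SLOTS children alive; a new child starts as soon as any running one exits.
run_window() {
    local slots=$1; shift
    local -a pids=()
    local running=0 failed=0 item pid
    for item in "$@"; do
        if (( running >= slots )); then
            wait -n || true                 # block until any one child exits
            running=$((running - 1))
        fi
        ( generate_one "$item" ) &          # subshell = fork of the initialized parent
        pids+=("$!")
        running=$((running + 1))
    done
    # Collect every child's exit status so a failed table is not missed.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

run_window 4 lineitem orders customer part partsupp supplier nation region
```

With 4 slots, the fifth table starts only when one of the first four has finished, so at most four children hold memory at any time. The output order varies from run to run.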
Pre-built images are available for every platform:
docker pull ghcr.io/tsafin/tpch-cpp-all:latest
# Generate all TPC-H tables at SF=10, Parquet, parallel, zero-copy
docker run --rm -v /data:/data ghcr.io/tsafin/tpch-cpp-all:latest \
tpch_benchmark --scale-factor 10 --format parquet --output-dir /data \
--parallel --zero-copy --max-rows 0
# Generate all TPC-DS tables at SF=5, Parquet, parallel, zero-copy
docker run --rm -v /data:/data ghcr.io/tsafin/tpch-cpp-all:latest \
tpcds_benchmark --scale-factor 5 --format parquet --output-dir /data \
--parallel --zero-copy --max-rows 0
Image registry: https://github.com/tsafin/tpch-cpp/pkgs/container/tpch-cpp-all
- Linux (WSL2 supported) — io_uring requires kernel ≥ 5.11
- GCC 11+ or Clang 13+, CMake 3.22+
- libarrow-dev, libparquet-dev
- Rust ≥ 1.85 (only for Lance format)
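The kernel requirement above can be checked in a script before passing --io-uring. The following is a minimal sketch: the function names `version_ge` and `io_uring_kernel_ok` are hypothetical, only major.minor is compared, and 5.11 is the minimum stated in the requirements list.

```shell
# Sketch: check the kernel version before enabling --io-uring.
# Function names are hypothetical; 5.11 is the minimum stated above.

# version_ge HAVE WANT: succeed if HAVE >= WANT, comparing major.minor only.
# The body is a subshell "( ... )" so its variables do not leak.
version_ge() (
    have=${1%%[-+]*}        # drop suffixes such as "-generic" or "-microsoft-standard-WSL2"
    want=$2
    have_major=${have%%.*}
    want_major=${want%%.*}
    # cut -s prints nothing when there is no "." so the minor defaults to 0
    have_minor=$(printf '%s\n' "$have" | cut -s -d. -f2)
    want_minor=$(printf '%s\n' "$want" | cut -s -d. -f2)
    [ "$have_major" -gt "$want_major" ] && exit 0
    [ "$have_major" -eq "$want_major" ] && [ "${have_minor:-0}" -ge "${want_minor:-0}" ]
)

# io_uring_kernel_ok [VERSION]: test VERSION, or the running kernel if omitted.
io_uring_kernel_ok() {
    version_ge "${1:-$(uname -r)}" 5.11
}

if io_uring_kernel_ok; then
    echo "kernel $(uname -r) meets the 5.11 minimum for --io-uring"
else
    echo "kernel $(uname -r) is older than 5.11; leave --io-uring off"
fi
```

A passing version check is necessary but may not be sufficient: some container runtimes and hardened systems restrict the io_uring syscalls regardless of kernel version.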
Install system dependencies:
./scripts/install_deps.sh
mkdir build && cd build
# Minimal build (Parquet + CSV only, includes tpch_benchmark)
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
cmake --build . -j$(nproc)
# With TPC-DS:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DTPCDS_ENABLE=ON ..
cmake --build . --target tpcds_benchmark -j$(nproc)
# With ORC:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DTPCH_ENABLE_ORC=ON ..
# With Lance (requires Rust):
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DTPCH_ENABLE_LANCE=ON ..
# TPC-DS + ORC + Lance (Paimon, Iceberg, and io_uring are separate options; see the CMake option table):
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DTPCDS_ENABLE=ON \
-DTPCH_ENABLE_ORC=ON \
-DTPCH_ENABLE_LANCE=ON \
..
cmake --build . -j$(nproc)
Usage: tpch_benchmark [options]
--scale-factor <SF> TPC-H scale factor (default: 1)
--format <fmt> Output format: parquet, csv, orc, paimon, iceberg, lance
(default: parquet)
--output-dir <dir> Output directory (default: /tmp)
--max-rows <N> Max rows to generate (default: 1000; use 0 for all rows)
--table <name> Single table: lineitem, orders, customer, part, partsupp,
supplier, nation, region (default: lineitem)
--parallel Generate all 8 tables in parallel (fork-after-init)
--parallel-tables <N> Max concurrent child processes (default: all 8)
--zero-copy Streaming writes — O(batch) RAM; required at SF≥5 with --parallel
--zero-copy-mode <m> Lance streaming variant: sync (default), auto, async
--compression <c> Parquet compression: zstd (default), snappy, none
--io-uring Kernel async I/O: IoUringOutputStream for Parquet,
delegated to Rust runtime for Lance
--verbose Verbose output
Common invocations:
# Single table, all rows, Parquet
./tpch_benchmark --scale-factor 5 --format parquet --output-dir /data \
--table lineitem --max-rows 0
# All 8 tables in parallel, Parquet, zero-copy (recommended for SF≥5)
./tpch_benchmark --scale-factor 10 --format parquet --output-dir /data \
--parallel --zero-copy --max-rows 0
# Limit to 4 concurrent children (reduces peak RAM further)
./tpch_benchmark --scale-factor 10 --format parquet --output-dir /data \
--parallel --parallel-tables 4 --zero-copy --max-rows 0
# With io_uring for kernel-async disk writes
./tpch_benchmark --scale-factor 10 --format parquet --output-dir /data \
--parallel --zero-copy --io-uring --max-rows 0
# Lance format, streaming mode
./tpch_benchmark --scale-factor 5 --format lance --output-dir /data \
--parallel --zero-copy --max-rows 0
Usage: tpcds_benchmark [OPTIONS]
--format <fmt> Output format: parquet, csv, paimon, lance (default: parquet)
--table <name> Single TPC-DS table (default: store_sales)
--scale-factor <sf> Scale factor (default: 1)
--output-dir <dir> Output directory (default: /tmp)
--max-rows <n> Max rows to generate (0=all, default: 1000)
--compression <c> Parquet compression: zstd (default), snappy, none
--zero-copy Streaming mode — O(batch) RAM; required at SF≥5 with --parallel
--zero-copy-mode <m> Lance streaming variant: sync, auto, async (default: sync)
--parallel Generate all 24 tables in parallel (fork-after-init)
--parallel-tables <N> Max concurrent child processes (default: all)
--verbose Verbose output
Common invocations:
# All TPC-DS tables in parallel, Parquet, zero-copy
./tpcds_benchmark --scale-factor 5 --format parquet --output-dir /data \
--parallel --zero-copy --max-rows 0
# Single table smoke test
./tpcds_benchmark --format parquet --table store_sales --scale-factor 1
# Limit parallelism
./tpcds_benchmark --scale-factor 10 --format parquet --output-dir /data \
--parallel --parallel-tables 6 --zero-copy --max-rows 0
TPC-DS tables:
| Category | Tables |
|---|---|
| Fact | store_sales, inventory, catalog_sales, web_sales, store_returns, catalog_returns, web_returns |
| Dimension | customer, item, date_dim, call_center, catalog_page, web_page, web_site, warehouse, ship_mode, household_demographics, customer_demographics, customer_address, income_band, reason, time_dim, promotion, store |
| CMake Option | Default | Description |
|---|---|---|
| TPCDS_ENABLE | OFF | Build tpcds_benchmark (TPC-DS) |
| TPCH_ENABLE_ORC | OFF | ORC format support |
| TPCH_ENABLE_PAIMON | OFF | Apache Paimon format support |
| TPCH_ENABLE_ICEBERG | OFF | Apache Iceberg format support |
| TPCH_ENABLE_LANCE | OFF | Lance format support (requires Rust ≥ 1.85) |
| TPCH_ENABLE_ASYNC_IO | OFF | Build io_uring pool (auto-detected at runtime; needed for --io-uring) |
| TPCH_ENABLE_ASAN | OFF | AddressSanitizer (development only; do not benchmark with ASAN) |
| TPCH_BUILD_EXAMPLES | ON | Build example applications |
| TPCH_BUILD_TESTS | OFF | Build unit tests |
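Because every optional feature in the table defaults to OFF, a build that needs several of them has to pass each option explicitly. The helper below is a small sketch of that mapping. The option names come from the table, while the function name `cmake_flags_for` and the short feature keywords (`tpcds`, `orc`, `io_uring`, and so on) are hypothetical.

```shell
# Sketch: turn short feature keywords into the -D options from the table.
# cmake_flags_for and the keywords are hypothetical; option names are from the table.
cmake_flags_for() (
    flags="-DCMAKE_BUILD_TYPE=RelWithDebInfo"
    for feature in "$@"; do
        case $feature in
            tpcds)    flags="$flags -DTPCDS_ENABLE=ON" ;;
            orc)      flags="$flags -DTPCH_ENABLE_ORC=ON" ;;
            paimon)   flags="$flags -DTPCH_ENABLE_PAIMON=ON" ;;
            iceberg)  flags="$flags -DTPCH_ENABLE_ICEBERG=ON" ;;
            lance)    flags="$flags -DTPCH_ENABLE_LANCE=ON" ;;
            io_uring) flags="$flags -DTPCH_ENABLE_ASYNC_IO=ON" ;;
            *)        echo "unknown feature: $feature" >&2; exit 2 ;;
        esac
    done
    printf '%s\n' "$flags"
)

# Print the flags for a build with TPC-DS, Lance, and the io_uring pool.
cmake_flags_for tpcds lance io_uring
```

From the build directory this could be used as `cmake $(cmake_flags_for tpcds lance io_uring) ..`, leaving the command substitution unquoted so that each option becomes its own argument.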
| Library | Required | Ubuntu Package | Notes |
|---|---|---|---|
| Apache Arrow + Parquet | Yes | libarrow-dev libparquet-dev | ≥ 10.0 |
| CMake | Yes | cmake | ≥ 3.22 |
| GCC/Clang | Yes | build-essential | GCC ≥ 11 |
| Apache ORC | No | liborc-dev | Needed for ORC format |
| liburing | No | liburing-dev | Needed for --io-uring (TPCH_ENABLE_ASYNC_IO=ON) |
| Rust | No | via rustup | ≥ 1.85, needed for Lance format |
- Always use --zero-copy at SF≥5: without it each child accumulates all batches in RAM before writing, which OOMs at scale.
- --parallel forks children after one shared dbgen initialization (COW), giving full CPU utilization with a single init cost.
- --io-uring offloads write syscalls to the kernel async worker pool. Useful when disk I/O is the bottleneck; has no effect on CPU-bound workloads (e.g. heavy ZSTD compression).
- Do not use TPCH_ENABLE_ASAN for performance measurement: ASAN adds 30–50% overhead and distorts comparisons.
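The first tip is easy to forget in wrapper scripts, so one option is to derive the flags from the scale factor. This is a sketch under stated assumptions: `parallel_flags` is a hypothetical helper, it accepts integer scale factors only, and it uses only flags documented in the usage text above.

```shell
# Sketch: build a --parallel flag set that adds --zero-copy at SF >= 5.
# parallel_flags is a hypothetical helper; integer scale factors only.
# Usage: parallel_flags SF [MAX_CHILDREN]
parallel_flags() (
    sf=$1
    slots=${2:-}
    case $sf in
        ''|*[!0-9]*) echo "scale factor must be a whole number: $sf" >&2; exit 2 ;;
    esac
    flags="--scale-factor $sf --parallel --max-rows 0"
    # Streaming writes keep each child's memory at O(batch) instead of O(table).
    if [ "$sf" -ge 5 ]; then
        flags="$flags --zero-copy"
    fi
    # Optionally cap the number of concurrent children.
    if [ -n "$slots" ]; then
        flags="$flags --parallel-tables $slots"
    fi
    printf '%s\n' "$flags"
)

parallel_flags 10 4
```

A call such as `./tpch_benchmark $(parallel_flags 10 4) --format parquet --output-dir /data` would then expand to the "limit to 4 concurrent children" invocation shown earlier.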
See LICENSE file.