nail is a high-performance command-line tool for analyzing, transforming, and exploring Parquet, CSV, JSON, and Excel files. Built with Rust, Apache Arrow, and DataFusion.
Process gigabyte-scale datasets in seconds • SQL-powered • Zero configuration • Works offline • Single binary.
cargo install nail-parquet

From source:
git clone https://github.com/Vitruves/nail-parquet
cd nail-parquet
cargo build --release
sudo cp target/release/nail /usr/local/bin/
nail --help

With nix:
nix shell nixpkgs#nail-parquet

Dependencies: none on macOS; pkg-config and openssl on Linux.
append Concatenate multiple datasets
binning Bin continuous variables into categories
convert Convert between file formats
correlations Calculate correlation matrices
count Count total rows
create Create new columns with expressions
dedup Remove duplicate rows or columns
describe Show global file overview and metadata
diff Compare two datasets and show differences
drop Remove columns or rows
fill Fill missing values
filter Filter rows by conditions
frequency Calculate frequency distributions
head Display first N rows
headers Display column headers
id Add unique identifier column
merge Join two datasets
metadata Show Parquet file metadata
optimize Optimize Parquet files for better performance
outliers Detect outliers in data
pivot Create pivot tables with aggregations
preview Preview random N rows
rename Rename columns
sample Extract data samples
schema Display schema information
search Search for values in data
select Select specific columns or rows
shuffle Randomly shuffle rows
size Show data size information
sort Sort data by columns with various strategies
split Split data into multiple files
stats Calculate descriptive statistics
tail Display last N rows
update Check for newer versions
help Print this message or the help of the given subcommand(s)
Run nail <command> --help for full usage.
Available on all commands:
| Flag | Description |
|---|---|
| -v, --verbose | Timing and progress output |
| -j, --jobs N | Parallel jobs (default: half of CPU cores) |
| -o, --output FILE | Output file (prints to console if omitted) |
| -f, --format FORMAT | Output format: json, csv, parquet, text |
| -h, --help | Command help |
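Since -j defaults to half of the CPU cores, it can be useful to compute an explicit value per machine. The following is a minimal sketch, not part of nail: detect_cores and jobs_for are hypothetical helper names, and "all cores minus one" is only an assumed heuristic that may or may not suit a given workload.

```shell
# detect_cores: print the number of online CPU cores, falling back to 1.
# Tries nproc (GNU coreutils), then getconf, then sysctl (macOS/BSD).
detect_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu
  else
    echo 1
  fi
}

# jobs_for CORES: assumed heuristic that leaves one core free
# for the rest of the system and never goes below 1.
jobs_for() {
  if [ "$1" -gt 1 ]; then
    echo $(( $1 - 1 ))
  else
    echo 1
  fi
}
```

The result can then be passed to any command, for example: nail stats sales.parquet -c "revenue,profit" -j "$(jobs_for "$(detect_cores)")" --verbose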
Explore a dataset:
nail describe sales.parquet
nail stats sales.parquet -c "revenue,profit" --percentiles "0.5,0.9,0.99"
nail correlations sales.parquet -c "price,volume,discount" --tests t_test
nail frequency sales.parquet -c "category,region"

Clean and enrich:
nail dedup raw.parquet --row-wise -c "id" -o unique.parquet
nail outliers unique.parquet -c "price" --method iqr --remove -o cleaned.parquet
nail create cleaned.parquet --column "margin=(price-cost)/price" -o enriched.parquet

Build an analysis pipeline:
nail optimize raw.parquet -o opt.parquet --compression zstd --sort-by "ts,customer_id" --dictionary
nail binning opt.parquet -c "age" -b "18,25,35,50,65" --method custom --labels "18-24,25-34,35-49,50-64,65+" -o binned.parquet
nail pivot binned.parquet -i "age_binned" -c "category" -l "revenue" --agg sum -o summary.parquet
nail stats summary.parquet --stats-type exhaustive -o summary_stats.json

Compare versions:
nail diff yesterday.parquet --compare today.parquet --keys "id" --changes-only

- Prefer Parquet over CSV for analytical workloads.
- Scope operations with -c regex patterns.
- Use intermediate files for multi-step transforms.
- Tune -j to match your machine.
- Add --verbose to monitor long runs.
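The intermediate-file tip can be sketched as a small shell function that wraps the "Clean and enrich" commands shown earlier. This is an illustration under assumptions: clean_and_enrich and the NAIL override variable are hypothetical names introduced here (NAIL=echo gives a dry run), and the column names id, price, and cost come from the examples above rather than from any particular dataset.

```shell
# clean_and_enrich INPUT OUTPUT [JOBS]
# Runs dedup -> outliers -> create, writing each step to a temporary
# directory so a failed step leaves its inputs behind for inspection.
clean_and_enrich() {
  local input="$1" output="$2" jobs="${3:-4}"
  local nail_bin="${NAIL:-nail}"   # hypothetical override, not a nail feature
  local workdir
  workdir="$(mktemp -d)" || return 1

  # Step 1: drop duplicate rows keyed on "id".
  "$nail_bin" dedup "$input" --row-wise -c "id" \
    -j "$jobs" --verbose -o "$workdir/unique.parquet" || return 1

  # Step 2: remove price outliers using the IQR method.
  "$nail_bin" outliers "$workdir/unique.parquet" -c "price" --method iqr --remove \
    -j "$jobs" --verbose -o "$workdir/cleaned.parquet" || return 1

  # Step 3: add the derived margin column and write the final output.
  "$nail_bin" create "$workdir/cleaned.parquet" --column "margin=(price-cost)/price" \
    -j "$jobs" --verbose -o "$output" || return 1

  # Only clean up once every step has succeeded.
  rm -rf "$workdir"
}
```

A call such as clean_and_enrich raw.parquet enriched.parquet 8 would run the three steps with eight jobs each. Keeping the intermediates on failure is a deliberate choice in this sketch, so a broken step can be rerun by hand against its input.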
MIT — see LICENSE.
Fork, branch, add tests, ensure cargo test and cargo clippy pass, open a PR.
Issues and questions: https://github.com/Vitruves/nail-parquet/issues
