Commit f2e537f
miranov25
Add column compression support to AliasDataFrame
Implements bidirectional compression for efficient storage of numerical columns
using mathematical transformations (e.g., asinh/sinh for residuals).
Core Features:
- compress_columns(): Apply compress/decompress transformations
- decompress_columns(): Materialize with inplace/keep_compressed options
- NumpyRootMapper: Unified numpy→ROOT function mapping (30+ functions)
- Metadata persistence: compression_info travels via Parquet/ROOT
- Optional precision measurement: RMSE, max error, mean error
- Safety: collision guards, double-compression prevention, partial failure handling
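To illustrate the numpy→ROOT function mapping listed above, here is a minimal sketch of name translation. The table `NUMPY_TO_ROOT` and the helper `to_root_expression` are hypothetical names chosen for illustration, not the actual NumpyRootMapper API; only the `TMath::` function names are taken from ROOT itself.

```python
import re

# Hypothetical subset of a numpy -> ROOT (TMath) name table. The real
# NumpyRootMapper in this commit is described as covering 30+ functions.
NUMPY_TO_ROOT = {
    "arcsinh": "TMath::ASinH",
    "asinh": "TMath::ASinH",
    "sinh": "TMath::SinH",
    "cosh": "TMath::CosH",
    "tanh": "TMath::TanH",
    "sqrt": "TMath::Sqrt",
    "log": "TMath::Log",
    "exp": "TMath::Exp",
    "abs": "TMath::Abs",
}

# A function call is an identifier directly followed by "(", optionally
# prefixed with "np." or "numpy.".
_CALL = re.compile(r"\b(?:np\.|numpy\.)?([A-Za-z_]\w*)\s*\(")


def to_root_expression(expr: str) -> str:
    """Rewrite known function names to their ROOT equivalents.

    Names that are not in the table are left unchanged (a numpy prefix,
    if present, is still dropped).
    """
    def swap(match: re.Match) -> str:
        name = match.group(1)
        return NUMPY_TO_ROOT.get(name, name) + "("

    return _CALL.sub(swap, expr)


print(to_root_expression("sinh(dy_c/40.)"))     # TMath::SinH(dy_c/40.)
print(to_root_expression("np.arcsinh(dy)*40"))  # TMath::ASinH(dy)*40
```

A plain regex substitution like this only handles simple renames; functions that need restructuring (the commit lists clip, sign, isinf as future work) would require an AST-based transformation.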
Implementation:
- Uses existing add_alias + materialize_alias (no core changes)
- Lazy decompression via aliases (storage: int16, access: float16)
- Fixes cycle detection false positive by removing materialized aliases
- Adds hyperbolic functions (asinh/sinh/etc) to eval namespace
Testing:
- 13 new compression tests (all passing)
- Roundtrip tests for Parquet and ROOT persistence
- Backward compatibility: old files load cleanly
- All 31 tests pass (17 original + 13 compression + 1 compat)
Use Case:
TPC residuals: 37M rows × 5 float16 columns = 370 MB
After asinh → int16 compression: ~120 MB (3× reduction)
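The arithmetic behind these numbers can be checked with a short sketch. Since float16 and int16 are both 2 bytes wide, the raw in-memory size does not change; one plausible reading (an assumption, not stated in the commit message) is that the ~120 MB figure is the on-disk size after the file format's own codec, which packs the quantized integers more tightly. The synthetic Gaussian data and the use of `zlib` below are stand-ins for illustration, not the real TPC residuals or the codec used by Parquet/ROOT.

```python
import zlib

import numpy as np

# Raw footprint: float16 and int16 both take 2 bytes per value.
rows, columns, bytes_per_value = 37_000_000, 5, 2
raw_mb = rows * columns * bytes_per_value / 1e6
print(f"raw size: {raw_mb:.0f} MB")  # raw size: 370 MB

# Synthetic stand-in for one residual column (assumption: roughly Gaussian).
rng = np.random.default_rng(0)
dy = rng.normal(0.0, 1.0, 200_000)

as_float16 = dy.astype(np.float16)
as_int16 = np.round(np.arcsinh(dy) * 40).astype(np.int16)

# Generic byte-level compression as a proxy for the file format's codec.
float16_packed = len(zlib.compress(as_float16.tobytes()))
int16_packed = len(zlib.compress(as_int16.tobytes()))

print(f"float16: {float16_packed / as_float16.nbytes:.2f} of raw size")
print(f"int16:   {int16_packed / as_int16.nbytes:.2f} of raw size")
```

The quantized column takes only a small set of distinct integer values, so a general-purpose codec has far less entropy to encode than with float16 mantissa bits. The exact ratio depends on the data distribution and the codec.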
Example:
```python
import numpy as np

# adf is an existing AliasDataFrame that holds a float column 'dy'
spec = {
    'dy': {
        'compress': 'round(asinh(dy)*40)',
        'decompress': 'sinh(dy_c/40.)',
        'compressed_dtype': np.int16,
        'decompressed_dtype': np.float16,
    }
}
adf.compress_columns(spec, measure_precision=True)
# Storage: dy_c (int16); access via lazy alias: dy → sinh(dy_c/40.)
```
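To see what the spec above does numerically, the roundtrip can be reproduced with plain NumPy, without AliasDataFrame. This is a sketch under stated assumptions: the `NAMESPACE` dictionary and the `evaluate` helper are hypothetical stand-ins for the eval namespace mentioned under Implementation, and the input is synthetic Gaussian data rather than real residuals. The three metrics mirror the ones named under "Optional precision measurement".

```python
import numpy as np

# Hypothetical eval namespace: maps the names used in the spec strings to
# NumPy functions (the commit adds asinh/sinh/etc. to its own namespace).
NAMESPACE = {"asinh": np.arcsinh, "sinh": np.sinh, "round": np.round}


def evaluate(expr, columns):
    """Evaluate an expression string against named column arrays."""
    return eval(expr, {"__builtins__": {}}, {**NAMESPACE, **columns})


# Synthetic stand-in for a residual column.
rng = np.random.default_rng(1)
dy = rng.normal(0.0, 2.0, 100_000)

# Compress: the asinh transform followed by rounding, stored as int16.
dy_c = evaluate("round(asinh(dy)*40)", {"dy": dy}).astype(np.int16)

# Decompress: the inverse transform, exposed as float16.
dy_back = evaluate("sinh(dy_c/40.)", {"dy_c": dy_c}).astype(np.float16)

# Precision metrics of the roundtrip.
error = dy_back.astype(np.float64) - dy
rmse = float(np.sqrt(np.mean(error**2)))
max_error = float(np.max(np.abs(error)))
mean_error = float(np.mean(error))

print(f"RMSE={rmse:.4f}  max={max_error:.4f}  mean={mean_error:.5f}")
```

Because the quantization step is uniform in asinh space (1/40), the absolute error grows roughly in proportion to sqrt(1 + dy²): small residuals are stored with fine absolute resolution, while large ones keep an approximately constant relative resolution.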
Future Work:
- Add special AST transformations for ROOT export (clip, sign, isinf)
- Expand function mapping as additional math operations are needed
1 parent 97498ea commit f2e537f
2 files changed, +830 -7 lines changed