Commit f2e537f

Author: miranov25
Add column compression support to AliasDataFrame
Implements bidirectional compression for efficient storage of numerical
columns using mathematical transformations (e.g., asinh/sinh for residuals).

Core Features:
- compress_columns(): Apply compress/decompress transformations
- decompress_columns(): Materialize with inplace/keep_compressed options
- NumpyRootMapper: Unified numpy→ROOT function mapping (30+ functions)
- Metadata persistence: compression_info travels via Parquet/ROOT
- Optional precision measurement: RMSE, max error, mean error
- Safety: collision guards, double-compression prevention, partial failure handling

Implementation:
- Uses existing add_alias + materialize_alias (no core changes)
- Lazy decompression via aliases (storage: int16, access: float16)
- Fixes cycle detection false positive by removing materialized aliases
- Adds hyperbolic functions (asinh/sinh/etc) to eval namespace

Testing:
- 13 new compression tests (all passing)
- Roundtrip tests for Parquet and ROOT persistence
- Backward compatibility: old files load cleanly
- All 31 tests pass (17 original + 13 compression + 1 compat)

Use Case:
TPC residuals: 37M rows × 5 float16 = 370 MB
After asinh → int16 compression: ~120 MB (3× reduction)

Example:
```python
spec = {
    'dy': {
        'compress': 'round(asinh(dy)*40)',
        'decompress': 'sinh(dy_c/40.)',
        'compressed_dtype': np.int16,
        'decompressed_dtype': np.float16
    }
}
adf.compress_columns(spec, measure_precision=True)
# Storage: dy_c (int16), access via lazy alias: dy → sinh(dy_c/40.)
```

Future Work:
- Add special AST transformations for ROOT export (clip, sign, isinf)
- Expand function mapping when additional math operations needed
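
To make the transformation in the example spec concrete, the following is a minimal standalone sketch of the asinh/sinh roundtrip using plain NumPy. It does not use AliasDataFrame. The helper names (`compress`, `decompress`, `precision`), the synthetic Gaussian input, and the exact error definitions are assumptions for illustration; only the formulas, the scale factor 40, and the dtypes come from the commit message.

```python
import numpy as np

SCALE = 40.0  # scale factor taken from the commit's example spec


def compress(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: round(asinh(x) * SCALE), stored as int16."""
    return np.round(np.arcsinh(values) * SCALE).astype(np.int16)


def decompress(codes: np.ndarray) -> np.ndarray:
    """Hypothetical helper: sinh(code / SCALE), returned as float16."""
    return np.sinh(codes / SCALE).astype(np.float16)


def precision(original: np.ndarray, restored: np.ndarray) -> dict:
    """Assumed definitions of the three metrics named in the commit."""
    err = restored.astype(np.float64) - original.astype(np.float64)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "max_error": float(np.max(np.abs(err))),
        "mean_error": float(np.mean(err)),
    }


# Synthetic residuals (unit Gaussian); not real TPC data.
rng = np.random.default_rng(0)
dy = rng.normal(0.0, 1.0, size=10_000)

dy_c = compress(dy)          # what would be stored
dy_back = decompress(dy_c)   # what the lazy alias would return
stats = precision(dy, dy_back)
print(dy_c.dtype, dy_back.dtype, stats)

# Uncompressed size quoted in the commit: 37M rows x 5 columns x 2 bytes.
raw_bytes = 37_000_000 * 5 * np.dtype(np.float16).itemsize
print(raw_bytes / 1e6, "MB")
```

Because asinh is roughly linear near zero and logarithmic for large arguments, the integer codes resolve small residuals finely and large ones coarsely. Note that int16 and float16 both occupy two bytes per value, so the ~3× reduction reported in the commit presumably reflects how well the integer codes compress on disk rather than a narrower dtype; this is an inference, not something the commit states.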
1 parent 97498ea commit f2e537f

2 files changed: +830 -7 lines changed

