perf: Optimize clustering and I/O (4.4x faster segmented clustering) (#579)
* perf: Use ds.variables to avoid _construct_dataarray overhead
Optimize several functions by using ds.variables instead of iterating
over data_vars.items() or accessing ds[name], both of which trigger
slow _construct_dataarray calls.
Changes:
- io.py: save_dataset_to_netcdf, load_dataset_from_netcdf, _reduce_constant_arrays
- structure.py: from_dataset (use coord_cache pattern)
- core.py: drop_constant_arrays (use numpy operations)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
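The pattern above can be sketched as follows. This is a minimal illustration, not the project's actual code: accessing `ds[name]` or iterating `data_vars.items()` constructs a full `DataArray` per variable, while `ds.variables` hands back the lighter `xarray.Variable` objects directly (coordinates included, so they must be filtered out to match `data_vars`):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {f"var{i}": ("time", np.arange(4.0) + i) for i in range(3)},
    coords={"time": np.arange(4)},
)

# Slow path: each data_vars access builds a full DataArray via
# xarray's internal _construct_dataarray, paying index/coord overhead
# once per variable.
slow = {name: da.values for name, da in ds.data_vars.items()}

# Faster path: ds.variables yields raw xarray.Variable objects and
# skips DataArray construction. Coordinates also appear here, so
# filter them out to keep parity with data_vars.
coord_names = set(ds.coords)
fast = {
    name: var.values
    for name, var in ds.variables.items()
    if name not in coord_names
}

assert slow.keys() == fast.keys()
assert all(np.array_equal(slow[k], fast[k]) for k in slow)
```

The savings scale with the number of variables, since the per-variable `DataArray` construction cost is eliminated entirely.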
* perf: Optimize clustering serialization with ds.variables
Use ds.variables for faster access in clustering/base.py:
- _create_reference_structure: original_data and metrics iteration
- compare plot: duration_curve generation with direct numpy indexing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* perf: Use batch assignment for clustering arrays (24x speedup)
_add_clustering_to_dataset was slow due to 210 individual
ds[name] = arr assignments. Each triggers xarray's
expensive dataset_update_method.
Changed to batch assignment with ds.assign(dict):
- Before: ~2600ms for to_dataset with clustering
- After: ~109ms for to_dataset with clustering
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
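A toy sketch of the batch-assignment change (hypothetical variable names; the real code assigns the clustering arrays): each `ds[name] = arr` statement runs xarray's merge/align machinery once, whereas a single `ds.assign(dict)` merges all new variables in one pass:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"time": np.arange(8)})
new_arrays = {
    f"cluster_var{i}": ("time", np.arange(8.0) * (i + 1)) for i in range(5)
}

# Slow: every item assignment triggers xarray's dataset_update_method
# (alignment + merge) once per variable.
slow_ds = ds.copy()
for name, arr in new_arrays.items():
    slow_ds[name] = arr

# Fast: a single assign() call merges all variables at once.
fast_ds = ds.assign(new_arrays)

assert set(slow_ds.data_vars) == set(fast_ds.data_vars)
```

With hundreds of arrays (210 in the commit above), collapsing N merge passes into one is where the 24x comes from.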
* perf: Use ds.variables in _build_reduced_dataset (12% faster)
Avoided _construct_dataarray overhead by:
- Using ds.variables instead of ds.data_vars.items()
- Using numpy slicing instead of .isel()
- Passing attrs dict directly instead of DataArray
cluster() benchmark:
- Before: ~10.1s
- After: ~8.9s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
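The `.isel()` replacement can be illustrated like this (a standalone sketch, not the `_build_reduced_dataset` code): `.isel()` returns a new `DataArray` with rebuilt indexes, while slicing the underlying `Variable`'s ndarray gives the same values without that overhead, and the `attrs` dict can be passed along on its own:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"load": ("time", np.arange(10.0))},
    coords={"time": np.arange(10)},
)
idx = np.array([0, 2, 4])

# Slow: .isel builds a new DataArray with full coordinate/index handling.
via_isel = ds["load"].isel(time=idx).values

# Fast: grab the raw Variable and slice its ndarray directly; the attrs
# dict travels as a plain dict instead of riding on a DataArray.
var = ds.variables["load"]
via_numpy = var.values[idx]
attrs = dict(var.attrs)

assert np.array_equal(via_isel, via_numpy)
```

The trade-off is losing label-based safety: plain numpy slicing assumes you already know the positional indices, which holds here because the reduction step computes them itself.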
* perf: Use numpy reshape in _build_typical_das (4.4x faster)
Eliminated 451,856 slow pandas .loc calls by using numpy reshape
for segmented clustering data instead of iterating per-cluster.
cluster() with segments benchmark (50 clusters, 4 segments):
- Before: ~93.7s
- After: ~21.1s
- Speedup: 4.4x
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
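The reshape trick can be sketched with a toy layout (assumed shapes only; the real code handles segmented clustering data): when the flat buffer is already cluster-major, one `reshape` recovers the per-cluster matrix that a nested `.loc` loop would otherwise build element by element:

```python
import numpy as np
import pandas as pd

# Toy setup: 4 clusters x 6 timesteps of one series, stored flat in a
# pandas Series indexed by (cluster, time).
n_clusters, n_steps = 4, 6
values = np.arange(n_clusters * n_steps, dtype=float)
idx = pd.MultiIndex.from_product(
    [range(n_clusters), range(n_steps)], names=["cluster", "time"]
)
s = pd.Series(values, index=idx)

# Slow: one .loc lookup per (cluster, timestep) pair -- the pattern that
# produced hundreds of thousands of calls upstream.
slow = np.empty((n_clusters, n_steps))
for c in range(n_clusters):
    for t in range(n_steps):
        slow[c, t] = s.loc[(c, t)]

# Fast: the buffer is already cluster-major, so a single reshape yields
# the (cluster, time) matrix with zero per-element lookups.
fast = s.to_numpy().reshape(n_clusters, n_steps)

assert np.array_equal(slow, fast)
```

This only works when the flat ordering matches the target axes; the reshape assumes the Series is sorted cluster-major, which the clustering pipeline guarantees by construction.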
* fix: Multiple clustering and IO bug fixes
- benchmark_io_performance.py: Add Gurobi → HiGHS solver fallback
- components.py: Fix storage decay to use sum (not mean) for hours per cluster
- flow_system.py: Add RangeIndex validation requiring explicit timestep_duration
- io.py: Include auxiliary coordinates in _fast_get_dataarray
- transform_accessor.py: Add empty dataset guard after drop_constant_arrays
- transform_accessor.py: Fix timestep_mapping indexing for segmented clustering
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* perf: Use ds.variables pattern in expand() (2.2x faster)
Replace data_vars.items() iteration with ds.variables pattern to avoid
slow _construct_dataarray calls (5502 calls × ~1.5ms each).
Before: 3.73s
After: 1.72s
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>