Replace per-point nearest extraction with vectorized axis spatial method #143
Agent-Logs-Url: https://github.com/fish-pace/point-collocation/sessions/d42afd0a-b2de-4e6c-8a67-83ee84d30533 Co-authored-by: eeholmes <2545978+eeholmes@users.noreply.github.com>
@copilot The time for this went from 18 seconds to 65 seconds. The vectorized change should not have added time. The plan didn't change. Plan: 595 points → 4 unique granule(s)
…e before loop Agent-Logs-Url: https://github.com/fish-pace/point-collocation/sessions/7a7a4817-f270-4cd9-ba6a-232a2ac90795 Co-authored-by: eeholmes <2545978+eeholmes@users.noreply.github.com>
... The slowdown was a dask recomputation bug (commit f9c0ea7). The fix adds |
…d() to avoid loading all timesteps Agent-Logs-Url: https://github.com/fish-pace/point-collocation/sessions/d26c8381-f68d-40d5-acf9-869899e769c2 Co-authored-by: eeholmes <2545978+eeholmes@users.noreply.github.com>
@copilot Evaluate the profile for one of the matchup calls. It still feels slow, but maybe there is no way to avoid that. Plan: 100 points → 18 unique granule(s). Profile output, ordered by cumulative time, with columns: ncalls, tottime, percall, cumtime, percall, filename:lineno(function)
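For reference, a profile with this column layout can be produced with the standard-library `cProfile` and `pstats` modules. The sketch below is only an illustration: `fake_matchup` is a hypothetical stand-in that sleeps to mimic I/O waits, not the package's matchup function.

```python
import cProfile
import io
import pstats
import time


def fake_matchup(n_granules=3, latency=0.01):
    """Hypothetical stand-in for one matchup call (not the real API)."""
    total = 0
    for _ in range(n_granules):
        time.sleep(latency)          # simulated network wait per granule
        total += sum(range(1000))    # simulated extraction work
    return total


# Profile a single call.
profiler = cProfile.Profile()
profiler.enable()
result = fake_matchup()
profiler.disable()

# Render the stats sorted by cumulative time, top 10 entries only.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the outermost calls first, which makes it easier to see whether the time is spent in waits (here `time.sleep`) or in the computation itself.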
The profile shows the bottleneck is cloud I/O, not extraction logic:
The extraction code itself is not the bottleneck here. There's no algorithmic fix for serial cloud I/O latency short of opening granules in parallel (overlapping network waits), which would be a large architectural change. So yes, for 18 cloud granules, ~21 seconds is roughly what to expect without parallelizing the granule-open loop.
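To make "overlapping network waits" concrete, here is a minimal sketch using a thread pool from the standard library. It is not a proposal for how this package should do it: `open_granule` is a hypothetical function that only sleeps to simulate latency, and the file names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def open_granule(url, latency=0.05):
    """Hypothetical granule opener: sleeps to simulate a network round trip."""
    time.sleep(latency)
    return f"opened:{url}"


# Made-up granule names, for illustration only.
urls = [f"granule_{i}.nc" for i in range(8)]

# Serial loop: each wait happens one after another.
start = time.perf_counter()
serial = [open_granule(u) for u in urls]
serial_seconds = time.perf_counter() - start

# Thread pool: the waits overlap, and pool.map preserves input order.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(open_granule, urls))
parallel_seconds = time.perf_counter() - start

print(f"serial:   {serial_seconds:.2f} s")
print(f"parallel: {parallel_seconds:.2f} s")
```

Threads help here only because the work is dominated by waiting. Whether real granule opens can safely run concurrently depends on the underlying I/O libraries, which this sketch does not address.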
- … `_extract_axis_batch` to include `time_dim` as a vectorized indexer per variable before `.load()`, avoiding loading all time steps when only one is needed per point
- `test_time_dimension_excluded_from_loaded_array`: 8-timestep dataset, 3 points each requesting a different time step, verifies correct per-point values
- `test_time_dimension_with_extra_axis`: vectorized time + extra-axis (`lev`) selection together (the MERRA-2 `coord_spec` scenario)
- … `netCDF4` package in the sandbox
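The idea of a vectorized per-point indexer can be illustrated with plain NumPy. This is a sketch under assumptions, not the code in `_extract_axis_batch`: the array shape, index values, and variable names are invented, with 8 time steps and 3 points chosen to mirror the test described above. In xarray, the analogous pattern passes indexer arrays that share a common dimension to `isel`.

```python
import numpy as np

# Invented example data with dimensions (time, lat, lon): 8 time steps.
rng = np.random.default_rng(0)
data = rng.random((8, 4, 5))

# Three points, each requesting a different time step and grid cell.
time_idx = np.array([0, 3, 7])
lat_idx = np.array([1, 2, 0])
lon_idx = np.array([4, 0, 2])

# Per-point loop: one scalar lookup per point.
looped = np.array(
    [data[t, y, x] for t, y, x in zip(time_idx, lat_idx, lon_idx)]
)

# Vectorized pointwise indexing: the index arrays are paired element by
# element, so the result has one value per point rather than a
# (3, 3, 3) outer product.
vectorized = data[time_idx, lat_idx, lon_idx]

print(vectorized.shape)
```

Including the time index in the same pointwise selection means only the requested (time, lat, lon) elements are picked out, rather than every time step at each location.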