Replace per-point nearest extraction with vectorized axis spatial method#143

Draft
Copilot wants to merge 5 commits into main from copilot/add-vectorized-nearest-method

Conversation

Contributor

Copilot AI commented Apr 2, 2026

  • Fix _extract_axis_batch to include time_dim as a vectorized indexer per variable before .load(), avoiding loading all time steps when only one is needed per point
  • Add test: test_time_dimension_excluded_from_loaded_array — 8-timestep dataset, 3 points each requesting a different time step, verifies correct per-point values
  • Add test: test_time_dimension_with_extra_axis — vectorized time + extra-axis (lev) selection together (the MERRA-2 coord_spec scenario)
  • Verified all pre-existing non-netCDF4 tests still pass; the 6 failing tests were already failing before this change due to missing netCDF4 package in the sandbox
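The time-indexer behavior described in the first bullet can be sketched with a small synthetic dataset. This is a minimal standalone illustration, not the actual _extract_axis_batch code; the variable names, grid shape, and values are all invented for the example:

```python
import numpy as np
import xarray as xr

# Synthetic dataset: 8 time steps on a 5x5 lat/lon grid.
# Value encodes its own indices: time*100 + lat_idx*10 + lon_idx.
times = np.arange(8)
lats = np.linspace(-2.0, 2.0, 5)
lons = np.linspace(-2.0, 2.0, 5)
data = (
    times[:, None, None] * 100.0
    + np.arange(5)[None, :, None] * 10.0
    + np.arange(5)[None, None, :]
)
ds = xr.Dataset(
    {"var": (("time", "lat", "lon"), data)},
    coords={"time": times, "lat": lats, "lon": lons},
)

# Three points, each requesting a different time step.
pt_times = xr.DataArray([1, 4, 7], dims="points")
pt_lats = xr.DataArray([-2.0, 0.0, 2.0], dims="points")
pt_lons = xr.DataArray([-2.0, 0.0, 2.0], dims="points")

# Vectorized nearest selection: because time is included as a
# per-point indexer, the result has shape (points,) and .load()
# materializes one time step per point, not all 8.
selected = ds["var"].sel(
    time=pt_times, lat=pt_lats, lon=pt_lons, method="nearest"
).load()

print(selected.values)  # → [100. 422. 744.]
```

Omitting `time=pt_times` from the `sel()` call would instead yield a `(time, points)` result, and `.load()` would pull every time step for every point — the bug the first bullet fixes.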

Copilot AI linked an issue Apr 2, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Add vectorized nearest method for spatial selections Replace per-point nearest extraction with vectorized axis spatial method Apr 2, 2026
Copilot AI requested a review from eeholmes April 2, 2026 18:19
@eeholmes
Member

eeholmes commented Apr 2, 2026

@copilot The time went from 18 seconds for this to 65 seconds. The vectorized change should not have added time. The plan didn't change.

import pandas as pd

# Load the example points from the repo's fixtures.
url = (
    "https://raw.githubusercontent.com/"
    "fish-pace/point-collocation/main/"
    "examples/fixtures/points.csv"
)
df_points = pd.read_csv(url)

import point_collocation as pc

# Build the collocation plan against cloud-hosted PACE monthly granules.
plan = pc.plan(
    df_points,
    data_source="earthaccess",
    source_kwargs={
        "short_name": "PACE_OCI_L3M_RRS",
        "granule_name": "*.MO.*.4km.*",
    },
)

# Extract Rrs for the first 100 planned points.
res = pc.matchup(plan[0:100], variables=["Rrs"])

plan

Plan: 595 points → 4 unique granule(s)
Points with 0 matches : 0
Points with >1 matches: 0
Time buffer: 0 days 00:00:00

Contributor Author

Copilot AI commented Apr 2, 2026

@copilot The time went from 18 seconds for this to 65 seconds. The vectorized change should not have added time. The plan didn't change.

...

The slowdown was a dask recomputation bug (commit f9c0ea7). _extract_axis_batch did selected = ds[var].sel(..., method='nearest') which produces a lazy dask-backed DataArray, then the inner per-point loop called float(selected.isel({pts_dim: i})) 100 times — each call recomputed the full dask graph for all 100 points to extract a single value, giving O(N²) I/O.

The fix adds .load() immediately after the sel() call to materialize the dask graph exactly once. The per-point loop then operates on in-memory NumPy slices. A synthetic benchmark shows a 22× speedup on this pattern.
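The O(N²) recomputation pattern and the effect of the single `.load()` can be demonstrated with a small dask-backed array. This is a standalone sketch, assuming dask is installed; `make_block` and the call counter are illustrative stand-ins for the real I/O:

```python
import dask
import dask.array as da
import numpy as np

calls = {"n": 0}

def make_block():
    # Stand-in for the expensive I/O: count each materialization.
    calls["n"] += 1
    return np.arange(100, dtype=float)

# A lazy 100-element array backed by a single delayed block.
lazy = da.from_delayed(dask.delayed(make_block)(), shape=(100,), dtype=float)

# Slow pattern (the bug): extracting one scalar at a time runs the
# whole graph on every call, so the block is built 100 times.
calls["n"] = 0
slow = [float(lazy[i]) for i in range(100)]
n_slow = calls["n"]

# Fixed pattern: materialize once, then slice in-memory NumPy.
calls["n"] = 0
loaded = lazy.compute()
fast = [float(loaded[i]) for i in range(100)]
n_fast = calls["n"]

print(n_slow, n_fast)  # → 100 1
```

Each `float(lazy[i])` triggers an independent `compute()` with no caching between calls, which is exactly the per-point `float(selected.isel(...))` pattern described above; the `.compute()` (or `.load()` on a DataArray) collapses that to a single graph execution.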

…d() to avoid loading all timesteps

Agent-Logs-Url: https://github.com/fish-pace/point-collocation/sessions/d26c8381-f68d-40d5-acf9-869899e769c2

Co-authored-by: eeholmes <2545978+eeholmes@users.noreply.github.com>
@eeholmes
Member

eeholmes commented Apr 2, 2026

@copilot Evaluate the profile for one of the matchup calls. It still feels slow, but maybe there is no way to avoid it.

Plan: 100 points → 18 unique granule(s)
Points with 0 matches : 0
Points with >1 matches: 0
Time buffer: 0 days 00:00:00

import cProfile
import pstats
import io

pr = cProfile.Profile()
pr.enable()

res = pc.matchup(plan, variables=["avw"])

pr.disable()

s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats("cumtime")  # or "tottime"
ps.print_stats(40)   # top 40 entries
print(s.getvalue())

output

     5164968 function calls (5084085 primitive calls) in 21.025 seconds

Ordered by: cumulative time
List reduced from 9651 to 40 due to restriction <40>

ncalls tottime percall cumtime percall filename:lineno(function)
140/3 0.001 0.000 16.299 5.433 /srv/conda/envs/notebook/lib/python3.12/threading.py:637(wait)
3/2 0.000 0.000 12.631 6.315 /srv/conda/envs/notebook/lib/python3.12/site-packages/IPython/core/interactiveshell.py:3665(run_code)
2228/2 0.110 0.000 12.622 6.311 {built-in method builtins.exec}
2/1 0.000 0.000 12.622 12.622 /tmp/ipykernel_8493/1906214859.py:1(<module>)
2/1 0.000 0.000 12.622 12.622 /home/jovyan/point-collocation/src/point_collocation/core/engine.py:70(matchup)
2/1 0.001 0.000 12.622 12.622 /home/jovyan/point-collocation/src/point_collocation/core/engine.py:630(_execute_plan)
20/19 0.001 0.000 10.010 0.527 /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/api.py:388(open_dataset)
2620/1963 0.002 0.000 9.524 0.005 {built-in method builtins.next}
36 0.000 0.000 9.505 0.264 /home/jovyan/point-collocation/src/point_collocation/core/_open_method.py:1145(_open_as_flat_dataset)
361/126 0.000 0.000 9.503 0.075 /srv/conda/envs/notebook/lib/python3.12/contextlib.py:132(__enter__)
1906 0.003 0.000 8.898 0.005 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/caching.py:730(call)
72/54 0.000 0.000 8.832 0.164 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/caching.py:914(_fetch_block)
72/54 0.000 0.000 8.816 0.163 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/caching.py:62(_fetch)
54/36 0.000 0.000 8.795 0.244 /srv/conda/envs/notebook/lib/python3.12/site-packages/s3fs/core.py:2414(fetch_range)
54/36 0.001 0.000 8.793 0.244 /srv/conda/envs/notebook/lib/python3.12/site-packages/s3fs/core.py:2572(fetch_range)
90/37 0.004 0.000 8.752 0.237 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/asyn.py:63(sync)
20/19 0.000 0.000 6.043 0.318 /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/h5netcdf.py:514(open_dataset)
19 0.001 0.000 6.024 0.317 /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/store.py:27(open_dataset)
9446/9153 0.013 0.000 6.019 0.001 :892(__iter__)
95 0.004 0.000 6.012 0.063 /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/h5netcdf.py:77(read_attributes)
19 0.000 0.000 5.980 0.315 /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/common.py:347(load)
19 0.000 0.000 5.943 0.313 /srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/backends/h5netcdf.py:321(get_attrs)
932 0.004 0.000 5.783 0.006 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/spec.py:2134(readinto)
1672 0.002 0.000 5.774 0.003 /srv/conda/envs/notebook/lib/python3.12/site-packages/h5netcdf/attrs.py:113(__iter__)
1976 0.009 0.000 5.772 0.003 /srv/conda/envs/notebook/lib/python3.12/site-packages/h5py/_hl/attrs.py:247(__iter__)
951 0.004 0.000 5.619 0.006 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/spec.py:2103(read)
951 0.005 0.000 5.612 0.006 /srv/conda/envs/notebook/lib/python3.12/site-packages/fsspec/caching.py:838(_fetch)
2/1 0.000 0.000 4.734 4.734 /srv/conda/envs/notebook/lib/python3.12/site-packages/earthaccess/auth.py:165(get_s3_credentials)
2 0.000 0.000 4.734 2.367 /srv/conda/envs/notebook/lib/python3.12/site-packages/requests/sessions.py:593(get)
2 0.000 0.000 4.733 2.367 /srv/conda/envs/notebook/lib/python3.12/site-packages/requests/sessions.py:500(request)
5/2 0.000 0.000 4.732 2.366 /srv/conda/envs/notebook/lib/python3.12/site-packages/requests/sessions.py:673(send)
5 0.000 0.000 4.723 0.945 /srv/conda/envs/notebook/lib/python3.12/site-packages/requests/adapters.py:590(send)
5 0.000 0.000 4.720 0.944 /srv/conda/envs/notebook/lib/python3.12/site-packages/urllib3/connectionpool.py:592(urlopen)
5 0.000 0.000 4.718 0.944 /srv/conda/envs/notebook/lib/python3.12/site-packages/urllib3/connectionpool.py:377(_make_request)
9/5 0.000 0.000 4.440 0.888 /srv/conda/envs/notebook/lib/python3.12/site-packages/requests/sessions.py:159(resolve_redirects)
5 0.000 0.000 4.422 0.884 /srv/conda/envs/notebook/lib/python3.12/site-packages/urllib3/connection.py:534(getresponse)
5 0.000 0.000 4.421 0.884 /srv/conda/envs/notebook/lib/python3.12/http/client.py:1386(getresponse)
5 0.000 0.000 4.420 0.884 /srv/conda/envs/notebook/lib/python3.12/http/client.py:324(begin)
154 0.000 0.000 4.418 0.029 {method 'readline' of '_io.BufferedReader' objects}
10 0.000 0.000 4.418 0.442 /srv/conda/envs/notebook/lib/python3.12/socket.py:706(readinto)

Contributor Author

Copilot AI commented Apr 2, 2026

@copilot Evaluate the profile for one of the matchup calls. It still feels slow but maybe there is not way to avoid.

Plan: 100 points → 18 unique granule(s)
...

The profile shows the bottleneck is cloud I/O, not extraction logic:

  • threading.wait accounts for 16.3 of 21 seconds (~78%). This is the time spent waiting for network I/O — earthaccess opening 18 cloud-hosted HDF5 files, plus dask reading the actual data chunks from S3/HTTPS. At ~1.2 s/granule that's typical latency for streaming access to remote HDF5 files.
  • The remaining ~5 seconds is Python processing overhead (5.1M function calls) covering file-open metadata parsing, xarray/dask graph construction, the vectorized .sel() + .load(), and one gc.collect() per granule to release HDF5 reference cycles.

The extraction code itself is not the bottleneck here. There's no algorithmic fix for serial cloud I/O latency short of opening granules in parallel (overlapping network waits), which would be a large architectural change.

So yes — for 18 cloud granules, ~21 seconds is roughly what to expect without parallelizing the granule-open loop.
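For reference, overlapping the network waits would look roughly like the sketch below: a thread pool over a stand-in open_and_extract function, where time.sleep models the per-granule I/O latency. Nothing here is the actual engine code, and real remote-HDF5 opens would need thread-safety care (HDF5/h5py locking, per-thread file handles), which is part of why this is a larger architectural change:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def open_and_extract(granule):
    # Stand-in for opening one remote granule and running the
    # vectorized .sel().load(); the sleep models network latency,
    # which dominates the profile above.
    time.sleep(0.1)
    return f"values:{granule}"

granules = [f"granule_{i}" for i in range(8)]

# Serial loop (current behavior): wall time ~ 8 * 0.1 s.
t0 = time.perf_counter()
serial = [open_and_extract(g) for g in granules]
t_serial = time.perf_counter() - t0

# Thread pool: the waits overlap, so wall time ~ 0.1 s.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(open_and_extract, granules))
t_parallel = time.perf_counter() - t0

print(round(t_serial, 1), round(t_parallel, 1))
```

Threads suffice here because the bottleneck is network wait, not CPU; the GIL is released during I/O.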


Development

Successfully merging this pull request may close these issues.

add vectorized nearest method

2 participants