From d53c16159af5458ef7e6d9239ab4d7f9119e1247 Mon Sep 17 00:00:00 2001
From: Brendan Collins
Date: Sat, 20 Jun 2026 02:53:21 -0700
Subject: [PATCH] classify: align natural_breaks parameter order with sibling classifiers

natural_breaks ordered its parameters (agg, num_sample, name, k), while
the other classifiers that take the same arguments order them
(agg, k, num_sample, name): quantile and maximum_breaks. So
natural_breaks(raster, 5) quietly set num_sample=5 instead of k=5.

Reorder natural_breaks to (agg, k=5, num_sample=20000, name) and add a
backward-compatible shim. Legacy callers always passed k as a keyword
(it was the last parameter), so a call with k= as a keyword plus a
second positional is using the old order; treat that positional as the
old num_sample and warn. This keeps natural_breaks(raster, 20000, k=4)
working.

Also add the missing type hints to binary, the only public classifier
without them.

Tests cover the new positional k and the deprecated legacy order.

Refs #3398
---
 .claude/sweep-api-consistency-state.csv | 29 +++++++++---------
 xrspatial/classify.py                   | 39 +++++++++++++++++++++----
 xrspatial/tests/test_classify.py        | 17 +++++++++++
 3 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/.claude/sweep-api-consistency-state.csv b/.claude/sweep-api-consistency-state.csv
index fbdfda8cb..6a0814b1f 100644
--- a/.claude/sweep-api-consistency-state.csv
+++ b/.claude/sweep-api-consistency-state.csv
@@ -1,14 +1,15 @@
-module,last_inspected,issue,severity_max,categories_found,notes
-focal,2026-06-10,3215;3216,MEDIUM,3;4,"Sweep 2026-06-10 (deep-sweep-api-consistency-focal-2026-06-10). 2 MEDIUM findings filed, fixed on branches -01/-02 off this one.
(#3215, MEDIUM Cat 4 cross-backend default parity, branch -01) apply() default func=_calc_mean is an @ngjit CPU function but the cupy/dask+cupy paths launch func as a CUDA kernel via _focal_stats_func_cupy func[griddim, blockdim], so apply(cupy_agg, kernel) raises TypeError 'CPUDispatcher' object is not subscriptable (dask+cupy builds the graph and fails at compute). Prior 2026-05-29 sweep dispositioned this LOW as 'documented in the docstring', but the docstring covers explicit funcs -- the default itself is unusable on 2 of 4 backends. Fix: func=None sentinel resolved per backend (_calc_mean CPU, _focal_mean_cuda GPU), explicit-func behavior unchanged; same PR adds the missing name= param to the apply() docstring (signature has name='focal_apply'; mean/focal_stats/hotspots document theirs). (#3216, MEDIUM Cat 3, branch -02) hotspots() docstring lists 3 backends but dask_cupy_func=_hotspots_dask_cupy is dispatched and works; kernel param documented as binary ('values of 1 indicate the kernel') while hotspots accepts weighted kernels and the Gi* formula in the same docstring uses weights w_ij (apply/focal_stats reject non-binary via _validate_binary_kernel, hotspots deliberately does not). Docs-only fix. LOW documented, not fixed: among the 4 focal publics only mean() has @supports_dataset (Dataset-support drift; feature gap, not an API bug). Cross-cutting, notes only per template: emerging_hotspots(raster=), viewshed(raster=), calc_cellsize(raster) still use raster while focal standardized on agg with a DeprecationWarning shim (#2689/PR #2699); library-wide first-arg drift, belongs to those modules' sweeps. No Cat 1 in-module (agg canonical, raster alias warns, both-args raises). No Cat 2 return drift (mean/apply/hotspots 2D same-type, focal_stats 3D (stats,y,x) as documented). 
No Cat 5 orphan API (apply/focal_stats/hotspots documented in focal.rst autosummary and consumed via xrspatial.focal module path; only mean re-exported top-level; emerging_hotspots top-level vs hotspots module-level asymmetry noted, additive export would be a design call, not filed). cuda-validated: CUDA_AVAILABLE=True on this host; mean/apply/focal_stats/hotspots smoke-tested on cupy with kwarg parity; the apply default crash reproduced on GPU; hotspots weighted-kernel acceptance verified empirically." -geotiff,2026-06-12,3263;3265,MEDIUM,3;5,"Re-sweep 2026-06-12 (deep-sweep-api-consistency-geotiff-2026-06-12); prior pass 2026-06-09 (#3086). Scope: surface changes since 2026-06-09 (pack/unpack fixes #3171-#3241, SUPPORTED_FEATURES reader.unpack/writer.pack/reader.coregister, coregister docs #3248) plus a fresh 5-category pass on open_geotiff/to_geotiff. 2 MEDIUM findings filed and fixed on branches -01/-02 off this one. (#3263, MEDIUM Cat 3, PR #3269, branch -01) open_geotiff unpack docstring said 'A source without scale / offset metadata is a no-op', but unpack=True folds into the masking gate (_finalize_eager_read: mask_and_scale implies masking, rioxarray parity), so a sentinel-bearing uint16 source still comes back float64 with NaN holes; verified identical on all 4 backends (not a parity bug), only a source with neither scale/offset nor a sentinel reads unchanged. Docs-only fix + test_unpack_noop_doc_3263.py pinning wording (scoped to the unpack paragraph) and behavior. 
(#3265, MEDIUM Cat 5, PR #3273, branch -02) exception-export drift: VRTUnsupportedError (raised 10+ times in _vrt_validation.py on public .vrt reads, documented in geotiff_safe_io.rst which steered users to the private _errors module), CloudSizeLimitError (importable but not in __all__, sibling UnsafeURLError IS exported), and PixelSafetyLimitError (raised by the [stable] max_pixels cap, only importable from _layout/_reader) were the only 3 exceptions raised on public open_geotiff paths missing from the public surface (other 17 exported). Additive fix: import + __all__ + :class: roles in safe_io doc + trigger-point docs naming the exceptions in max_pixels/max_cloud_bytes param docs and geotiff.rst; test_exception_exports_3265.py pins export, identity with private definitions, and a functional max_pixels raise. Clean elsewhere: docstring/signature parity exact on both publics (programmatic check + 218 existing contract tests); no Cat 1 (signatures unchanged since 2026-06-09; pack/unpack pair deliberate), no Cat 2 (DataArray / path returns unchanged), no Cat 4 (shared allow_* defaults match reader/writer; gpu False-vs-None auto-detect documented). SUPPORTED_FEATURES tiers (reader.unpack/writer.pack/reader.coregister experimental) agree with docstring tier markers. coregister= itself lives on accessor.py (excluded module) -- only its SUPPORTED_FEATURES registration is in geotiff, consistent. cuda-validated: CUDA_AVAILABLE=True; open_geotiff smoke-tested with identical kwargs on numpy/cupy/dask/dask+cupy (cpu/gpu pixel parity), to_geotiff gpu=True, cupy pack=True write (#3240 fix confirmed), deprecated aliases mask_and_scale/name/mask_nodata all warn. Both PRs reviewed (COMMENTED) with findings fixed in follow-up commits c14844a8/af3c8a66; branches up to date with origin/main; left for user merge per REVIEW_REQUIRED." -hydro-d8,2026-05-29,2709,HIGH,1;5,"Sweep 2026-05-29 (deep-sweep-api-consistency-hydro-d8-2026-05-29). 
Scope = the 13 D8-variant files only; dinf/mfd read for reference but not modified. 1 HIGH Cat 1 + 1 MEDIUM Cat 5 fixed in this branch (#2709, PR #2716). HIGH Cat 1: stream_order_d8 named its strahler/shreve selector `ordering` while sibling stream_order_dinf/stream_order_mfd use `method`; both names live in the public API and the __init__.py _StreamOrderDispatch special-cases the drift (translates ordering->method for non-d8). Fix adds `method` as an accepted alias on stream_order_d8 (case-insensitive; takes precedence; conflicting ordering+method raises ValueError), keeping `ordering` working so the out-of-scope dispatcher (passes ordering=) and existing callers are unaffected. Full rename to `method` deferred because deprecating `ordering` would warn on every stream_order(routing='d8') call via the dispatcher I cannot touch in this scope. MEDIUM Cat 5: basins_d8 (watershed_d8.py) is a backward-compat wrapper whose docstring said 'use basin instead' but emitted no warning; added DeprecationWarning(stacklevel=2). Tests added for alias parity/precedence/conflict/case-insensitivity and for the basins_d8 warning. Findings documented but NOT filed per template: (LOW Cat 1 cross-module, out of scope) dinf siblings name the first arg `flow_dir_dinf` (stream_link/flow_path/hand/watershed_dinf) while all D8 funcs use the cleaner `flow_dir`; D8 is the better convention so no D8 change -- the drift lives in the dinf files. (LOW Cat 4 defensive-validation drift) hand_d8 validates np.isfinite(threshold) but stream_link_d8/stream_order_d8 (same threshold: float = 100 param) do not; not user-facing signature surprise, document only. No Cat 2 return drift (every D8 public fn returns xr.DataArray with coords/dims/attrs preserved; Dataset in -> Dataset out via @supports_dataset). No Cat 3 missing-hints beyond fill_d8 z_limit (optional, no hint) which mirrors its sibling style. All 13 D8 funcs are re-exported in xrspatial/hydro/__init__.py (no orphan API). 
cuda-validated: CUDA_AVAILABLE=True on this host; method-alias parity smoke-tested on a cupy DataArray. CI: ubuntu/windows/3.12 GitHub Actions green; macOS-3.14 + ReadTheDocs slow but no failures. NOTE: the /review-pr review comment could not be posted to GitHub (auto-mode permission denial on gh pr review); review findings were applied to code instead (case-insensitive conflict check + str|None hint, commit f8467320)." -interpolate,2026-06-12,3285,MEDIUM,2,"Sweep 2026-06-12 (deep-sweep-api-consistency-interpolate-2026-06-12). Scope: idw/_idw.py, kriging/_kriging.py, spline/_spline.py, shared _validation.py. 1 MEDIUM Cat 2 finding filed as #3285, fixed on branch -01 off this one: kriging(return_variance=True) singular-matrix fallback (_kriging.py:499) returns prediction, prediction.copy() so the variance DataArray keeps the prediction's name instead of f'{name}_variance' (normal path :523 names it correctly); reproduced by monkeypatching _build_kriging_matrix to None; anything keying on .name (xr.merge, Dataset build) silently collapses the pair. One-line fix + regression test on the singular path. Clean elsewhere: Cat 1 in-module exact (idw/kriging/spline share x, y, z, template positionals and name= default ''; template matches kde's template=); docstring/signature parity exact on all 3 publics (every param documented, Returns sections match incl. kriging's tuple); Cat 4 no default drift (power=2.0, k=None, fill_value=nan, variogram_model='spherical', nlags=15, smoothing=0.0, all single-owner params); Cat 5 no orphan API (all 3 re-exported in xrspatial/__init__.py and autosummaried in docs/source/reference/interpolation.rst; tests touch private helpers only via module paths). 
Cross-cutting, notes only per template: fill_value (idw) vs fill (rasterize) for the uncovered-pixel value is library-wide drift (idw matches numpy's fill_value convention, left alone); public functions are untyped module-wide (consistent internally, drifts from typed kde/rasterize/proximity siblings -- annotation pass would span the whole module, LOW, not filed); kde's keyword-only style is the library minority so interpolate's positional style matches the rasterize/proximity majority. GPU k-nearest rejection (NotImplementedError) is deliberate and documented in the k param docstring. cuda-validated: CUDA_AVAILABLE=True on this host; idw/kriging/spline smoke-tested with full kwargs on numpy AND cupy DataArrays (variance name parity confirmed on both), dask+numpy and dask+cupy graph construction verified without compute." -mcda,2026-06-10,3148,HIGH,1;2;3;5,"Sweep 2026-06-10 (deep-sweep-api-consistency-mcda-2026-06-10). Fixed in this branch (#3148): (HIGH Cat 1) owa() named its criterion-weight dict criterion_weights while wlc/wpm/sensitivity use weights (same semantics, same _validate_weights); renamed to weights with keyword-only criterion_weights deprecation shim (DeprecationWarning; both names -> TypeError; positional callers untouched). (MEDIUM Cat 2) boolean_overlay annotated criteria as dict-only while every sibling combiner takes xr.Dataset; Dataset already worked via the Mapping interface -- now annotated/documented as xr.Dataset | dict. (MEDIUM Cat 3) ahp_weights docstring Raises claimed ValueError on incomplete comparisons but code warns (UserWarning) and defaults missing pairs to 1 -- docstring now documents Warns behaviour. (MEDIUM Cat 5) ConsistencyResult returned by public ahp_weights but absent from xrspatial/mcda __all__ and docs/source/reference/mcda.rst -- exported and documented. 
Documented, NOT fixed here: (MEDIUM Cat 2, deferred to parallel sweep-metadata sibling to avoid duplicate PR) constrain() drops attrs via xr.where while the other nine public functions preserve them. (LOW Cat 2) ahp_weights returns (weights, ConsistencyResult) tuple vs rank_weights bare dict -- intentional, documented in both docstrings, no fix. (LOW Cat 4) name=None inherit-input-name (standardize/constrain) vs literal-name defaults (combiners) -- defensible split, document only. Pre-existing backend bugs surfaced by the mandated cupy smoke (accuracy/test-coverage lane, recorded in #3148 body): owa fails on cupy (numpy order-weights array mixed into cupy multiply, combine.py ~336-340) and on ANY dask backend at graph construction (da.sort does not exist, combine.py:356, despite the owa MemoryError message recommending dask); sensitivity(method=monte_carlo) fails on cupy (template.values implicit-conversion guard). constrain on cupy blocked by the known library-wide cupy 13.6 + xarray xr.where astype incompat (dependency-pin issue), not mcda-specific. cuda-validated: CUDA_AVAILABLE=True; all 10 public functions smoke-tested on cupy DataArrays; owa weights=/criterion_weights= shim verified on numpy AND cupy entry points (cupy execution stops at the pre-existing mixed-array bug, signature acceptance confirmed)." -polygonize,2026-06-12,3306;3307,MEDIUM,1;3,"Re-sweep 2026-06-12 (deep-sweep-api-consistency-polygonize-2026-06-12); prior pass 2026-05-19 (#2148). 2 MEDIUM findings filed and fixed on branches -01/-02 off this one. (#3306, MEDIUM Cat 3, branch -01) column_name docstring says 'Only used if return_type is geopandas or spatialpandas' but _to_geojson also consumes it as the per-feature property key (verified: properties={'myval': 1}); docs-only fix + test pinning geojson property naming. 
(#3307, MEDIUM sibling-behavior drift, branch -02) return_type is the only polygonize parameter validated AFTER the computation: invalid value runs the full backend (spy-verified 1 invocation before raise) while sibling contours() validates up front and lists allowed values; fix hoists the check into the top validation block with an allowed-values message (existing test matches on prefix, unaffected). Re-confirmed prior dispositions, still documented-only per cross-module rule: (HIGH Cat 1 cross-module) connectivity (polygonize, matches GDAL/rasterio/skimage) vs neighborhood (sieve.py, zonal.regions) for the identical 4|8 rook/queen concept -- rename shim belongs in sieve/zonal, out of polygonize scope; (LOW Cat 1 cross-cutting) raster (polygonize/sieve/clip_polygon) vs agg (contours/terrain family) first-arg drift, library-wide, not filed per-module. No new Cat 2 (return_type dispatch shapes match docstring Returns section exactly); no Cat 4 (atol/rtol mirror numpy.isclose, connectivity=4 == sieve neighborhood=4); Cat 5 LOW documented-only: module has no __all__ and the non-underscore internals generated_jit + Turn leak via import-star; polygonize re-exported in __init__.py and accessor, no orphan API. Docstring/signature parity otherwise exact (all 10 params documented, all annotated). Open polygonize issues #3292/#3293 checked -- no overlap with these findings. cuda-validated: CUDA_AVAILABLE=True on this host; polygonize smoke-tested with identical full kwargs on numpy, cupy (int + float atol/rtol=0), and dask+cupy; no backend signature drift." -proximity,2026-06-09,3090;3091,HIGH,2;3,"Sweep 2026-06-09 (deep-sweep-api-consistency-proximity-2026-06-09). 1 HIGH Cat 2 finding (#3090): dask+numpy (and unbounded dask+cupy, which converts to it) KDTree path violates the documented lowest-flat-index tie-break in allocation()/direction() whenever the raster has >1 chunk column. 
_collect_region_targets concatenates targets chunk-major (iy outer, ix inner) so the tree's target order is not global row-major; _kdtree_query_lowest_index then ties to the wrong target. Existing tie-break tests put both targets in the same raster row where chunk order coincides with row-major, so they pass. Repro: 5x5, targets 2@(1,3) and 3@(2,2), chunks (5,3), pixel (2,3) tied at d=1 -> numpy gives 2, dask gives 3. Bounded map_overlap paths are fine (local row-major order is offset-invariant). 1 MEDIUM Cat 3 finding (#3091): all 3 public docstrings claim numpy + dask+numpy support only while cupy/dask+cupy backends exist, are dispatched, and are tested (the tie-break paragraphs in the same docstrings name all 4 backends); direction() opens with a stray copy-pasted slope line ('downward slope direction') plus a doubled 'the the'; allocation example output reads as float64 but the function returns float32; stale '# convert to have same type as of input @raster' comment. Within-module Cat 1/4/5 clean: proximity/allocation/direction share an identical signature (raster, x='x', y='y', target_values=None, max_distance=np.inf, distance_metric='EUCLIDEAN'); consistent with surface_distance siblings (raster/x/y/target_values/max_distance); all 6 public symbols (incl. euclidean/manhattan/great_circle_distance) re-exported in __init__.py, no orphan API. Cross-cutting, documented not filed: sibling distance modules (surface_distance, cost_distance, balanced_allocation) use mutable default target_values: list = [] while proximity uses the None sentinel - the mutable-default fix belongs to those modules; proximity's target_values: list = None hint would be more precise as Optional[list] (LOW, matches library style). cuda-validated: CUDA_AVAILABLE=True on this host; proximity/allocation/direction smoke-tested with identical kwargs on numpy, cupy, dask+numpy, dask+cupy (proximity parity passed; allocation/direction parity failure is finding #3090)." 
-rasterize,2026-06-09,3089,HIGH,1,"Sweep 2026-06-09 (deep-sweep-api-consistency-rasterize-2026-06-09). 1 HIGH Cat 1 fixed in this branch (#3089): rasterize(use_cuda=) vs open_geotiff(gpu=) named the identical GPU-backend opt-in differently; these are the only two public entry points with an explicit GPU boolean (no input array to dispatch on; both pair it with chunks= for dask) and both names were live in the public API at once. Fix renames the positional param to gpu (same slot, positional callers unaffected) and appends use_cuda=None as a deprecated alias: DeprecationWarning on use, TypeError when combined with gpu=True. Docstring, GPU merge warning text, CuPy ImportError text, and polygon_clip.py's internal dask+cupy caller updated (guarded so a legacy use_cuda in rasterize_kw does not collide with the new default); all rasterize test call sites migrated to gpu=; regression tests in test_rasterize_gpu_alias_3089.py pin slot position, warning, TypeError, backend parity, and the warning-free clip_polygon path. Re-inspection after the 2026-05-21 pass (#2250); prior cross-module notes (clip_polygon nodata vs fill, name default drift, polygonize column_name vs column) still documented-only. Docstring/signature parity verified programmatically (17/17 params, order matches). New params since last pass (check_crs, max_pixels) consistent with geotiff naming (max_pixels matches geotiff's). No Cat 2/4/5 findings. LOW noted, not fixed (other module's docs): docs/source/user_guide/focal.ipynb claims convolve_2d takes use_cuda, which it does not. cuda-validated: CUDA_AVAILABLE=True; numpy/cupy/dask+numpy/dask+cupy smoke-tested with identical kwargs, values equal." -reproject,2026-06-09,3095;3097,HIGH,1;2;3,"Sweep 2026-06-09 (deep-sweep-api-consistency-reproject-2026-06-09). 2 findings filed and fixed: #3095 -> PR #3125, #3097 -> PR #3134 (branches -01/-02 off this one). 
(HIGH Cat 2, #3095) merge() raises TypeError ('Implicit conversion to a NumPy array is not allowed') on cupy-backed inputs while sibling reproject() supports numpy/cupy/dask+numpy/dask+cupy; crash site _merge_inmemory info['raster'].values (__init__.py:2572); dask-of-cupy fails the same way at compute via _merge_block_adapter -> _reproject_chunk_numpy/np.asarray. _merge.py has a complete _merge_arrays_cupy that is imported in __init__.py:38 but never called (dead GPU plumbing; the unused import alone is lint issue #3083 from the style sweep). Fix: host round-trip on entry (same pattern as _apply_vertical_shift), GPU result out, docstring documents backend handling. (MEDIUM Cat 3, #3097) _vertical.py Returns docstrings claim 'same type as input/height' but geoid_height(DataArray) returns np.ndarray (verified empirically) and the four conversion wrappers return np.float64/np.ndarray; geoid_height converts scalars to Python float but the wrappers do not (sibling scalar-return drift). Docs-only fix. Documented but NOT fixed: (LOW Cat 1) itrf_transform(src=/tgt=) abbreviations vs source_/target_ elsewhere -- prior 2026-05-29 sweep already weighed this and left it as-is (frames, not CRSes); filed #3099 before noticing the prior disposition, then closed it as not-planned to avoid churn. (LOW Cat 5) module docstring 'Public API' section lists only reproject/merge while __all__ exports 10 names (vertical+itrf funcs invisible in help() header; docs/source/reference/reproject.rst autosummary likewise lists only reproject/merge). Cross-cutting, notes only per template: raster/rasters (reproject) vs agg (terrain family) vs source (geotiff); chunk_size (reproject/merge) vs chunks (open_geotiff); resampling+resolution (reproject/merge/accessor) vs method+target_resolution (resample.py -- resample is the outlier, belongs to a resample-module pass, already in resample row's notes). 
No Cat 4 default drift (resampling='bilinear'/transform_precision=16/chunk_size=None/bounds_policy='auto'/model='EGM96' consistent across siblings). reproject()/merge() kwarg parity smoke-tested on numpy AND cupy DataArrays (merge cupy crash found exactly there). cuda-validated: CUDA_AVAILABLE=True on this host. CI: all GitHub Actions checks green on both PRs; RTD flapped (pending on #3125, fail on #3134 -- repo-wide backlog, change not docs-rendered); PRs left BLOCKED on REVIEW_REQUIRED for the user to merge." -resample,2026-05-27,2544,MEDIUM,3,"Sweep 2026-05-27 (deep-sweep-api-consistency-resample-2026-05-27). 1 MEDIUM Cat 3 finding fixed in this branch (#2544): resample() was the only public symbol in xrspatial.resample without type annotations on any parameter or return; siblings slope/aspect/hillshade/curvature all annotate `agg: xr.DataArray` and `-> xr.DataArray`. Fix adds annotations matching the docstring (agg: xr.DataArray; scale_factor / target_resolution: float | tuple[float, float] | None; method: str; nodata: float | None; name: str) and a `-> xr.DataArray` return type, plus a docstring note that the @supports_dataset decorator accepts Dataset too. Regression test test_resample_signature_annot_2544.py pins every param and the return annotation. Other findings documented but not filed per template: (MEDIUM Cat 1 cross-module) `method` (resample) vs `resampling` (reproject/merge) -- same conceptual parameter, different name, cross-cutting rename, needs design issue. (LOW Cat 1 cross-cutting) first-arg `agg` (resample/slope/aspect/...) vs `raster` (reproject/rasterize/polygonize/sieve) -- library-wide drift, not per-module. (LOW Cat 5) ALL_METHODS imported by tests but not in __all__ (module has no __all__); borderline orphan but used for test parametrisation only. No Cat 2 (returns xr.DataArray as documented). No Cat 4 mutable defaults. resample is exported in xrspatial/__init__.py. 
cuda-validated: cupy backend smoke-tested with nearest, bilinear, and average on host with CUDA_AVAILABLE=True." -slope,2026-05-29,2681,MEDIUM,3,"Sweep 2026-05-29 (deep-sweep-api-consistency-slope-2026-05-29). 1 MEDIUM Cat 3 finding fixed in this branch (#2681, PR #2687): slope() annotated name as `str` while every terrain-family sibling (aspect/northness/eastness in aspect.py, curvature in curvature.py) uses Optional[str]. name flows into xr.DataArray(name=name) which accepts None, so slope(agg, name=None) already worked at runtime -- the annotation was just wrong and inconsistent. Fix widens to Optional[str] and imports Optional (module previously imported only Union). Non-breaking (type-hint widening), no deprecation shim. Added test_name_annotation_matches_terrain_family (pins parity vs the 4 siblings via get_type_hints, unwrapping @supports_dataset) and test_name_none_accepted (slope(agg, name=None).name is None). Full test_slope.py passes (43). No backend logic touched -- numpy/cupy/dask+numpy/dask+cupy paths unchanged; public signature is shared across backends via ArrayTypeFunctionMapping. Other categories: no Cat 1 in-module rename (slope/aspect share identical public param names agg/name/method/z_unit/boundary); no Cat 2 return drift (returns xr.DataArray/Dataset via @supports_dataset, same coords/dims/attrs convention as siblings); no Cat 4 default drift (name/method='planar'/z_unit='meter'/boundary='nan' match across the family); no Cat 5 orphan API (slope re-exported in __init__.py, documented, no __all__ but consistent with module convention). Cross-cutting (documented, not filed per template): first-arg `agg` (slope/aspect/curvature) vs `raster` (reproject/rasterize/polygonize) is library-wide drift. cuda-validated: CUDA_AVAILABLE=True on this host; cupy slope smoke-tested (planar) and signature parity confirmed between numpy and cupy entry points." 
-visibility,2026-06-10,3183,MEDIUM,3;5,"Sweep 2026-06-10 (deep-sweep-api-consistency-visibility-2026-06-10). 2 MEDIUM findings filed as issue #3183, fixed in this branch. (MEDIUM Cat 5) output-name convention drift: viewshed() sets a fixed output name and exposes name=, but cumulative_viewshed (visibility.py:289) and visibility_frequency built/returned DataArrays with name=None. Fix adds name='cumulative_viewshed'/'visibility_frequency' params (Optional[str]) and sets result.name; additive, non-breaking, no shim. coords/attrs were already preserved on both. (MEDIUM Cat 3) line_of_sight (visibility.py:162) annotated frequency_mhz: float = None; default contradicts the float hint and the docstring already says optional. Fix -> Optional[float] (imported typing.Optional). Tests added: cumulative/frequency default+custom name. No Cat 1 naming drift: observer_elev/target_elev/max_distance/x/y and the x0/y0/x1/y1 two-point extension match viewshed and the observers dict keys. No Cat 2 arbitrary return drift: line_of_sight -> Dataset fits its per-sample multi-variable result; the two cumulative funcs -> DataArray like viewshed. No Cat 4 default drift (observer_elev=0/target_elev=0/max_distance=None match). No Cat 5 orphan API: all 3 funcs re-exported in __init__.py; no __all__ but consistent with module convention. cuda-validated: CUDA_AVAILABLE=True on this host; cupy entry points accept the new name= kwarg and the line_of_sight Optional hint. PRE-EXISTING backend bug (out of scope, not an api-consistency issue, NOT filed here): cumulative_viewshed on a cupy raster raises TypeError 'Unsupported type numpy.ndarray' in the count + (vs_data != INVISIBLE) accumulation (numpy accumulator vs cupy viewshed result); reproduced on origin/main without this branch's changes -- a backend-parity gap for a future backend-parity sweep." -zonal,2026-06-10,3188,MEDIUM,1;3;5,"Re-sweep 2026-06-10 (deep-sweep-api-consistency-zonal-2026-06-10). 
Prior sweep's HIGH zones_ids/zone_ids typo confirmed already fixed on main (#2521). Several previously-documented MEDIUM Cat 3 items also fixed on main since 2026-05-27: crosstab layer docstring now says default=None; hypsometric_integral now has param+return annotations; apply now has -> xr.DataArray. Two remaining safe Cat 3 fixes filed+PR'd this run (issue #3188 / PR #3196): (1) crosstab zone_ids/cat_ids annotated List[...]=None -> wrapped in Optional[...] to match stats()/crop(); (2) crosstab nodata_values docstring said 'Cells with nodata' (copy-paste from apply) -> now references nodata_values. Non-breaking, 17 crosstab tests pass. Documented-not-fixed: (MEDIUM Cat 1) nodata vs nodata_values drift across stats/crosstab (nodata_values, default None, filters VALUES raster) vs apply/hypsometric_integral (nodata, default 0, filters ZONES raster) -- names differ but so do the concepts and defaults, so a blanket rename would conflate two distinct meanings; needs a design decision, not a mechanical shim. (MEDIUM Cat 5) get_full_extent has a public-style docstring+example but is not in __init__.py -- borderline orphan, minor utility, left as-is. (LOW Cat 3) crop() lacks a return type annotation while stats/crosstab/apply/regions/trim annotate theirs. Cross-cutting (not filed): first-arg name varies (stats/crosstab/crop use zones; regions/trim use raster) but regions/trim operate on the raster itself so the name matches the role; library-wide agg vs raster vs values naming spans 20+ modules, out of per-module scope. cuda-validated: CUDA_AVAILABLE=True on this host."
\ No newline at end of file
+module,last_inspected,issue,severity_max,categories_found,notes
+classify,2026-06-20,3398,MEDIUM,1;3,"Sweep 2026-06-20 (deep-sweep-api-consistency-classify-2026-06-20). 1 MEDIUM Cat 1 finding filed as #3398 and fixed on this branch.
(MEDIUM Cat 1 positional-order drift) natural_breaks ordered its params (agg, num_sample, name, k) while the other two classifiers that take the same trio order them (agg, k, num_sample, name): quantile(agg, k=4, num_sample, name), maximum_breaks(agg, k=5, num_sample, name). So natural_breaks(raster, 5) silently set num_sample=5 instead of k=5. Fix reorders natural_breaks to (agg, k=5, num_sample=20000, name) and adds a _natural_breaks_legacy_order shim: when k= is a keyword AND a second positional is present (the only way pre-1.0 callers passed num_sample, since k was last and always keyword), the positional is treated as the old num_sample with a DeprecationWarning. Keeps the one example notebook call natural_breaks(raster, 20000, k=4) working. Bundled trivial Cat 3 fix in same PR: binary() was the only public classifier with no type hints -- added agg: xr.DataArray, name: Optional[str], -> xr.DataArray to match the other 9. Tests: test_natural_breaks_positional_k_matches_siblings (new positional k == keyword k) and test_natural_breaks_legacy_positional_num_sample_warns (legacy order warns + maps identically). Full test_classify.py (now 91) + test_validation.py pass. Cat 4 considered NOT a finding: quantile k=4 (quartiles) vs k=5 (quintiles) elsewhere is the documented PySAL/mapclassify convention, not drift. No Cat 2 return drift (all 10 publics return xr.DataArray/Dataset via @supports_dataset, coords/dims/attrs preserved). No Cat 5 orphan API (all 10 re-exported in __init__.py; no __all__ but consistent with module convention). Cross-cutting, notes only: first-arg agg (classify family) vs raster (reproject/rasterize/polygonize) is library-wide drift, out of per-module scope. cuda-validated: CUDA_AVAILABLE=True on this host; natural_breaks new order + legacy shim smoke-tested on numpy AND cupy entry points (both warn + remap), dataset path binds name correctly, binary verified on cupy." 
+focal,2026-06-10,3215;3216,MEDIUM,3;4,"Sweep 2026-06-10 (deep-sweep-api-consistency-focal-2026-06-10). 2 MEDIUM findings filed, fixed on branches -01/-02 off this one. (#3215, MEDIUM Cat 4 cross-backend default parity, branch -01) apply() default func=_calc_mean is an @ngjit CPU function but the cupy/dask+cupy paths launch func as a CUDA kernel via _focal_stats_func_cupy func[griddim, blockdim], so apply(cupy_agg, kernel) raises TypeError 'CPUDispatcher' object is not subscriptable (dask+cupy builds the graph and fails at compute). Prior 2026-05-29 sweep dispositioned this LOW as 'documented in the docstring', but the docstring covers explicit funcs -- the default itself is unusable on 2 of 4 backends. Fix: func=None sentinel resolved per backend (_calc_mean CPU, _focal_mean_cuda GPU), explicit-func behavior unchanged; same PR adds the missing name= param to the apply() docstring (signature has name='focal_apply'; mean/focal_stats/hotspots document theirs). (#3216, MEDIUM Cat 3, branch -02) hotspots() docstring lists 3 backends but dask_cupy_func=_hotspots_dask_cupy is dispatched and works; kernel param documented as binary ('values of 1 indicate the kernel') while hotspots accepts weighted kernels and the Gi* formula in the same docstring uses weights w_ij (apply/focal_stats reject non-binary via _validate_binary_kernel, hotspots deliberately does not). Docs-only fix. LOW documented, not fixed: among the 4 focal publics only mean() has @supports_dataset (Dataset-support drift; feature gap, not an API bug). Cross-cutting, notes only per template: emerging_hotspots(raster=), viewshed(raster=), calc_cellsize(raster) still use raster while focal standardized on agg with a DeprecationWarning shim (#2689/PR #2699); library-wide first-arg drift, belongs to those modules' sweeps. No Cat 1 in-module (agg canonical, raster alias warns, both-args raises). No Cat 2 return drift (mean/apply/hotspots 2D same-type, focal_stats 3D (stats,y,x) as documented). 
No Cat 5 orphan API (apply/focal_stats/hotspots documented in focal.rst autosummary and consumed via xrspatial.focal module path; only mean re-exported top-level; emerging_hotspots top-level vs hotspots module-level asymmetry noted, additive export would be a design call, not filed). cuda-validated: CUDA_AVAILABLE=True on this host; mean/apply/focal_stats/hotspots smoke-tested on cupy with kwarg parity; the apply default crash reproduced on GPU; hotspots weighted-kernel acceptance verified empirically." +geotiff,2026-06-12,3263;3265,MEDIUM,3;5,"Re-sweep 2026-06-12 (deep-sweep-api-consistency-geotiff-2026-06-12); prior pass 2026-06-09 (#3086). Scope: surface changes since 2026-06-09 (pack/unpack fixes #3171-#3241, SUPPORTED_FEATURES reader.unpack/writer.pack/reader.coregister, coregister docs #3248) plus a fresh 5-category pass on open_geotiff/to_geotiff. 2 MEDIUM findings filed and fixed on branches -01/-02 off this one. (#3263, MEDIUM Cat 3, PR #3269, branch -01) open_geotiff unpack docstring said 'A source without scale / offset metadata is a no-op', but unpack=True folds into the masking gate (_finalize_eager_read: mask_and_scale implies masking, rioxarray parity), so a sentinel-bearing uint16 source still comes back float64 with NaN holes; verified identical on all 4 backends (not a parity bug), only a source with neither scale/offset nor a sentinel reads unchanged. Docs-only fix + test_unpack_noop_doc_3263.py pinning wording (scoped to the unpack paragraph) and behavior. 
(#3265, MEDIUM Cat 5, PR #3273, branch -02) exception-export drift: VRTUnsupportedError (raised 10+ times in _vrt_validation.py on public .vrt reads, documented in geotiff_safe_io.rst which steered users to the private _errors module), CloudSizeLimitError (importable but not in __all__, sibling UnsafeURLError IS exported), and PixelSafetyLimitError (raised by the [stable] max_pixels cap, only importable from _layout/_reader) were the only 3 exceptions raised on public open_geotiff paths missing from the public surface (other 17 exported). Additive fix: import + __all__ + :class: roles in safe_io doc + trigger-point docs naming the exceptions in max_pixels/max_cloud_bytes param docs and geotiff.rst; test_exception_exports_3265.py pins export, identity with private definitions, and a functional max_pixels raise. Clean elsewhere: docstring/signature parity exact on both publics (programmatic check + 218 existing contract tests); no Cat 1 (signatures unchanged since 2026-06-09; pack/unpack pair deliberate), no Cat 2 (DataArray / path returns unchanged), no Cat 4 (shared allow_* defaults match reader/writer; gpu False-vs-None auto-detect documented). SUPPORTED_FEATURES tiers (reader.unpack/writer.pack/reader.coregister experimental) agree with docstring tier markers. coregister= itself lives on accessor.py (excluded module) -- only its SUPPORTED_FEATURES registration is in geotiff, consistent. cuda-validated: CUDA_AVAILABLE=True; open_geotiff smoke-tested with identical kwargs on numpy/cupy/dask/dask+cupy (cpu/gpu pixel parity), to_geotiff gpu=True, cupy pack=True write (#3240 fix confirmed), deprecated aliases mask_and_scale/name/mask_nodata all warn. Both PRs reviewed (COMMENTED) with findings fixed in follow-up commits c14844a8/af3c8a66; branches up to date with origin/main; left for user merge per REVIEW_REQUIRED." +hydro-d8,2026-05-29,2709,HIGH,1;5,"Sweep 2026-05-29 (deep-sweep-api-consistency-hydro-d8-2026-05-29). 
Scope = the 13 D8-variant files only; dinf/mfd read for reference but not modified. 1 HIGH Cat 1 + 1 MEDIUM Cat 5 fixed in this branch (#2709, PR #2716). HIGH Cat 1: stream_order_d8 named its strahler/shreve selector `ordering` while sibling stream_order_dinf/stream_order_mfd use `method`; both names live in the public API and the __init__.py _StreamOrderDispatch special-cases the drift (translates ordering->method for non-d8). Fix adds `method` as an accepted alias on stream_order_d8 (case-insensitive; takes precedence; conflicting ordering+method raises ValueError), keeping `ordering` working so the out-of-scope dispatcher (passes ordering=) and existing callers are unaffected. Full rename to `method` deferred because deprecating `ordering` would warn on every stream_order(routing='d8') call via the dispatcher I cannot touch in this scope. MEDIUM Cat 5: basins_d8 (watershed_d8.py) is a backward-compat wrapper whose docstring said 'use basin instead' but emitted no warning; added DeprecationWarning(stacklevel=2). Tests added for alias parity/precedence/conflict/case-insensitivity and for the basins_d8 warning. Findings documented but NOT filed per template: (LOW Cat 1 cross-module, out of scope) dinf siblings name the first arg `flow_dir_dinf` (stream_link/flow_path/hand/watershed_dinf) while all D8 funcs use the cleaner `flow_dir`; D8 is the better convention so no D8 change -- the drift lives in the dinf files. (LOW Cat 4 defensive-validation drift) hand_d8 validates np.isfinite(threshold) but stream_link_d8/stream_order_d8 (same threshold: float = 100 param) do not; not user-facing signature surprise, document only. No Cat 2 return drift (every D8 public fn returns xr.DataArray with coords/dims/attrs preserved; Dataset in -> Dataset out via @supports_dataset). No Cat 3 missing-hints beyond fill_d8 z_limit (optional, no hint) which mirrors its sibling style. All 13 D8 funcs are re-exported in xrspatial/hydro/__init__.py (no orphan API). 
cuda-validated: CUDA_AVAILABLE=True on this host; method-alias parity smoke-tested on a cupy DataArray. CI: ubuntu/windows/3.12 GitHub Actions green; macOS-3.14 + ReadTheDocs slow but no failures. NOTE: the /review-pr review comment could not be posted to GitHub (auto-mode permission denial on gh pr review); review findings were applied to code instead (case-insensitive conflict check + str|None hint, commit f8467320)." +interpolate,2026-06-12,3285,MEDIUM,2,"Sweep 2026-06-12 (deep-sweep-api-consistency-interpolate-2026-06-12). Scope: idw/_idw.py, kriging/_kriging.py, spline/_spline.py, shared _validation.py. 1 MEDIUM Cat 2 finding filed as #3285, fixed on branch -01 off this one: kriging(return_variance=True) singular-matrix fallback (_kriging.py:499) returns prediction, prediction.copy() so the variance DataArray keeps the prediction's name instead of f'{name}_variance' (normal path :523 names it correctly); reproduced by monkeypatching _build_kriging_matrix to None; anything keying on .name (xr.merge, Dataset build) silently collapses the pair. One-line fix + regression test on the singular path. Clean elsewhere: Cat 1 in-module exact (idw/kriging/spline share x, y, z, template positionals and name= default ''; template matches kde's template=); docstring/signature parity exact on all 3 publics (every param documented, Returns sections match incl. kriging's tuple); Cat 4 no default drift (power=2.0, k=None, fill_value=nan, variogram_model='spherical', nlags=15, smoothing=0.0, all single-owner params); Cat 5 no orphan API (all 3 re-exported in xrspatial/__init__.py and autosummaried in docs/source/reference/interpolation.rst; tests touch private helpers only via module paths). 
Cross-cutting, notes only per template: fill_value (idw) vs fill (rasterize) for the uncovered-pixel value is library-wide drift (idw matches numpy's fill_value convention, left alone); public functions are untyped module-wide (consistent internally, drifts from typed kde/rasterize/proximity siblings -- annotation pass would span the whole module, LOW, not filed); kde's keyword-only style is the library minority so interpolate's positional style matches the rasterize/proximity majority. GPU k-nearest rejection (NotImplementedError) is deliberate and documented in the k param docstring. cuda-validated: CUDA_AVAILABLE=True on this host; idw/kriging/spline smoke-tested with full kwargs on numpy AND cupy DataArrays (variance name parity confirmed on both), dask+numpy and dask+cupy graph construction verified without compute." +mcda,2026-06-10,3148,HIGH,1;2;3;5,"Sweep 2026-06-10 (deep-sweep-api-consistency-mcda-2026-06-10). Fixed in this branch (#3148): (HIGH Cat 1) owa() named its criterion-weight dict criterion_weights while wlc/wpm/sensitivity use weights (same semantics, same _validate_weights); renamed to weights with keyword-only criterion_weights deprecation shim (DeprecationWarning; both names -> TypeError; positional callers untouched). (MEDIUM Cat 2) boolean_overlay annotated criteria as dict-only while every sibling combiner takes xr.Dataset; Dataset already worked via the Mapping interface -- now annotated/documented as xr.Dataset | dict. (MEDIUM Cat 3) ahp_weights docstring Raises claimed ValueError on incomplete comparisons but code warns (UserWarning) and defaults missing pairs to 1 -- docstring now documents Warns behaviour. (MEDIUM Cat 5) ConsistencyResult returned by public ahp_weights but absent from xrspatial/mcda __all__ and docs/source/reference/mcda.rst -- exported and documented. 
Documented, NOT fixed here: (MEDIUM Cat 2, deferred to parallel sweep-metadata sibling to avoid duplicate PR) constrain() drops attrs via xr.where while the other nine public functions preserve them. (LOW Cat 2) ahp_weights returns (weights, ConsistencyResult) tuple vs rank_weights bare dict -- intentional, documented in both docstrings, no fix. (LOW Cat 4) name=None inherit-input-name (standardize/constrain) vs literal-name defaults (combiners) -- defensible split, document only. Pre-existing backend bugs surfaced by the mandated cupy smoke (accuracy/test-coverage lane, recorded in #3148 body): owa fails on cupy (numpy order-weights array mixed into cupy multiply, combine.py ~336-340) and on ANY dask backend at graph construction (da.sort does not exist, combine.py:356, despite the owa MemoryError message recommending dask); sensitivity(method=monte_carlo) fails on cupy (template.values implicit-conversion guard). constrain on cupy blocked by the known library-wide cupy 13.6 + xarray xr.where astype incompat (dependency-pin issue), not mcda-specific. cuda-validated: CUDA_AVAILABLE=True; all 10 public functions smoke-tested on cupy DataArrays; owa weights=/criterion_weights= shim verified on numpy AND cupy entry points (cupy execution stops at the pre-existing mixed-array bug, signature acceptance confirmed)." +polygonize,2026-06-12,3306;3307,MEDIUM,1;3,"Re-sweep 2026-06-12 (deep-sweep-api-consistency-polygonize-2026-06-12); prior pass 2026-05-19 (#2148). 2 MEDIUM findings filed and fixed on branches -01/-02 off this one. (#3306, MEDIUM Cat 3, branch -01) column_name docstring says 'Only used if return_type is geopandas or spatialpandas' but _to_geojson also consumes it as the per-feature property key (verified: properties={'myval': 1}); docs-only fix + test pinning geojson property naming. 
(#3307, MEDIUM sibling-behavior drift, branch -02) return_type is the only polygonize parameter validated AFTER the computation: invalid value runs the full backend (spy-verified 1 invocation before raise) while sibling contours() validates up front and lists allowed values; fix hoists the check into the top validation block with an allowed-values message (existing test matches on prefix, unaffected). Re-confirmed prior dispositions, still documented-only per cross-module rule: (HIGH Cat 1 cross-module) connectivity (polygonize, matches GDAL/rasterio/skimage) vs neighborhood (sieve.py, zonal.regions) for the identical 4|8 rook/queen concept -- rename shim belongs in sieve/zonal, out of polygonize scope; (LOW Cat 1 cross-cutting) raster (polygonize/sieve/clip_polygon) vs agg (contours/terrain family) first-arg drift, library-wide, not filed per-module. No new Cat 2 (return_type dispatch shapes match docstring Returns section exactly); no Cat 4 (atol/rtol mirror numpy.isclose, connectivity=4 == sieve neighborhood=4); Cat 5 LOW documented-only: module has no __all__ and the non-underscore internals generated_jit + Turn leak via import-star; polygonize re-exported in __init__.py and accessor, no orphan API. Docstring/signature parity otherwise exact (all 10 params documented, all annotated). Open polygonize issues #3292/#3293 checked -- no overlap with these findings. cuda-validated: CUDA_AVAILABLE=True on this host; polygonize smoke-tested with identical full kwargs on numpy, cupy (int + float atol/rtol=0), and dask+cupy; no backend signature drift." +proximity,2026-06-09,3090;3091,HIGH,2;3,"Sweep 2026-06-09 (deep-sweep-api-consistency-proximity-2026-06-09). 1 HIGH Cat 2 finding (#3090): dask+numpy (and unbounded dask+cupy, which converts to it) KDTree path violates the documented lowest-flat-index tie-break in allocation()/direction() whenever the raster has >1 chunk column. 
_collect_region_targets concatenates targets chunk-major (iy outer, ix inner) so the tree's target order is not global row-major; _kdtree_query_lowest_index then ties to the wrong target. Existing tie-break tests put both targets in the same raster row where chunk order coincides with row-major, so they pass. Repro: 5x5, targets 2@(1,3) and 3@(2,2), chunks (5,3), pixel (2,3) tied at d=1 -> numpy gives 2, dask gives 3. Bounded map_overlap paths are fine (local row-major order is offset-invariant). 1 MEDIUM Cat 3 finding (#3091): all 3 public docstrings claim numpy + dask+numpy support only while cupy/dask+cupy backends exist, are dispatched, and are tested (the tie-break paragraphs in the same docstrings name all 4 backends); direction() opens with a stray copy-pasted slope line ('downward slope direction') plus a doubled 'the the'; allocation example output reads as float64 but the function returns float32; stale '# convert to have same type as of input @raster' comment. Within-module Cat 1/4/5 clean: proximity/allocation/direction share an identical signature (raster, x='x', y='y', target_values=None, max_distance=np.inf, distance_metric='EUCLIDEAN'); consistent with surface_distance siblings (raster/x/y/target_values/max_distance); all 6 public symbols (incl. euclidean/manhattan/great_circle_distance) re-exported in __init__.py, no orphan API. Cross-cutting, documented not filed: sibling distance modules (surface_distance, cost_distance, balanced_allocation) use mutable default target_values: list = [] while proximity uses the None sentinel - the mutable-default fix belongs to those modules; proximity's target_values: list = None hint would be more precise as Optional[list] (LOW, matches library style). cuda-validated: CUDA_AVAILABLE=True on this host; proximity/allocation/direction smoke-tested with identical kwargs on numpy, cupy, dask+numpy, dask+cupy (proximity parity passed; allocation/direction parity failure is finding #3090)." 
+rasterize,2026-06-09,3089,HIGH,1,"Sweep 2026-06-09 (deep-sweep-api-consistency-rasterize-2026-06-09). 1 HIGH Cat 1 fixed in this branch (#3089): rasterize(use_cuda=) vs open_geotiff(gpu=) named the identical GPU-backend opt-in differently; these are the only two public entry points with an explicit GPU boolean (no input array to dispatch on; both pair it with chunks= for dask) and both names were live in the public API at once. Fix renames the positional param to gpu (same slot, positional callers unaffected) and appends use_cuda=None as a deprecated alias: DeprecationWarning on use, TypeError when combined with gpu=True. Docstring, GPU merge warning text, CuPy ImportError text, and polygon_clip.py's internal dask+cupy caller updated (guarded so a legacy use_cuda in rasterize_kw does not collide with the new default); all rasterize test call sites migrated to gpu=; regression tests in test_rasterize_gpu_alias_3089.py pin slot position, warning, TypeError, backend parity, and the warning-free clip_polygon path. Re-inspection after the 2026-05-21 pass (#2250); prior cross-module notes (clip_polygon nodata vs fill, name default drift, polygonize column_name vs column) still documented-only. Docstring/signature parity verified programmatically (17/17 params, order matches). New params since last pass (check_crs, max_pixels) consistent with geotiff naming (max_pixels matches geotiff's). No Cat 2/4/5 findings. LOW noted, not fixed (other module's docs): docs/source/user_guide/focal.ipynb claims convolve_2d takes use_cuda, which it does not. cuda-validated: CUDA_AVAILABLE=True; numpy/cupy/dask+numpy/dask+cupy smoke-tested with identical kwargs, values equal." +reproject,2026-06-09,3095;3097,HIGH,1;2;3,"Sweep 2026-06-09 (deep-sweep-api-consistency-reproject-2026-06-09). 2 findings filed and fixed: #3095 -> PR #3125, #3097 -> PR #3134 (branches -01/-02 off this one). 
(HIGH Cat 2, #3095) merge() raises TypeError ('Implicit conversion to a NumPy array is not allowed') on cupy-backed inputs while sibling reproject() supports numpy/cupy/dask+numpy/dask+cupy; crash site _merge_inmemory info['raster'].values (__init__.py:2572); dask-of-cupy fails the same way at compute via _merge_block_adapter -> _reproject_chunk_numpy/np.asarray. _merge.py has a complete _merge_arrays_cupy that is imported in __init__.py:38 but never called (dead GPU plumbing; the unused import alone is lint issue #3083 from the style sweep). Fix: host round-trip on entry (same pattern as _apply_vertical_shift), GPU result out, docstring documents backend handling. (MEDIUM Cat 3, #3097) _vertical.py Returns docstrings claim 'same type as input/height' but geoid_height(DataArray) returns np.ndarray (verified empirically) and the four conversion wrappers return np.float64/np.ndarray; geoid_height converts scalars to Python float but the wrappers do not (sibling scalar-return drift). Docs-only fix. Documented but NOT fixed: (LOW Cat 1) itrf_transform(src=/tgt=) abbreviations vs source_/target_ elsewhere -- prior 2026-05-29 sweep already weighed this and left it as-is (frames, not CRSes); filed #3099 before noticing the prior disposition, then closed it as not-planned to avoid churn. (LOW Cat 5) module docstring 'Public API' section lists only reproject/merge while __all__ exports 10 names (vertical+itrf funcs invisible in help() header; docs/source/reference/reproject.rst autosummary likewise lists only reproject/merge). Cross-cutting, notes only per template: raster/rasters (reproject) vs agg (terrain family) vs source (geotiff); chunk_size (reproject/merge) vs chunks (open_geotiff); resampling+resolution (reproject/merge/accessor) vs method+target_resolution (resample.py -- resample is the outlier, belongs to a resample-module pass, already in resample row's notes). 
No Cat 4 default drift (resampling='bilinear'/transform_precision=16/chunk_size=None/bounds_policy='auto'/model='EGM96' consistent across siblings). reproject()/merge() kwarg parity smoke-tested on numpy AND cupy DataArrays (merge cupy crash found exactly there). cuda-validated: CUDA_AVAILABLE=True on this host. CI: all GitHub Actions checks green on both PRs; RTD flapped (pending on #3125, fail on #3134 -- repo-wide backlog, change not docs-rendered); PRs left BLOCKED on REVIEW_REQUIRED for the user to merge." +resample,2026-05-27,2544,MEDIUM,3,"Sweep 2026-05-27 (deep-sweep-api-consistency-resample-2026-05-27). 1 MEDIUM Cat 3 finding fixed in this branch (#2544): resample() was the only public symbol in xrspatial.resample without type annotations on any parameter or return; siblings slope/aspect/hillshade/curvature all annotate `agg: xr.DataArray` and `-> xr.DataArray`. Fix adds annotations matching the docstring (agg: xr.DataArray; scale_factor / target_resolution: float | tuple[float, float] | None; method: str; nodata: float | None; name: str) and a `-> xr.DataArray` return type, plus a docstring note that the @supports_dataset decorator accepts Dataset too. Regression test test_resample_signature_annot_2544.py pins every param and the return annotation. Other findings documented but not filed per template: (MEDIUM Cat 1 cross-module) `method` (resample) vs `resampling` (reproject/merge) -- same conceptual parameter, different name, cross-cutting rename, needs design issue. (LOW Cat 1 cross-cutting) first-arg `agg` (resample/slope/aspect/...) vs `raster` (reproject/rasterize/polygonize/sieve) -- library-wide drift, not per-module. (LOW Cat 5) ALL_METHODS imported by tests but not in __all__ (module has no __all__); borderline orphan but used for test parametrisation only. No Cat 2 (returns xr.DataArray as documented). No Cat 4 mutable defaults. resample is exported in xrspatial/__init__.py. 
cuda-validated: cupy backend smoke-tested with nearest, bilinear, and average on host with CUDA_AVAILABLE=True." +slope,2026-05-29,2681,MEDIUM,3,"Sweep 2026-05-29 (deep-sweep-api-consistency-slope-2026-05-29). 1 MEDIUM Cat 3 finding fixed in this branch (#2681, PR #2687): slope() annotated name as `str` while every terrain-family sibling (aspect/northness/eastness in aspect.py, curvature in curvature.py) uses Optional[str]. name flows into xr.DataArray(name=name) which accepts None, so slope(agg, name=None) already worked at runtime -- the annotation was just wrong and inconsistent. Fix widens to Optional[str] and imports Optional (module previously imported only Union). Non-breaking (type-hint widening), no deprecation shim. Added test_name_annotation_matches_terrain_family (pins parity vs the 4 siblings via get_type_hints, unwrapping @supports_dataset) and test_name_none_accepted (slope(agg, name=None).name is None). Full test_slope.py passes (43). No backend logic touched -- numpy/cupy/dask+numpy/dask+cupy paths unchanged; public signature is shared across backends via ArrayTypeFunctionMapping. Other categories: no Cat 1 in-module rename (slope/aspect share identical public param names agg/name/method/z_unit/boundary); no Cat 2 return drift (returns xr.DataArray/Dataset via @supports_dataset, same coords/dims/attrs convention as siblings); no Cat 4 default drift (name/method='planar'/z_unit='meter'/boundary='nan' match across the family); no Cat 5 orphan API (slope re-exported in __init__.py, documented, no __all__ but consistent with module convention). Cross-cutting (documented, not filed per template): first-arg `agg` (slope/aspect/curvature) vs `raster` (reproject/rasterize/polygonize) is library-wide drift. cuda-validated: CUDA_AVAILABLE=True on this host; cupy slope smoke-tested (planar) and signature parity confirmed between numpy and cupy entry points." 
+visibility,2026-06-10,3183,MEDIUM,3;5,"Sweep 2026-06-10 (deep-sweep-api-consistency-visibility-2026-06-10). 2 MEDIUM findings filed as issue #3183, fixed in this branch. (MEDIUM Cat 5) output-name convention drift: viewshed() sets a fixed output name and exposes name=, but cumulative_viewshed (visibility.py:289) and visibility_frequency built/returned DataArrays with name=None. Fix adds name='cumulative_viewshed'/'visibility_frequency' params (Optional[str]) and sets result.name; additive, non-breaking, no shim. coords/attrs were already preserved on both. (MEDIUM Cat 3) line_of_sight (visibility.py:162) annotated frequency_mhz: float = None; default contradicts the float hint and the docstring already says optional. Fix -> Optional[float] (imported typing.Optional). Tests added: cumulative/frequency default+custom name. No Cat 1 naming drift: observer_elev/target_elev/max_distance/x/y and the x0/y0/x1/y1 two-point extension match viewshed and the observers dict keys. No Cat 2 arbitrary return drift: line_of_sight -> Dataset fits its per-sample multi-variable result; the two cumulative funcs -> DataArray like viewshed. No Cat 4 default drift (observer_elev=0/target_elev=0/max_distance=None match). No Cat 5 orphan API: all 3 funcs re-exported in __init__.py; no __all__ but consistent with module convention. cuda-validated: CUDA_AVAILABLE=True on this host; cupy entry points accept the new name= kwarg and the line_of_sight Optional hint. PRE-EXISTING backend bug (out of scope, not an api-consistency issue, NOT filed here): cumulative_viewshed on a cupy raster raises TypeError 'Unsupported type numpy.ndarray' in the count + (vs_data != INVISIBLE) accumulation (numpy accumulator vs cupy viewshed result); reproduced on origin/main without this branch's changes -- a backend-parity gap for a future backend-parity sweep." +zonal,2026-06-10,3188,MEDIUM,1;3;5,"Re-sweep 2026-06-10 (deep-sweep-api-consistency-zonal-2026-06-10). 
Prior sweep's HIGH zones_ids/zone_ids typo confirmed already fixed on main (#2521). Several previously-documented MEDIUM Cat 3 items also fixed on main since 2026-05-27: crosstab layer docstring now says default=None; hypsometric_integral now has param+return annotations; apply now has -> xr.DataArray. Two remaining safe Cat 3 fixes filed+PR'd this run (issue #3188 / PR #3196): (1) crosstab zone_ids/cat_ids annotated List[...]=None -> wrapped in Optional[...] to match stats()/crop(); (2) crosstab nodata_values docstring said 'Cells with nodata' (copy-paste from apply) -> now references nodata_values. Non-breaking, 17 crosstab tests pass. Documented-not-fixed: (MEDIUM Cat 1) nodata vs nodata_values drift across stats/crosstab (nodata_values, default None, filters VALUES raster) vs apply/hypsometric_integral (nodata, default 0, filters ZONES raster) -- names differ but so do the concepts and defaults, so a blanket rename would conflate two distinct meanings; needs a design decision, not a mechanical shim. (MEDIUM Cat 5) get_full_extent has a public-style docstring+example but is not in __init__.py -- borderline orphan, minor utility, left as-is. (LOW Cat 3) crop() lacks a return type annotation while stats/crosstab/apply/regions/trim annotate theirs. Cross-cutting (not filed): first-arg name varies (stats/crosstab/crop use zones; regions/trim use raster) but regions/trim operate on the raster itself so the name matches the role; library-wide agg vs raster vs values naming spans 20+ modules, out of per-module scope. cuda-validated: CUDA_AVAILABLE=True on this host." 
diff --git a/xrspatial/classify.py b/xrspatial/classify.py index 8144a4afa..142934180 100644 --- a/xrspatial/classify.py +++ b/xrspatial/classify.py @@ -1,5 +1,6 @@ from __future__ import annotations +import functools import warnings from functools import partial from typing import List, Optional @@ -111,7 +112,7 @@ def _run_dask_cupy_binary(data, values_cupy): @supports_dataset -def binary(agg, values, name='binary'): +def binary(agg: xr.DataArray, values, name: Optional[str] = 'binary') -> xr.DataArray: """ Binarize a data array based on a set of values. Data that equals to a value in the set will be set to 1. In contrast, data that does not equal to any value in the set will be set to 0. @@ -799,11 +800,39 @@ def _run_dask_cupy_natural_break(agg, num_sample, k): return out +def _natural_breaks_legacy_order(func): + """Bridge the pre-1.0 ``natural_breaks`` parameter order. + + The old signature was ``(agg, num_sample, name, k)``. The new order is + ``(agg, k, num_sample, name)`` to match ``quantile`` and + ``maximum_breaks``. Legacy callers always passed ``k`` as a keyword + (it was the last parameter), so a call that supplies ``k=`` together + with a second positional argument is using the old order: that + positional is the old ``num_sample``. Remap it and warn. + """ + @functools.wraps(func) + def wrapper(agg, *args, **kwargs): + if 'k' in kwargs and len(args) >= 1 and 'num_sample' not in kwargs: + warnings.warn( + "natural_breaks parameter order changed to " + "(agg, k, num_sample, name) to match the other " + "classifiers. Passing num_sample positionally is " + "deprecated; pass num_sample=... 
instead.", + DeprecationWarning, + stacklevel=2, + ) + kwargs['num_sample'] = args[0] + args = args[1:] + return func(agg, *args, **kwargs) + return wrapper + + @supports_dataset +@_natural_breaks_legacy_order def natural_breaks(agg: xr.DataArray, + k: int = 5, num_sample: Optional[int] = 20000, - name: Optional[str] = 'natural_breaks', - k: int = 5) -> xr.DataArray: + name: Optional[str] = 'natural_breaks') -> xr.DataArray: """ Reclassifies data for array `agg` into new values based on Natural Breaks or K-Means clustering method. Values are grouped so that @@ -815,14 +844,14 @@ def natural_breaks(agg: xr.DataArray, agg : xr.DataArray or xr.Dataset 2D NumPy, CuPy, NumPy-backed Dask, or CuPy-backed Dask array of values to be reclassified. + k : int, default=5 + Number of classes to be produced. num_sample : int, default=20000 Number of sample data points used to fit the model. Natural Breaks (Jenks) classification is indeed O(n²) complexity, where n is the total number of data points, i.e: `agg.size` When n is large, we should fit the model on a small sub-sample of the data instead of using the whole dataset. - k : int, default=5 - Number of classes to be produced. name : str, default='natural_breaks' Name of output aggregate. diff --git a/xrspatial/tests/test_classify.py b/xrspatial/tests/test_classify.py index 950409e5c..1574022ce 100644 --- a/xrspatial/tests/test_classify.py +++ b/xrspatial/tests/test_classify.py @@ -1078,3 +1078,20 @@ def test_percentiles_dask_no_unknown_chunks(): dask_result.data.compute(), equal_nan=True, ) + + +def test_natural_breaks_positional_k_matches_siblings(): + """natural_breaks second positional arg is k, like quantile/maximum_breaks.""" + agg = input_data() + positional = natural_breaks(agg, 3) + keyword = natural_breaks(agg, k=3) + np.testing.assert_array_equal(positional.data, keyword.data) + + +def test_natural_breaks_legacy_positional_num_sample_warns(): + """Legacy (agg, num_sample, k=...) 
order warns and still maps correctly.""" + agg = input_data() + with pytest.warns(DeprecationWarning): + legacy = natural_breaks(agg, 20000, k=3) + new = natural_breaks(agg, k=3, num_sample=20000) + np.testing.assert_array_equal(legacy.data, new.data)
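
Reviewer note, not part of the patch: the sketch below illustrates the remapping rule that the _natural_breaks_legacy_order docstring describes, using a hypothetical stand-in function so it runs without xrspatial installed. The names legacy_order_shim, classify_stub and call_and_collect are assumptions made up for this illustration; only the condition inside the wrapper mirrors the diff above.

```python
import functools
import warnings


def legacy_order_shim(func):
    """Remap an old-style (data, num_sample, k=...) call and warn."""
    @functools.wraps(func)
    def wrapper(data, *args, **kwargs):
        # Old-order call: k passed by keyword plus at least one extra positional.
        if 'k' in kwargs and len(args) >= 1 and 'num_sample' not in kwargs:
            warnings.warn(
                "Passing num_sample positionally is deprecated; "
                "pass num_sample=... instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # The first extra positional is the old num_sample.
            kwargs['num_sample'] = args[0]
            args = args[1:]
        return func(data, *args, **kwargs)
    return wrapper


@legacy_order_shim
def classify_stub(data, k=5, num_sample=20000, name='stub'):
    # Hypothetical stand-in: report how the arguments were bound.
    return {'k': k, 'num_sample': num_sample, 'name': name}


def call_and_collect(*args, **kwargs):
    """Call classify_stub and return (bound arguments, warning categories)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        bound = classify_stub(*args, **kwargs)
    return bound, [w.category for w in caught]


# New order: the second positional is k, and no warning is raised.
new_style = call_and_collect([1, 2, 3], 3)
# Legacy order: the positional is remapped to num_sample, with a warning.
legacy_style = call_and_collect([1, 2, 3], 20000, k=4)
```

In this sketch, a legacy call that also passes name positionally, such as classify_stub(data, 20000, 'x', k=4), still raises TypeError, because the leftover positional lands in the k slot after the remap. Whether real callers ever used that form is not something the patch text states.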