
reproject: attrs['vertical_crs'] semantics collide with geotiff (string vs EPSG int) #1570

@brendancol

Description


Problem

Two xrspatial subpackages write to the same DataArray attribute key but with incompatible types:

  • xrspatial.geotiff.open_geotiff() writes attrs['vertical_crs'] as an EPSG integer (e.g. 5773 for EGM96, 3855 for EGM2008). See xrspatial/geotiff/__init__.py:355-356 and the geo_info.vertical_epsg: int | None field at xrspatial/geotiff/_geotags.py:143.

  • xrspatial.reproject.reproject() writes attrs['vertical_crs'] as a string token ('EGM96', 'EGM2008', 'ellipsoidal'). See xrspatial/reproject/__init__.py:728-729.

Same key, two different value types. A user reading a GeoTIFF and re-saving after reproject() ends up with a string where the rest of the library expects an int (or vice versa), and downstream code that checks attrs['vertical_crs'] cannot rely on its type.
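To make the hazard concrete, here is a minimal sketch of a downstream check written against the geotiff convention. The function name `is_egm96` is hypothetical and not part of xrspatial. Plain dicts stand in for `DataArray.attrs`, which is an ordinary dict.

```python
# Hypothetical downstream check written against the geotiff convention,
# where attrs['vertical_crs'] holds an EPSG integer.
EGM96_EPSG = 5773


def is_egm96(attrs: dict) -> bool:
    """Return True if the attrs say heights are EGM96 (EPSG:5773)."""
    return attrs.get("vertical_crs") == EGM96_EPSG


# Plain dicts stand in for DataArray.attrs from each code path.
from_geotiff = {"vertical_crs": 5773}        # open_geotiff() style: int
from_reproject = {"vertical_crs": "EGM96"}   # reproject() style: str

print(is_egm96(from_geotiff))    # True
print(is_egm96(from_reproject))  # False: same datum, but 'EGM96' != 5773
```

The second call fails silently rather than raising, which is why the mismatch is easy to miss.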

Reproducer

import xrspatial.geotiff as gt
import xrspatial.reproject as rp

src = gt.open_geotiff('vertical_egm96.tif')
type(src.attrs.get('vertical_crs'))   # int, e.g. 5773

out = rp.reproject(src, 'EPSG:3857', src_vertical_crs='EGM96', tgt_vertical_crs='ellipsoidal')
type(out.attrs.get('vertical_crs'))   # str, 'ellipsoidal'

Proposal

Align reproject's attrs['vertical_crs'] output with the geotiff convention by writing the EPSG integer code:

String token            EPSG code
'EGM96'                 5773
'EGM2008'               3855
'ellipsoidal' (WGS84)   4979
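The mapping above could live in one module-level table shared by both subpackages. This is a minimal sketch, assuming exact (case-sensitive) token matching; the names `VERTICAL_TOKEN_TO_EPSG`, `VERTICAL_EPSG_TO_TOKEN` and `vertical_token_to_epsg` are hypothetical.

```python
# Proposed token -> EPSG mapping (hypothetical names; values from the table above).
VERTICAL_TOKEN_TO_EPSG = {
    "EGM96": 5773,        # EGM96 height
    "EGM2008": 3855,      # EGM2008 height
    "ellipsoidal": 4979,  # WGS 84 3D geographic (ellipsoidal heights)
}

# Reverse lookup, so an EPSG int read from a GeoTIFF can be given a name.
VERTICAL_EPSG_TO_TOKEN = {v: k for k, v in VERTICAL_TOKEN_TO_EPSG.items()}


def vertical_token_to_epsg(token: str) -> int:
    """Map a reproject vertical-CRS token to its EPSG code."""
    try:
        return VERTICAL_TOKEN_TO_EPSG[token]
    except KeyError:
        raise ValueError(
            f"unknown vertical CRS token {token!r}; "
            f"expected one of {sorted(VERTICAL_TOKEN_TO_EPSG)}"
        ) from None
```

Keeping a single table means a datum added later only has to be registered once.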

To preserve the human-readable name without losing information, also write attrs['vertical_datum'] with the string token. Keep the src_vertical_crs / tgt_vertical_crs kwargs accepting the string tokens — only the output attribute type changes.
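On the output side, reproject() would then write both keys. Below is a minimal sketch of that step, assuming the attrs are written after the target vertical CRS has been validated; `set_vertical_attrs` is a hypothetical helper, not the current reproject() code.

```python
# Token -> EPSG mapping proposed in this issue.
_VERTICAL_EPSG = {"EGM96": 5773, "EGM2008": 3855, "ellipsoidal": 4979}


def set_vertical_attrs(attrs: dict, tgt_vertical_crs: str) -> dict:
    """Hypothetical helper: write both vertical attributes for a reproject() output.

    attrs['vertical_crs']   gets the EPSG int, matching open_geotiff()
    attrs['vertical_datum'] keeps the original string token for readability
    """
    if tgt_vertical_crs not in _VERTICAL_EPSG:
        raise ValueError(f"unknown vertical CRS token {tgt_vertical_crs!r}")
    attrs["vertical_crs"] = _VERTICAL_EPSG[tgt_vertical_crs]
    attrs["vertical_datum"] = tgt_vertical_crs
    return attrs


# Usage: a plain dict stands in for out.attrs.
attrs = set_vertical_attrs({}, "ellipsoidal")
print(attrs)  # {'vertical_crs': 4979, 'vertical_datum': 'ellipsoidal'}
```

The kwargs still take string tokens; only the stored `vertical_crs` value changes type.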

The kwarg names themselves are not renamed, so no deprecation shim is needed for the public signature. Existing callers that read attrs['vertical_crs'] from a reproject output as a string will break, so the release notes should call this out as a behavior change. A search of the repo found no tests or notebooks that currently consume attrs['vertical_crs'] from reproject output, so the blast radius is small.
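Downstream code that may see either convention during the transition could normalize on read. This is a minimal sketch, not part of xrspatial: the name `vertical_epsg` is hypothetical, and it assumes the only string values ever written are the three legacy tokens listed above. It uses `numbers.Integral` so that NumPy integer scalars are also accepted, in case an attr arrives as one.

```python
import numbers
from typing import Optional

# Legacy string tokens that reproject() wrote before the proposed change.
_LEGACY_TOKEN_TO_EPSG = {"EGM96": 5773, "EGM2008": 3855, "ellipsoidal": 4979}


def vertical_epsg(attrs: dict) -> Optional[int]:
    """Return attrs['vertical_crs'] as an EPSG int, whichever convention wrote it.

    Accepts the int written by open_geotiff() (and by reproject() after the
    proposed change) as well as a legacy string token. Returns None if the
    key is absent.
    """
    value = attrs.get("vertical_crs")
    if value is None:
        return None
    if isinstance(value, numbers.Integral):
        return int(value)
    if isinstance(value, str) and value in _LEGACY_TOKEN_TO_EPSG:
        return _LEGACY_TOKEN_TO_EPSG[value]
    raise ValueError(f"unrecognized vertical_crs value: {value!r}")
```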

Categories

  • Cat 2 (return shape drift — output attribute type differs between sibling modules)
  • Cat 5 (public API surface — cross-module attribute key with incompatible semantics)

Severity

HIGH. Same attr key, incompatible types across two public read/write paths in the same package.

Discovered by a /sweep-api-consistency pass on reproject (2026-05-10).


Labels: bug (Something isn't working)
