Skip to content

Redesign as_dataarray: strict public API + internal ensure_dataarray helper#551

Closed
FBumann wants to merge 19 commits intoPyPSA:masterfrom
FBumann:fix/as-dataarray
Closed

Redesign as_dataarray: strict public API + internal ensure_dataarray helper#551
FBumann wants to merge 19 commits intoPyPSA:masterfrom
FBumann:fix/as-dataarray

Conversation

@FBumann
Copy link
Collaborator

@FBumann FBumann commented Jan 20, 2026

Summary

Redesigns as_dataarray() by splitting it into two functions with clear responsibilities:

  • as_dataarray(arr, coords?, dims?) — Public API. When coords is provided: converts to DataArray, validates shared dim coords match exactly, rejects extra dims, broadcasts missing dims via expand_dims. When coords is None: pure type conversion only.
  • ensure_dataarray(arr, coords?, dims?) — Internal helper. Pure type conversion (scalar, numpy, pandas, polars, list → DataArray). No validation, no expand_dims. Coords are used only as construction hints. Callers handle alignment themselves (xarray broadcasting, reindex_like, etc.).

Both share a common _type_dispatch function for the actual type conversion logic.

Motivation

Previously as_dataarray() mixed two conflicting roles:

  1. Strict validation for add_variables where user-provided bounds must match coords exactly
  2. Lenient conversion in expression arithmetic where xarray alignment handles coord reconciliation

This led to bugs where DataArray inputs to add_variables had their coords silently ignored, and attempts to fix it broke expression arithmetic.

How variable coordinates are determined

Scenario Behavior
coords provided, bounds are scalars/numpy/pandas coords defines the variable's coordinate space; inputs are converted using coords
coords provided, bounds are DataArrays coords defines the variable's coordinate space; bounds are validated against coords (shared dims must match, extra dims raise ValueError, missing dims are broadcast)
coords is None, bounds are DataArrays Coordinates are inferred from the DataArray bounds directly
coords is None, bounds are scalars Scalar (dimensionless) variable is created

Changes

File Change
linopy/common.py Extract _type_dispatch (shared conversion), _expand_missing_dims, _validate_dataarray_coords (strict, no allow_extra_dims flag). Add ensure_dataarray. Refactor as_dataarray to delegate to ensure_dataarray when no coords, or validate+expand when coords provided.
linopy/expressions.py Switch all call sites to ensure_dataarray (arithmetic, constants, constraint comparisons, dot)
linopy/variables.py Switch Variable.to_linexpr to ensure_dataarray
linopy/model.py add_variables lower/upper use as_dataarray (strict). Masks, sign, rhs use ensure_dataarray (lenient).
test/test_common.py Tests for _validate_dataarray_coords, ensure_dataarray (no expand, allows extra dims, no coord validation), and add_variables integration.
doc/release_notes.rst Release note entry.

Mask handling

The mask parameter uses ensure_dataarray (no validation). The existing broadcast_mask function handles misaligned mask coords by filling with False. A TODO is left for a future deprecation path toward stricter validation (see #591).

Examples

# Mismatched coords → ValueError
lower = xr.DataArray([0, 0, 0], dims=["time"], coords={"time": [0, 1, 2]})
m.add_variables(lower=lower, coords=[pd.RangeIndex(5, name="time")], name="x")
# ValueError: Coordinates for dimension 'time' do not match

# Extra dims → ValueError
lower = xr.DataArray([[1, 2], [3, 4]], dims=["x", "y"])
m.add_variables(lower=lower, coords={"x": [0, 1]}, name="x")
# ValueError: DataArray has extra dimensions not in coords: {'y'}

# Subset dims → broadcast
lower = xr.DataArray([0, 0], dims=["x"], coords={"x": [0, 1]})
m.add_variables(lower=lower, coords={"x": [0, 1], "y": [0, 1, 2]}, name="x")
# works, lower broadcast to shape (2, 3)

# No coords → pass through unchanged
m.add_variables(lower=lower, name="x")

# Internal arithmetic uses ensure_dataarray (lenient)
expr = m.variables["x"] + np.array([1, 2])  # xarray handles alignment

Checklist

  • Code changes are sufficiently documented
  • Unit tests pass
  • A note for the release notes
  • I consent to the release of this PR's code under the MIT license

@FBumann
Copy link
Collaborator Author

FBumann commented Feb 18, 2026

Maybe remove explicit broadcast from masking added in a prior patch

@FBumann FBumann marked this pull request as ready for review February 19, 2026 08:29
FBumann and others added 12 commits March 11, 2026 13:39
Previously, when a DataArray was passed to as_dataarray(), the coords
parameter was silently ignored. This was inconsistent with other input
types (numpy, pandas) where coords are applied.

Now, when coords is provided as a dict and the input is a DataArray,
the function will reindex the array to match the provided coordinates.
This ensures consistent behavior across all input types.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
  2. Expands to new dims from coords (broadcast)

  Summary:
  - as_dataarray now consistently applies coords for all input types
  - DataArrays with fewer dims are expanded to match the full coords specification
  - This fixes the inconsistency when creating variables with DataArray bounds
Replace reindex with a strict equality check for DataArray inputs.
Silent reindexing is dangerous as it introduces NaNs for missing
indices and drops unmatched ones, masking user bugs. Now raises
ValueError if coords don't match, while still allowing expand_dims
for broadcasting to new dimensions.
Strict by default: raises ValueError if a DataArray has dimensions
not present in coords. Call sites that need broadcasting (multiply,
dot, add) opt in with allow_extra_dims=True. Structural call sites
like add_variables bounds/mask remain strict.
When coords is a sequence (e.g. from add_variables), convert it to a
dict using dims or Index names so the same validation applies. This
closes the gap where sequence coords were silently ignored for
DataArray inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unifies the sequence-to-dict coords conversion used in
pandas_to_dataarray, numpy_to_dataarray, and the DataArray branch
of as_dataarray into a single helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document the coord validation and broadcasting behavior from the
user perspective.
- Skip coord validation for DataArray inputs in arithmetic contexts
  (allow_extra_dims=True) to preserve xarray's native alignment
- Add allow_extra_dims=True to comparison operator and quadratic dot
  as_dataarray calls for consistent broadcasting
- Handle MultiIndex levels in expand_dims guard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move validation out of as_dataarray into model.add_variables directly.
This removes the allow_extra_dims flag and all changes to expressions.py
and variables.py — arithmetic call sites are unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FBumann
Copy link
Collaborator Author

FBumann commented Mar 12, 2026

SHould be merged before #591

Resolve conflict in model.py: keep both semi-continuous variable
validation (from master) and DataArray coord validation (from this PR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FBumann

This comment was marked as spam.

- Fix "coordinates are taken considered" → "are considered" in
  pandas_to_dataarray warning
- Add TODO noting mask DataArray validation is intentionally skipped
  to preserve broadcast_mask's fill-with-False behavior
- Clarify coords docstring for add_variables

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FBumann
Copy link
Collaborator Author

FBumann commented Mar 12, 2026

Adjust mask validation in #591

@FBumann
Copy link
Collaborator Author

FBumann commented Mar 12, 2026

@FabianHofmann I think merging this would be a great addition in UX

@FBumann
Copy link
Collaborator Author

FBumann commented Mar 12, 2026

Fixed #450

FBumann

This comment was marked as spam.

@FBumann FBumann marked this pull request as draft March 12, 2026 14:02
…ay helper

Split as_dataarray into two functions with distinct responsibilities:
- as_dataarray (public): strict coord validation when coords provided,
  rejects extra dims, expands missing dims, raises on coord mismatch
- _coerce_to_dataarray (internal): pure type conversion using coords
  only as construction hints, no validation or expansion

This removes the allow_extra_dims flag and makes the API predictable:
callers that need lenient conversion (expression arithmetic, masks)
use _coerce_to_dataarray, while add_variables bounds use as_dataarray.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FBumann FBumann changed the title Fix as_dataarray to apply coords parameter for DataArray input Redesign as_dataarray: strict public API + internal ensure_dataarray helper Mar 13, 2026
FBumann and others added 2 commits March 13, 2026 09:03
…to optional coords

- Rename internal helper to ensure_dataarray (no underscore, consistent
  with other common.py conventions like broadcast_mask)
- Revert as_dataarray coords back to optional — when coords is None,
  delegates to _type_dispatch for pure type conversion
- Deduplicate _type_dispatch call in as_dataarray (always called, validation conditional)
- Update all call sites in expressions.py, model.py, variables.py
- Update test names to match new function name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise ValueError when dims length doesn't match coords sequence length
- Raise ValueError on duplicate .name in coords sequence
- Warn on unnamed items in coords sequence (when dims is None)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@FBumann
Copy link
Collaborator Author

FBumann commented Mar 13, 2026

Closed in favor of #614

@FBumann FBumann closed this Mar 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant