diff --git a/arithmetics-design/convention.md b/arithmetics-design/convention.md new file mode 100644 index 00000000..b0600296 --- /dev/null +++ b/arithmetics-design/convention.md @@ -0,0 +1,197 @@ +# The v1 convention + +The strict ("v1") convention for linopy. Goals and rollout plan: +[`goals.md`](goals.md). The bugs it fixes are catalogued in [#714]. + +Thirteen sections in three groups: absence (§1–§7), coordinate alignment +(§8–§11), then constraints and reductions (§12–§13). + +## Absence + +Absence — a labelled slot the model does not cover — is the richer half of the +convention. The sections below say what it is (§1–§3), how it arises (§4–§5), +and how it flows through arithmetic and is resolved (§6–§7). + +### §1. Absence is a first-class state + +A *slot* — one labelled position — is either present or *absent*. An absent +slot is one the model does not cover. Absence is a state in its own right, +never a stand-in for a number: an absent variable is not a variable fixed to +zero ([#712]). + +### §2. Encoding absence + +The *marker* is how an absent slot is stored: `NaN` in floating-point fields +(`coeffs`, `const`, numeric constants), and `-1` in integer label fields (a +variable's `labels`, an expression's `vars`, which cannot hold a NaN). The two +encodings are one concept — an absent slot, whatever the dtype. + +Within a single slot, the markers move together: `const.isnull()` at a slot +implies *every* term at that slot has `coeffs = NaN` and `vars = -1`. Operators +that introduce absence at a slot also absorb any live terms there, so the +storage never carries a half-absent row. A term at a *present* slot may still +carry `vars = -1` after `fillna(value)` revives the slot — that's a *dead +term*, inert at the solver layer, and only meaningful as storage book-keeping. + +### §3. Testing absence + +`isnull()` is the one predicate for absence. It reads the marker — `NaN` or +`-1`, whichever the field uses — and reports absence slot by slot. Every rule +that speaks of an "absent slot" means exactly what `isnull()` reports; the +caller never inspects the raw marker. + +### §4. Creating absence + +Absence enters a model only through named operations: `mask=` at construction +marks slots absent up front; `.where(cond)` masks slots in place, keeping +shape; `.reindex()`, `.reindex_like()`, `.shift()`, and `.unstack()` +restructure a coordinate and leave the new positions absent. Operations that +merely move or select existing data — `.roll()`, `.sel()`, `.isel()` — never +introduce it. + +### §5. User-supplied NaN raises + +A NaN in a user-supplied constant raises `ValueError`. linopy trusts NaN only +from its own structural operations (§4), which genuinely mark absence. A NaN in +user data is ambiguous — a deliberate "absent", or a data error — so linopy +refuses to guess and asks the caller to resolve it with `fillna()`. This +replaces today's silent per-operator fills, which guessed a different value for +every operator ([#713]). To mark slots absent, use the mechanisms of §4 — a +bare NaN in a constant is not one of them. + +The alternative — reading user NaN as "absent" instead of raising — was +discussed in [#627] and closed: ambiguous overload of a numeric value +defeats goal #1, since a data-error NaN is silently re-labelled as +intentional absence. + +### §6. Absence propagates through every operator + +Every operator carries absence through unchanged: a slot absent in any operand +is absent in the result. `shifted * 3` is absent; `shifted + 5` is absent; +`x + shifted` is absent wherever `shifted` is — even though `x` itself is fine +there. + +linopy never fills an absent slot on the user's behalf, because the right fill +depends on intent it cannot see: 0 for a sum, 1 for a product, or "leave this +out" entirely. Because every operator propagates the same way, the algebraic +laws of §10 carry over to absent slots untouched — absence absorbs, so every +grouping of an expression agrees. And `shifted * 3` staying absent, rather than +collapsing to `0`, is what preserves the absent-vs-zero distinction of §1. + +### §7. Resolving absence + +Because §6 never fills, turning an absent slot into a value is the caller's +explicit act, never linopy's. `fillna(value)` fills an expression's absent +slots; `.fillna(...)` fills a constant before it enters the arithmetic; +`fill_value=` on a named method fills as part of the call. Filling at the call +site documents the intent: `x + y.shift(time=1).fillna(0)` says "treat the +missing earlier step as zero" exactly where it matters. + +## Coordinate alignment + +linopy's operands are xarray objects, so the convention starts from xarray's +alignment model (goal 4): coordinates align by *label*, never by position; +non-shared dimensions broadcast; a mismatch on a shared dimension is resolved +by an explicit *join*. + +**Open question:** how should v1 align *unlabeled* data — a raw numpy array +carries no labels to match on. Still open. + +### §8. Shared dimensions must match exactly + +If two operands share a dimension, their coordinate labels must be identical, +or the operator raises `ValueError`. + +This is xarray's model with `arithmetic_join="exact"` — deliberately stricter +than xarray's own default (`inner`). An inner join silently drops the +non-overlapping labels, and in an optimization model a dropped coordinate is a +dropped term or constraint: a silent wrong answer. An exact match surfaces the +mismatch where it happens. (The [pyoframe] library uses the same model.) + +Because the rule is identical for every operator, the operator-alignment split +([#708]) — `*` aligning by label while `+`, `-`, `/` go by position — +disappears. + +### §9. Non-shared dimensions broadcast freely + +A dimension present in only one operand broadcasts over the other, with no +restriction — for both expressions and constants. Only *shared* dimensions are +subject to §8. + +### §10. Mismatches resolve via an explicit join + +When coordinates genuinely differ, §8 raises — and the caller says how to +resolve it. Several primitives bring operands into agreement: + +- `.sel()` / `.isel()` cut operands down to a shared subset — often the + clearest fix. +- The named methods — `.add` `.sub` `.mul` `.div` `.le` `.ge` `.eq` — take a + `join=` argument: `exact`, `inner`, `outer`, `left`, `right`, or `override`. + `override` is the old positional behavior — still available, but now opt-in + and named rather than triggered by a size coincidence. +- `.reindex()` / `.reindex_like()` conform an operand to a target index + (extending past the original creates absent positions — §4). +- `.assign_coords()` relabels an operand outright (positional alignment, made + explicit). +- `linopy.align()` pre-aligns several operands at once. + +Because no operator silently drops coordinates, the associativity break +([#711]) cannot occur: the operation that used to drop coordinates now raises. +Every standard algebraic law — commutativity, associativity, distributivity, +the identities — holds for same-coordinate operands. + +### §11. Auxiliary-coordinate conflicts raise + +Auxiliary (non-dimension) coordinates are user-attached metadata: a coord +defined on some dimension but not itself a dimension, like a `B(A)` group +label on dimension `A`. linopy *validates* them (the conflict-raise rule +below) and *propagates* them through arithmetic unchanged, but never +*computes* with them — they describe the data, they don't enter the math. + +When two operands carry an aux coord with the same name and values agree, +the coord propagates to the result. When only one operand carries the +coord, it propagates from that operand unchanged — asymmetric presence is +not a conflict. When the values *do* disagree (same name on both sides, +different values), the operator raises — `xarray` silently drops the +conflict, which is the [#295] bug. The caller resolves it explicitly with +`.drop_vars(name)` (remove the coord) or `.assign_coords(name=...)` +(relabel one side). + +## Constraints and reductions + +Two kinds of operation build on the rules above without being binary operators: +the comparisons that form constraints, and the reductions that collapse a +dimension. + +### §12. Constraints follow the same rules + +A constraint is built by comparing two sides with `<=`, `>=`, or `==` — and a +comparison is an operator like any other. It aligns its sides by §8 and carries +absence by §6, exactly as `+`, `-`, `*`, and `/` do. So algebraically equal +forms build the same constraint: `x - a <= 0` and `x <= a` agree, where today +they do not ([#707]). + +Each slot becomes one constraint row. An absent slot yields no row — absence +propagated into a comparison drops the constraint there, the same outcome as +masking it. + +### §13. Reductions skip absent slots + +Reductions — `sum`, `mean`, and the `groupby` / `resample` / `coarsen` +aggregations — collapse a dimension rather than combining two operands, so the +propagation of §6 does not apply: they *skip* absent slots instead. `sum` adds +the present terms, and the sum of none is the zero expression. `mean` divides +by the count of *present* slots, not all of them — dividing by all would treat +an absent slot as a zero term, which §1 forbids. The objective totals its +terms the way `sum` does. + + +[pyoframe]: https://github.com/Bravos-Power/pyoframe +[#714]: https://github.com/PyPSA/linopy/issues/714 +[#713]: https://github.com/PyPSA/linopy/issues/713 +[#712]: https://github.com/PyPSA/linopy/issues/712 +[#711]: https://github.com/PyPSA/linopy/issues/711 +[#708]: https://github.com/PyPSA/linopy/issues/708 +[#707]: https://github.com/PyPSA/linopy/issues/707 +[#627]: https://github.com/PyPSA/linopy/issues/627 +[#295]: https://github.com/PyPSA/linopy/issues/295 diff --git a/arithmetics-design/docs-plan.md b/arithmetics-design/docs-plan.md new file mode 100644 index 00000000..9c55750e --- /dev/null +++ b/arithmetics-design/docs-plan.md @@ -0,0 +1,32 @@ +# Docs plan — user-facing migration guide + +Early-stage outline for the v1 migration docs. Not the guide itself — +the three pieces it needs to cover, written when someone picks it up. + +## Three audiences, one migration + +- **Downstream library maintainers** (PyPSA, pypsa-eur, calliope, …) — + carry the bulk of the migration work: opt their codebases into v1, + fix the raises, ship a release that no longer warns under legacy. +- **Direct users of linopy** — write linopy code themselves and need + to know what changes for their own call sites. +- **End users of downstream libraries** — never touch linopy directly, + but may see a `LinopySemanticsWarning` in CI logs and need a + pointer to "this is upstream; your maintainer will handle it". + +## Three things to cover + +1. **Why v1 exists.** One paragraph: legacy silently mishandled NaN, + coord mismatches, and absent variables. The bug catalogue in #714 + has the case-by-case detail. + +2. **What's changing and when.** The rollout timeline: + - v1 ships opt-in via `linopy.options['semantics'] = 'v1'`. + - v1 becomes the default in a later minor release (date TBD). + - Legacy removed at 1.0. + +3. **How to migrate.** What downstream maintainers do to flip their + codebase: opt in on a branch, run tests, fix the raises. The + legacy warning text already names the rule and the fix per site, + so the guide is mostly the high-level recipe plus a pointer to + the spec (`arithmetics-design/convention.md`) for the rule list. diff --git a/arithmetics-design/goals.md b/arithmetics-design/goals.md new file mode 100644 index 00000000..ba434836 --- /dev/null +++ b/arithmetics-design/goals.md @@ -0,0 +1,47 @@ +# The v1 convention — design & transitioning goals + +Goals for linopy's strict ("v1") convention. The bugs that motivate +it are catalogued in [#714]; the convention itself is in +[`convention.md`](convention.md). + +## Design goals + +The convention serves four goals, in priority order: + +1. **No silent wrong answers.** Every bug in the catalogue ([#714]) returns a + plausible result with no error. The overriding goal: a mismatch linopy + cannot resolve unambiguously must raise, not get guessed. Where the library + cannot decide, the caller does — with an explicit join, `.sel()`, or + `fill_value=`. +2. **Preserve the algebraic laws.** Commutativity, associativity, + distributivity, the identities. Optimization code builds expressions by + rearranging terms, and the convention must keep that safe. +3. **Absence is first-class.** A variable can be genuinely absent at a slot — + masked out, or shifted past the edge. The data model needs an explicit + marker for that absence, kept distinct from a zero term, so absent-vs-zero + is never a silent guess. +4. **Least surprise.** linopy is built on xarray and its users know xarray. The + convention should behave the way xarray already taught them — align by + label, broadcast non-shared dimensions, resolve mismatches with a named + join — not invent linopy-specific rules. Auxiliary coordinates the user + attached are the user's; linopy validates and carries them through, + never silently dropped or rewritten. + +## Transitioning goals + +1. **Non-breaking.** Existing code keeps working — legacy stays available and + unchanged until it is removed at linopy 1.0. +2. **Actionable warnings.** Warn every legacy user about behaviour changes — + what changes under v1, and how to fix it — aiming for 100% coverage. +3. **No silent change.** Opting into v1 never silently changes a model — every + difference is either raised, or was warned about in legacy mode. + +**Schedule:** + +1. Introduce v1 as opt-in — warn about behaviour changes on legacy, raise if + opted into v1. +2. Make v1 the default, allow opt-out. +3. linopy 1.0 — drop the legacy convention entirely. + + +[#714]: https://github.com/PyPSA/linopy/issues/714 diff --git a/arithmetics-design/legacy-removal.md b/arithmetics-design/legacy-removal.md new file mode 100644 index 00000000..f4ff6f1d --- /dev/null +++ b/arithmetics-design/legacy-removal.md @@ -0,0 +1,151 @@ +# Legacy removal checklist (for linopy 1.0) + +The v1 convention ships alongside legacy from 0.x onward and replaces it +entirely at 1.0. This file enumerates everything to delete when that +release happens, in dependency order. Most edits are mechanical; +`grep "LEGACY: remove at 1.0"` finds every inline marker comment in the +source tree. + +## Implementation + +### `linopy/config.py` + +- Drop `LEGACY_SEMANTICS`, `V1_SEMANTICS`, `VALID_SEMANTICS` constants. +- Drop `LEGACY_SEMANTICS_MESSAGE`. +- Drop the `LinopySemanticsWarning` class. +- Remove the `semantics` key from `options`. The option no longer exists; + callers don't need to opt in. +- Remove the v1/legacy validation branch in `set_value`. + +### `linopy/semantics.py` + +- Delete `is_v1()` (always true). Inline `True` at the four import sites + in `expressions.py`/`variables.py` or, better, delete the import and + the now-dead `else` branches alongside it. +- Drop the legacy-warn branch from `check_user_nan_scalar` / + `check_user_nan_array` — both become a single `raise ValueError(...)`. +- Delete `dim_coords_differ`: only the legacy `_align_constant` default + path uses it. +- `merge_shared_user_coords_differ`, `conflicting_aux_coord`, + `absorb_absence` stay — they're v1 enforcement helpers. + +### `linopy/expressions.py` + +- `_add_constant`: delete `_add_constant_legacy`; inline + `_add_constant_v1` into the now-trivial dispatcher (or rename it to + `_add_constant`). +- `_apply_constant_op`: same treatment with `_apply_constant_op_legacy`. +- `_align_constant`: delete the `else` branch under `if join is None:` + that handles the legacy size-aware default (`other.sizes == self.const.sizes` + positional + `reindex_like` left-join paths). The explicit-join code + below stays. +- `to_constraint`: drop the `if is_v1(): ... return ...` wrapper and + keep its body; delete the legacy auto-mask fallthrough that follows + (the `rhs_nan_mask` plumbing plus the `rhs.reindex_like(..., + fill_value=np.nan)` pad). +- `LinearExpression.isnull`: drop the legacy `(self.vars == -1).all(...) + & self.const.isnull()` branch — `self.const.isnull()` is the v1 answer. +- `merge`: + - Drop the `if differ: warn(...)` line and the `if aux_conflict: + warn(...)` line — these are the §8 / §11 legacy warns. The raises + above stay. + - The `skipna = not is_v1()` simplifies to `skipna = False` (v1's + propagation rule). + - The trailing `if is_v1(): ds = absorb_absence(ds)` becomes + unconditional. +- Drop the `LinopySemanticsWarning` / `LEGACY_SEMANTICS_MESSAGE` imports + from `expressions.py`. + +### `linopy/variables.py` + +- `Variable.to_linexpr`: drop the `else` branch (legacy + `reindex_like(fill_value=0).fillna(0)`); make the v1 `reindex_like( + fill_value=NaN) → .where(~absent)` path unconditional. The + `const = NaN`/`0` assign also becomes unconditional. +- Drop the `from linopy.semantics import is_v1` import. + +### `linopy/piecewise.py` / `linopy/sos_reformulation.py` + +Nothing to remove; these are v1-clean (the `drop=True` / `assign_coords` +fixes from Slice P are correct for both semantics). + +## Tests + +### `test/conftest.py` + +- Drop the `LEGACY_SEMANTICS`/`V1_SEMANTICS`/`VALID_SEMANTICS` imports + and the `LinopySemanticsWarning` import. +- Drop the `legacy` / `v1` marker registration in `pytest_configure`. +- Delete the autouse `semantics` fixture entirely (no more parameterization, + no more warning suppression). + +### `test/test_legacy_violations.py` + +- Delete the file. Everything in it either documents legacy behaviour + (gone) or tests v1 raises (covered by the per-module test files we + add tests to alongside the implementation). + + Before deleting, move any v1 tests that don't have a per-§ home into + the appropriate module: + - `TestExactAlignmentConstant`, `TestExactAlignmentMerge`, + `TestBroadcastNonSharedDim` → `test_linear_expression.py`. + - `TestConstraintRHS` → `test_constraints.py`. + - The rest are small enough to fold in alongside related tests. + +### `test/test_convention.py` + +- Delete the file (it tests the `options["semantics"]` framework, which + is gone). + +### Marker stripping + +`grep -rn "@pytest.mark.legacy\|@pytest.mark.v1\|pytestmark = pytest.mark.legacy" +test/` finds every marker: + +- `@pytest.mark.legacy` decorators — delete the decorator (the test + body is documenting old behaviour; deleting the whole test is + usually right). Spot-check before each delete; a few "legacy" + marks turned out to gate on legacy auto-mask semantics and the + test itself stays valid under v1. +- `@pytest.mark.v1` decorators — strip the decorator (the test stays). +- `pytestmark = pytest.mark.legacy` at module level — was only used + while `piecewise.py` was non-v1-aware; removed in Slice P. Verify + none remain. + +## Documentation + +### `arithmetics-design/goals.md` + +- Drop the entire "Transitioning goals" section (the three transitioning + goals + the schedule are about the legacy bridge, which is gone). +- Goal #4's mention of `LinopySemanticsWarning` (if any) goes. + +### `arithmetics-design/convention.md` + +- Drop the `## Legacy` framing if it exists. Each § already describes + v1 directly; any "where today this does X" asides referencing legacy + behaviour can go. +- Update the intro: "The strict ('v1') convention" → "The arithmetic + convention" (drops the v1 framing now that there's only one). + +### This file (`legacy-removal.md`) + +- Delete after the 1.0 release ships. + +## Order of operations + +A safe sequence (each step compiles and tests pass): + +1. Delete legacy test infrastructure (`test/conftest.py` fixture, + `test/test_convention.py`, `test/test_legacy_violations.py` after + moving v1 tests out). +2. Strip `@pytest.mark.legacy` decorators (the tests fail under v1 + anyway once the legacy paths are gone — delete or update each). +3. Delete legacy implementation branches in `expressions.py` / + `variables.py`. +4. Delete `semantics.py` legacy bits (`is_v1`, the warn branches in + `check_user_nan_*`, `dim_coords_differ`). +5. Delete `config.py` symbols (`LEGACY_SEMANTICS`, the warning class, + the option key). +6. Update `arithmetics-design/goals.md` and `convention.md`. +7. Delete this file. diff --git a/linopy/__init__.py b/linopy/__init__.py index e80e615d..0105e511 100644 --- a/linopy/__init__.py +++ b/linopy/__init__.py @@ -13,7 +13,7 @@ # we need to extend their __mul__ functions with a quick special case import linopy.monkey_patch_xarray # noqa: F401 from linopy.common import align -from linopy.config import options +from linopy.config import LinopySemanticsWarning, options from linopy.constants import ( EQUAL, GREATER_EQUAL, @@ -57,6 +57,7 @@ "GREATER_EQUAL", "LESS_EQUAL", "LinearExpression", + "LinopySemanticsWarning", "Model", "Objective", "OetcHandler", diff --git a/linopy/config.py b/linopy/config.py index 5d269c4e..13731e58 100644 --- a/linopy/config.py +++ b/linopy/config.py @@ -9,6 +9,28 @@ from typing import Any +LEGACY_SEMANTICS = "legacy" +V1_SEMANTICS = "v1" +VALID_SEMANTICS = {LEGACY_SEMANTICS, V1_SEMANTICS} + +LEGACY_SEMANTICS_MESSAGE = ( + "The 'legacy' semantics are deprecated and will be removed in " + "linopy 1.0. Set linopy.options['semantics'] = 'v1' to opt in " + "to the new behaviour, or silence this warning with:\n" + " import warnings; warnings.filterwarnings(" + "'ignore', category=LinopySemanticsWarning)" +) + + +class LinopySemanticsWarning(FutureWarning): + """ + Emitted when code runs under the legacy arithmetic semantics. + + Subclasses ``FutureWarning`` rather than ``DeprecationWarning`` so it is + shown to end users by default; the legacy-to-v1 transition changes + results, not just an API surface. + """ + class OptionSettings: """Runtime configuration knobs (e.g. display widths). Use as a context manager or set values directly via ``options(key=value)``.""" @@ -30,6 +52,11 @@ def set_value(self, **kwargs: Any) -> None: for k, v in kwargs.items(): if k not in self._defaults: raise KeyError(f"{k} is not a valid setting.") + if k == "semantics" and v not in VALID_SEMANTICS: + raise ValueError( + f"Invalid semantics: {v!r}. " + f"Must be one of {sorted(VALID_SEMANTICS)}." + ) self._current_values[k] = v def get_value(self, name: str) -> Any: @@ -62,4 +89,5 @@ def __repr__(self) -> str: options = OptionSettings( display_max_rows=14, display_max_terms=6, + semantics=LEGACY_SEMANTICS, ) diff --git a/linopy/expressions.py b/linopy/expressions.py index 2ab0b8d3..8af95ef6 100644 --- a/linopy/expressions.py +++ b/linopy/expressions.py @@ -68,7 +68,9 @@ to_dataframe, to_polars, ) -from linopy.config import options +from linopy.config import ( + options, +) from linopy.constants import ( CV_DIM, EQUAL, @@ -81,6 +83,18 @@ STACKED_TERM_DIM, TERM_DIM, ) +from linopy.semantics import ( + _legacy_coord_mismatch_message, + _legacy_nan_rhs_constraint_message, + _shared_dim_mismatch_message, + absorb_absence, + check_user_nan, + enforce_aux_conflict, + first_mismatched_dim, + is_v1, + merge_shared_user_coord_mismatch, + warn_legacy, +) from linopy.types import ( ConstantLike, DimsLike, @@ -521,11 +535,18 @@ def _multiply_by_linear_expression( ds = other.data[["coeffs", "vars"]].sel(_term=0).broadcast_like(self.data) ds = assign_multiindex_safe(ds, const=other.const) res = merge([self, ds], dim=FACTOR_DIM, cls=QuadraticExpression) - # deal with cross terms c1 * v2 + c2 * v1 + # deal with cross terms c1 * v2 + c2 * v1. The ``const`` factors + # are internal §6-propagated fields — NaN at absent slots, not + # user-supplied data. ``fillna(0)`` makes them safe to pass back + # through the public-API ``*`` (which §5-checks user NaN); the + # zeroed cross-term contribution at an absent slot adds nothing, + # and ``res`` already carries the absence marker from the + # FACTOR_DIM merge above (NaN + 0 = NaN), so absence survives + # and ``absorb_absence`` enforces the storage invariant. if self.has_constant: - res = res + self.const * other.reset_const() + res = res + other.reset_const() * self.const.fillna(0) if other.has_constant: - res = res + self.reset_const() * other.const + res = res + self.reset_const() * other.const.fillna(0) return cast(QuadraticExpression, res) def _align_constant( @@ -544,7 +565,9 @@ def _align_constant( fill_value : float, default: 0 Fill value for missing coordinates. join : str, optional - Alignment method. If None, uses size-aware default behavior. + Alignment method. If None, the default is determined by + ``options["semantics"]`` — ``"exact"`` under ``v1``, the + legacy size-aware behavior under ``legacy``. Returns ------- @@ -555,33 +578,137 @@ def _align_constant( needs_data_reindex : bool Whether the expression's data needs reindexing. """ + # §11: aux-coord conflict is independent of dim alignment — fires + # on every join path. Gating it behind ``join is None`` (alongside + # the §8 dim check) would leave ``join="override"`` etc. silently + # dropping the conflicting coord, which is the #295 bug v1 is + # meant to close. v1 raises; legacy warns. + enforce_aux_conflict([self.const, other], stacklevel=4) if join is None: - if other.sizes == self.const.sizes: - return self.const, other.assign_coords(coords=self.coords), False + if is_v1(): + join = "exact" + else: + # LEGACY: remove at 1.0 — see arithmetics-design/legacy-removal.md. + # Legacy default: positional when sizes match, else left-join. + mismatch = first_mismatched_dim(self.const, other) + if other.sizes == self.const.sizes: + if mismatch is not None: + warn_legacy( + _legacy_coord_mismatch_message( + "this operator's constant operand", *mismatch + ), + stacklevel=4, + ) + return self.const, other.assign_coords(coords=self.coords), False + warn_legacy( + _legacy_coord_mismatch_message( + "this operator's constant operand", + *(mismatch or (None, None, None)), + ), + stacklevel=4, + ) + return ( + self.const, + other.reindex_like(self.const, fill_value=fill_value), + False, + ) + + if join == "override": + # §10: ``override`` is the *explicit* form of the legacy + # positional alignment. Positional pairing requires the + # shared dims to have matching sizes on both sides — without + # that the labels we're about to ``assign_coords`` on would + # be a coord-rename of mismatched data, which is exactly the + # silent-wrong-answer hazard ``override`` was tightened up + # to prevent. Raise with a clear message instead of letting + # xarray broadcast / error opaquely downstream. + shared = set(self.const.dims) & set(other.dims) + bad = sorted( + (d for d in shared if self.const.sizes[d] != other.sizes[d]), + key=str, + ) + if bad: + sizes = ", ".join( + f"{d!r}: left={self.const.sizes[d]}, right={other.sizes[d]}" + for d in bad + ) + raise ValueError( + f"join='override' requires matching sizes on shared " + f"dimensions, but sizes differ on {sizes}. Use " + f"join='inner' / 'outer' / 'left' / 'right' to combine " + f"by label, or reshape one side first." + ) + return self.const, other.assign_coords(coords=self.coords), False + if join == "left": return ( self.const, other.reindex_like(self.const, fill_value=fill_value), False, ) - elif join == "override": - return self.const, other.assign_coords(coords=self.coords), False - else: - self_const, aligned = xr.align( - self.const, - other, - join=join, - fill_value=fill_value, - ) - return self_const, aligned, True + # ``xr.align(..., join="exact")`` raises with a wording that's not + # API-stable across xarray releases; matching on ``"exact" in str(e)`` + # would silently degrade if upstream rephrases. Do the §8 check + # ourselves and raise the canonical ``_shared_dim_mismatch_message`` + # (same text as the v1-default and merge paths). Other joins + # (inner / outer / right) handle coord mismatches via the join + # mode and don't error here. + if join == "exact": + mismatch = first_mismatched_dim(self.const, other) + if mismatch is not None: + raise ValueError(_shared_dim_mismatch_message(*mismatch)) + self_const, aligned = xr.align( + self.const, + other, + join=join, + fill_value=fill_value, + ) + return self_const, aligned, True def _add_constant( self: GenericExpression, other: ConstantLike, join: JoinOptions | None = None ) -> GenericExpression: - # NaN values in self.const or other are filled with 0 (additive identity) - # so that missing data does not silently propagate through arithmetic. + if is_v1(): + return self._add_constant_v1(other, join) + return self._add_constant_legacy(other, join) + + def _add_constant_v1( + self: GenericExpression, other: ConstantLike, join: JoinOptions | None + ) -> GenericExpression: + # §6: absence propagates — self.const NaN stays NaN, no fillna(0). + # §5: user NaN raised in check_user_nan; never reaches the math here. if np.isscalar(other) and join is None: + if isinstance(other, float) and np.isnan(other): + check_user_nan() + return self.assign(const=self.const + other) + da = as_dataarray(other, coords=self.coords, dims=self.coord_dims) + if da.isnull().any(): + check_user_nan() + self_const, da, needs_data_reindex = self._align_constant( + da, fill_value=0, join=join + ) + if needs_data_reindex: + return self.__class__( + self.data.reindex_like(self_const, fill_value=self._fill_value).assign( + const=self_const + da + ), + self.model, + ) + return self.assign(const=self_const + da) + + # LEGACY: remove at 1.0 — see arithmetics-design/legacy-removal.md. + def _add_constant_legacy( + self: GenericExpression, other: ConstantLike, join: JoinOptions | None + ) -> GenericExpression: + # NaN values in self.const or other are silently filled with 0 + # (additive identity) so missing data does not propagate through + # arithmetic. ``check_user_nan`` only warns under legacy. + if np.isscalar(other) and join is None: + if isinstance(other, float) and np.isnan(other): + check_user_nan() return self.assign(const=self.const.fillna(0) + other) da = as_dataarray(other, coords=self.coords, dims=self.coord_dims) + if da.isnull().any(): + check_user_nan() self_const, da, needs_data_reindex = self._align_constant( da, fill_value=0, join=join ) @@ -601,16 +728,60 @@ def _apply_constant_op( other: ConstantLike, op: Callable[[DataArray, DataArray], DataArray], fill_value: float, + op_kind: str, join: JoinOptions | None = None, ) -> GenericExpression: - """ - Apply a constant operation (mul, div, etc.) to this expression with a scalar or array. + """Apply a constant operation (mul, div) to this expression.""" + if is_v1(): + return self._apply_constant_op_v1(other, op, fill_value, op_kind, join) + return self._apply_constant_op_legacy(other, op, fill_value, op_kind, join) - NaN values are filled with neutral elements before the operation: - - factor (other) is filled with fill_value (0 for mul, 1 for div) - - coeffs and const are filled with 0 (additive identity) - """ + def _apply_constant_op_v1( + self: GenericExpression, + other: ConstantLike, + op: Callable[[DataArray, DataArray], DataArray], + fill_value: float, + op_kind: str, + join: JoinOptions | None, + ) -> GenericExpression: + # §6: NaN in coeffs/const propagates through op (NaN * x = NaN). + # §5: user NaN raised before we get here. + if isinstance(other, float) and np.isnan(other): + check_user_nan(op_kind=op_kind) + factor = as_dataarray(other, coords=self.coords, dims=self.coord_dims) + if factor.isnull().any(): + check_user_nan(op_kind=op_kind) + self_const, factor, needs_data_reindex = self._align_constant( + factor, fill_value=fill_value, join=join + ) + if needs_data_reindex: + data = self.data.reindex_like(self_const, fill_value=self._fill_value) + return self.__class__( + assign_multiindex_safe( + data, + coeffs=op(data.coeffs, factor), + const=op(self_const, factor), + ), + self.model, + ) + return self.assign(coeffs=op(self.coeffs, factor), const=op(self_const, factor)) + + # LEGACY: remove at 1.0 — see arithmetics-design/legacy-removal.md. + def _apply_constant_op_legacy( + self: GenericExpression, + other: ConstantLike, + op: Callable[[DataArray, DataArray], DataArray], + fill_value: float, + op_kind: str, + join: JoinOptions | None, + ) -> GenericExpression: + # NaN values are silently filled with neutral elements before the op: + # factor → fill_value (0 for mul, 1 for div), coeffs/const → 0. + if isinstance(other, float) and np.isnan(other): + check_user_nan(op_kind=op_kind) factor = as_dataarray(other, coords=self.coords, dims=self.coord_dims) + if factor.isnull().any(): + check_user_nan(op_kind=op_kind) self_const, factor, needs_data_reindex = self._align_constant( factor, fill_value=fill_value, join=join ) @@ -631,12 +802,16 @@ def _apply_constant_op( def _multiply_by_constant( self: GenericExpression, other: ConstantLike, join: JoinOptions | None = None ) -> GenericExpression: - return self._apply_constant_op(other, operator.mul, fill_value=0, join=join) + return self._apply_constant_op( + other, operator.mul, fill_value=0, op_kind="mul", join=join + ) def _divide_by_constant( self: GenericExpression, other: ConstantLike, join: JoinOptions | None = None ) -> GenericExpression: - return self._apply_constant_op(other, operator.truediv, fill_value=1, join=join) + return self._apply_constant_op( + other, operator.truediv, fill_value=1, op_kind="div", join=join + ) def __div__(self: GenericExpression, other: SideLike) -> GenericExpression: try: @@ -1101,6 +1276,38 @@ def to_constraint( f"Both sides of the constraint are constant. At least one side must contain variables. {self} {rhs}" ) + if is_v1(): + # §5 + §12: the RHS is a user-supplied constant just like any + # operand in arithmetic. Validate NaN here (raise) and let + # ``self.sub(rhs)`` do the §8 alignment — no silent + # reindex_like-pad that would mask a coordinate mismatch. + # An absent slot in ``self.const`` (propagated from §6) flows + # through ``sub`` into the RHS and reaches downstream + # auto-mask handling as "no constraint at this row" (§12). + if isinstance(rhs, SUPPORTED_CONSTANT_TYPES): + rhs = as_dataarray(rhs, coords=self.coords, dims=self.coord_dims) + extra_dims = set(rhs.dims) - set(self.coord_dims) + if extra_dims: + logger.warning( + f"Constant RHS contains dimensions {extra_dims} not present " + f"in the expression, which might lead to inefficiencies. " + f"Consider collapsing the dimensions by taking min/max." + ) + if rhs.isnull().any(): + check_user_nan() + all_to_lhs = self.sub(rhs, join=join).data + computed_rhs = -all_to_lhs.const + data = assign_multiindex_safe( + all_to_lhs[["coeffs", "vars"]], sign=sign, rhs=computed_rhs + ) + return constraints.Constraint(data, model=self.model) + + # LEGACY: remove at 1.0 — see arithmetics-design/legacy-removal.md. + # Legacy auto-mask path: NaN RHS is silently preserved as "no + # constraint at this row" (the legacy reindex_like-pad fills + # subset coords with NaN, then `sub` would fill them with 0 as + # part of normal arithmetic, so we restore the original NaN mask + # afterward). if isinstance(rhs, SUPPORTED_CONSTANT_TYPES): rhs = as_dataarray(rhs, coords=self.coords, dims=self.coord_dims) @@ -1111,11 +1318,17 @@ def to_constraint( f"in the expression, which might lead to inefficiencies. " f"Consider collapsing the dimensions by taking min/max." ) + # Two legacy paths can introduce NaN into ``rhs`` and the + # ``rhs_nan_mask`` below treats both the same (auto-mask drops + # the row). The fix-hints differ though, so check each cause + # before the reindex obscures them, and warn the right one. + mismatch = first_mismatched_dim(self.const, rhs) + if mismatch is not None: + warn_legacy(_legacy_coord_mismatch_message("constraint RHS", *mismatch)) + if bool(rhs.isnull().any()): + warn_legacy(_legacy_nan_rhs_constraint_message()) rhs = rhs.reindex_like(self.const, fill_value=np.nan) - # Remember where RHS is NaN (meaning "no constraint") before the - # subtraction, which may fill NaN with 0 as part of normal - # expression arithmetic. if isinstance(rhs, DataArray): rhs_nan_mask = rhs.isnull() else: @@ -1124,8 +1337,6 @@ def to_constraint( all_to_lhs = self.sub(rhs, join=join).data computed_rhs = -all_to_lhs.const - # Restore NaN at positions where the original constant RHS had no - # value so that downstream code still treats them as unconstrained. if rhs_nan_mask is not None and rhs_nan_mask.any(): computed_rhs = xr.where(rhs_nan_mask, np.nan, computed_rhs) @@ -1142,12 +1353,21 @@ def reset_const(self: GenericExpression) -> GenericExpression: def isnull(self) -> DataArray: """ - Get a boolean mask with true values where there is only missing values in an expression. - - Returns - ------- - xr.DataArray - """ + Get a boolean mask reporting which slots are absent. + + Under v1 (§3), a slot is absent iff ``const`` is NaN — every + operator propagates const NaN whenever an operand is absent, so + this is the universal signal. A term with ``vars == -1`` is a + *dead term* (contributes nothing), not a slot-level absence + signal; ``fillna(value)`` can revive an absent slot to a present + constant while leaving the dead sentinel term in place. The + legacy convention has no real absence concept (``const`` is + always filled with 0); the historical AND of "all vars + sentinel" and "const NaN" is preserved verbatim under legacy. + """ + if is_v1(): + return self.const.isnull() + # LEGACY: remove at 1.0 — see arithmetics-design/legacy-removal.md. helper_dims = set(self.vars.dims).intersection(HELPER_DIMS) return (self.vars == -1).all(helper_dims) & self.const.isnull() @@ -2384,6 +2604,30 @@ def merge( model = exprs[0].model + data = [e.data if isinstance(e, linopy_types) else e for e in exprs] + data = [fill_missing_coords(ds, fill_helper_dims=True) for ds in data] + + # §11: aux-coord conflict is independent of dim alignment — fires on + # every join path. xr.concat(..., compat="override") silently drops + # the conflicting aux coord, which is the #295 bug v1 closes; we must + # raise (v1) / warn (legacy) before xr.concat sees the data, regardless + # of how the caller resolves the §8 dim mismatch. + enforce_aux_conflict(data) + + # §8: shared *user* dimension coordinates must match exactly across all + # operands. Helper dims (_term, _factor) legitimately differ, so we + # validate user dims separately and keep xr.concat on join="outer" + # (which doesn't enforce "exact" — that's what this check is for). + if join is None: + mismatch = merge_shared_user_coord_mismatch(data, concat_dim=dim) + if is_v1() and mismatch is not None: + raise ValueError(_shared_dim_mismatch_message(*mismatch)) + # LEGACY: remove at 1.0 — warn-on-divergence is the migration signal. + if mismatch is not None: + warn_legacy( + _legacy_coord_mismatch_message(f"merge along dim {dim!r}", *mismatch), + ) + if join is not None: override = join == "override" elif issubclass(cls, linopy_types) and dim in HELPER_DIMS: @@ -2394,9 +2638,6 @@ def merge( else: override = False - data = [e.data if isinstance(e, linopy_types) else e for e in exprs] - data = [fill_missing_coords(ds, fill_helper_dims=True) for ds in data] - if not kwargs: kwargs = { "coords": "minimal", @@ -2417,12 +2658,26 @@ def merge( if dim == TERM_DIM: ds = xr.concat([d[["coeffs", "vars"]] for d in data], dim, **kwargs) subkwargs = {**kwargs, "fill_value": 0} - const = xr.concat([d["const"] for d in data], dim, **subkwargs).sum(TERM_DIM) + # Under v1, §6 requires that an absent slot in any operand stays + # absent in the result; ``sum(skipna=False)`` propagates NaN + # rather than collapsing it to 0. + skipna = not is_v1() + const = xr.concat([d["const"] for d in data], dim, **subkwargs).sum( + TERM_DIM, skipna=skipna + ) ds = assign_multiindex_safe(ds, const=const) elif dim == FACTOR_DIM: ds = xr.concat([d[["vars"]] for d in data], dim, **kwargs) - coeffs = xr.concat([d["coeffs"] for d in data], dim, **kwargs).prod(FACTOR_DIM) - const = xr.concat([d["const"] for d in data], dim, **kwargs).prod(FACTOR_DIM) + # §6 also applies to the quadratic build: an absent factor must + # stay absent (``prod(skipna=False)`` → NaN) rather than collapse + # to multiplicative identity 1. Matches the TERM_DIM branch above. + skipna = not is_v1() + coeffs = xr.concat([d["coeffs"] for d in data], dim, **kwargs).prod( + FACTOR_DIM, skipna=skipna + ) + const = xr.concat([d["const"] for d in data], dim, **kwargs).prod( + FACTOR_DIM, skipna=skipna + ) ds = assign_multiindex_safe(ds, coeffs=coeffs, const=const) else: ds = xr.concat(data, dim, **kwargs) @@ -2430,6 +2685,9 @@ def merge( for d in set(HELPER_DIMS) & set(ds.coords): ds = ds.reset_index(d, drop=True) + if is_v1(): + ds = absorb_absence(ds) + return cls(ds, model) diff --git a/linopy/piecewise.py b/linopy/piecewise.py index ccc265a7..74cc6bef 100644 --- a/linopy/piecewise.py +++ b/linopy/piecewise.py @@ -1716,12 +1716,22 @@ def _add_incremental( ) if n_pieces >= 2: + # ``piece_dim`` is a *positional* relation here: the constraint pairs + # each lower piece with the next-higher one. The two ``isel`` slices + # share the dim but carry different labels (first n-1 vs last n-1 of + # piece_index), which v1 §8 rejects. Relabel the high slice onto the + # low slice's labels so alignment-by-label gives the intended + # positional pairing — convention §10's explicit-positional path. delta_lo = delta_var.isel({piece_dim: slice(None, -1)}, drop=True) - delta_hi = delta_var.isel({piece_dim: slice(1, None)}, drop=True) + delta_hi = delta_var.isel({piece_dim: slice(1, None)}, drop=True).assign_coords( + {piece_dim: delta_lo.coords[piece_dim]} + ) model.add_constraints( delta_hi <= delta_lo, name=f"{name}{PWL_FILL_ORDER_SUFFIX}" ) - binary_hi = binary_var.isel({piece_dim: slice(1, None)}, drop=True) + binary_hi = binary_var.isel( + {piece_dim: slice(1, None)}, drop=True + ).assign_coords({piece_dim: delta_lo.coords[piece_dim]}) model.add_constraints( binary_hi <= delta_lo, name=f"{name}{PWL_BINARY_ORDER_SUFFIX}" ) @@ -1729,7 +1739,10 @@ def _add_incremental( def _incremental_weighted(bp: DataArray) -> LinearExpression: steps = bp.diff(dim).rename({dim: piece_dim}) steps[piece_dim] = piece_index - bp0 = bp.isel({dim: 0}) + # ``drop=True`` keeps the breakpoint coord from sticking around as a + # scalar on ``bp0_term`` — otherwise §11 rejects it as an aux-coord + # conflict against the constraint LHS. + bp0 = bp.isel({dim: 0}, drop=True) bp0_term: DataArray | LinearExpression = bp0 if active is not None: bp0_term = bp0 * active diff --git a/linopy/semantics.py b/linopy/semantics.py new file mode 100644 index 00000000..1ef43cd5 --- /dev/null +++ b/linopy/semantics.py @@ -0,0 +1,407 @@ +""" +v1 semantics helpers. + +Single home for the predicates, validators, and storage-invariant +enforcement that the v1 arithmetic convention requires. Importing from +here keeps ``expressions.py`` focused on the operator dispatch and lets a +future legacy removal be a single-file delete. + +See ``arithmetics-design/convention.md`` for the rules these helpers +implement and ``arithmetics-design/goals.md`` for the design intent. +""" + +from __future__ import annotations + +import os +import sys +from collections.abc import Sequence +from typing import Any +from warnings import warn + +import numpy as np +import pandas as pd +from xarray import DataArray, Dataset + +from linopy.config import ( + V1_SEMANTICS, + LinopySemanticsWarning, + options, +) +from linopy.constants import HELPER_DIMS + + +def _user_nan_message() -> str: + """User-NaN error text — distinguishes the two intents a user might have.""" + return ( + "NaN found in a user-supplied constant. linopy treats this as " + "ambiguous: if you meant a *data error*, fix it with .fillna(value); " + "if you meant *absent at this slot*, mark it on the variable " + "instead (mask=, .where(cond), .reindex(...), .shift(...))." + ) + + +def _shared_dim_mismatch_message(dim: str, left: Any, right: Any) -> str: + """Shared-dim error text — names the dim and shows the disagreeing labels.""" + return ( + f"Coordinate mismatch on shared dimension {dim!r}: " + f"left={_short_repr(left)}, right={_short_repr(right)}. " + "Resolve with `.sel(...)` / `.reindex(...)` to align before " + "combining, with `.assign_coords(...)` to relabel one side " + "(positional alignment, made explicit), with `linopy.align(...)` " + "to pre-align several operands at once, or by passing an explicit " + "`join=` argument to `.add` / `.sub` / `.mul` / `.div` / `.le` / " + "`.ge` / `.eq` (accepts inner / outer / left / right / override)." + ) + + +def _aux_conflict_message(name: str, left: Any, right: Any, kind: str) -> str: + """ + Aux-coord error text — names the coord, the failure mode (shape vs + value), and shows the disagreeing values. + """ + if kind == "shape": + problem = ( + f"Auxiliary coordinate {name!r} has differing shapes across " + f"operands: left.shape={np.shape(left)}, " + f"right.shape={np.shape(right)}. " + ) + else: + problem = ( + f"Auxiliary coordinate {name!r} has conflicting values across " + f"operands: left={_short_repr(left)}, right={_short_repr(right)}. " + ) + return ( + problem + "xarray would silently drop the conflict; linopy raises so the " + f"caller resolves it. Use `.drop_vars({name!r})` to remove the " + f"coord, `.assign_coords({name}=...)` to relabel one side, or " + "`.isel(..., drop=True)` if the coord was introduced by a " + "scalar isel." + ) + + +# --------------------------------------------------------------------------- +# Legacy-deprecation warnings — actionable, per-site (goal #2 in goals.md: +# tell the user *what* will change for the op they just ran). +# Each helper returns the message; ``warn_legacy(msg)`` issues it. +# --------------------------------------------------------------------------- + + +_OPT_IN_HINT = ( + "\n Opt in: linopy.options['semantics'] = 'v1'" + "\n Silence: warnings.filterwarnings('ignore', " + "category=LinopySemanticsWarning)" +) + + +# Per-op opening clause for ``_legacy_nan_constant_message`` — operand +# noun and the historical fill value (`+`/`*` filled with 0; `/` filled +# with 1, a different fill that's worth calling out at the warn site). +_LEGACY_NAN_FILL_CLAUSE = { + "add": ( + "NaN in the constant operand was silently treated as 0 by legacy" + " (additive identity)." + ), + "mul": ( + "NaN in the multiplicative factor was silently treated as 0 by" + " legacy (so the variable was zeroed out at that slot)." + ), + "div": ( + "NaN in the divisor was silently treated as 1 by legacy (a" + " different fill from `+`/`*` which use 0)." + ), +} + + +def _legacy_nan_constant_message(op_kind: str) -> str: + """Legacy NaN-fill warning for `+`/`*`/`/`, keyed by ``op_kind``.""" + return ( + _LEGACY_NAN_FILL_CLAUSE[op_kind] + " Under v1 this raises ValueError." + "\n Resolve: `.fillna(value)` (data error)" + "\n or `mask=` / `.where(cond)` / `.reindex(...)` " + "on the variable (intended absence)." + _OPT_IN_HINT + ) + + +def _legacy_coord_mismatch_message( + context: str, + dim: str | None = None, + left: Any = None, + right: Any = None, +) -> str: + """ + Mismatched dim coords silently aligned (positional or left-join). + + When ``dim`` / ``left`` / ``right`` are given, the message names the + offending dim and shows the diff — same shape as the v1-raise text + so the user sees the same information at warn time as at raise time. + """ + diff = ( + f"\n Dim: {dim!r}: left={_short_repr(left)}, right={_short_repr(right)}" + if dim is not None + else "" + ) + return ( + f"Coordinate mismatch in {context} silently aligned by legacy" + " (positional when sizes match, otherwise left-join)." + " Under v1 this raises ValueError." + + diff + + "\n Resolve: `.sel(...)` / `.reindex(...)` to align" + "\n `.assign_coords(...)` to relabel one side" + "\n `linopy.align(...)` to pre-align several operands" + "\n or pass an explicit `join=` argument." + _OPT_IN_HINT + ) + + +def _legacy_aux_conflict_message(name: str, left: Any, right: Any, kind: str) -> str: + """ + Conflicting aux coord silently dropped by xarray under legacy. + + The diff line names the failure mode (shape vs value) — same shape + as the v1-raise text so the user sees the same information at warn + time as at raise time. + """ + if kind == "shape": + diff = f"\n Shapes: left={np.shape(left)}, right={np.shape(right)}" + else: + diff = f"\n Values: left={_short_repr(left)}, right={_short_repr(right)}" + return ( + f"Auxiliary coordinate {name!r} was conflicting across operands" + " and silently dropped by legacy (xarray's default)." + " Under v1 this raises ValueError." + + diff + + f"\n Resolve: `.drop_vars({name!r})`" + f"\n `.assign_coords({name}=...)` to relabel one side" + "\n or `.isel(..., drop=True)` if a scalar isel " + "introduced it." + _OPT_IN_HINT + ) + + +def _legacy_nan_rhs_constraint_message() -> str: + """Constraint RHS NaN silently kept as 'no constraint at this row'.""" + return ( + "NaN in the constraint RHS was silently kept as 'no constraint" + " at this row' by legacy auto-mask. Under v1 this raises" + " ValueError." + "\n Resolve: `mask=` on the variable for explicit per-row " + "masking" + "\n or `.fillna(value)` if the NaN was a data error." + _OPT_IN_HINT + ) + + +def _legacy_masked_variable_message(name: str) -> str: + """A masked/shifted/reindexed variable used in arithmetic under legacy.""" + return ( + f"Variable {name!r} has absent slots (from `mask=` / `.where()`" + " / `.shift()` / `.reindex()`). Under legacy each absent slot" + " contributes 0 to the resulting expression's terms (so `x + y" + " >= 10` reduces to `x >= 10` there). Under v1 the absence" + " propagates through arithmetic instead (`x + y` becomes absent" + " at the slot and the constraint drops)." + f"\n Resolve: wrap with `{name}.fillna(0)` for the legacy" + " behaviour under v1" + "\n (no fix needed if you only use the variable in a" + " constraint LHS alone — `y >= 0` drops the same way in both)." + _OPT_IN_HINT + ) + + +_LINOPY_ROOT = os.path.dirname(os.path.abspath(__file__)) + + +def warn_legacy(message: str, *, stacklevel: int | None = None) -> None: + """ + Emit a `LinopySemanticsWarning` whose source-frame points at the + first call-stack frame *outside* the linopy package. + + Static ``stacklevel`` doesn't fit here — the call-chain depth from + ``warn_legacy`` to the user's code varies per site (e.g. masked-var + via ``__add__`` is 5 frames deep, via ``Variable.fillna`` is 4). On + Python 3.12+ we use the stdlib ``skip_file_prefixes`` argument + (implemented and tested in CPython); on 3.11 we fall back to a + static ``stacklevel=5``, good enough for the common merge chain. + Pass an explicit ``stacklevel`` to override (e.g. for tests). + """ + if stacklevel is not None: + warn(message, LinopySemanticsWarning, stacklevel=stacklevel) + elif sys.version_info >= (3, 12): + warn( + message, + LinopySemanticsWarning, + skip_file_prefixes=(_LINOPY_ROOT,), + ) + else: + warn(message, LinopySemanticsWarning, stacklevel=5) + + +def _short_repr(values: Any, limit: int = 6) -> str: + """Render an array-like as a short, readable string for error messages.""" + arr = np.asarray(values) + if arr.ndim == 0: + return repr(arr.item()) + flat = arr.ravel() + if flat.size <= limit: + return repr(flat.tolist()) + head = ", ".join(repr(v) for v in flat[:limit].tolist()) + return f"[{head}, ... ({flat.size} total)]" + + +def is_v1() -> bool: + """True iff the current semantics is v1.""" + return options["semantics"] == V1_SEMANTICS + + +def check_user_nan(*, op_kind: str = "add") -> None: + """ + Enforce §5 for a user-supplied constant (scalar or array). + + v1 raises ``ValueError`` with the generic user-NaN message; legacy + warns with operator-specific text (``"add"`` covers +/-, ``"mul"``, + ``"div"`` — they differ in which fill value legacy applied). + """ + if is_v1(): + raise ValueError(_user_nan_message()) + warn_legacy(_legacy_nan_constant_message(op_kind), stacklevel=5) + + +def enforce_aux_conflict(datasets: Sequence[Any], *, stacklevel: int = 5) -> None: + """ + Enforce §11 across the given operands: v1 raises on aux-coord + conflict, legacy warns (xarray would silently drop it). + """ + conflict = conflicting_aux_coord(datasets) + if conflict is None: + return + if is_v1(): + raise ValueError(_aux_conflict_message(*conflict)) + warn_legacy(_legacy_aux_conflict_message(*conflict), stacklevel=stacklevel) + + +def dim_coords_differ(a: DataArray, b: DataArray) -> bool: + """True if a and b share a dimension whose coordinate labels disagree.""" + return first_mismatched_dim(a, b) is not None + + +def first_mismatched_dim(a: DataArray, b: DataArray) -> tuple[str, Any, Any] | None: + """ + Return ``(dim, a_labels, b_labels)`` for the first shared dim that + disagrees on coordinate labels OR size, or ``None`` if all agree. + + Uses ``indexes[dim]`` (the bare pandas Index) rather than + ``coords[dim]`` — a coord DataArray's ``equals`` compares attached + aux coords too, which gives a false positive when only one operand + carries an aux coord on the shared dim (§11's territory, not §8's). + """ + for dim in set(a.dims) & set(b.dims): + if dim in a.indexes and dim in b.indexes: + if not a.indexes[dim].equals(b.indexes[dim]): + return str(dim), a.indexes[dim].values, b.indexes[dim].values + elif a.sizes[dim] != b.sizes[dim]: + return str(dim), None, None + return None + + +def merge_shared_user_coord_mismatch( + datasets: Sequence[Dataset], concat_dim: str +) -> tuple[str, Any, Any] | None: + """ + Find a shared user dim where the operands' labels disagree. + + Returns ``(dim_name, left_labels, right_labels)`` for the first + mismatch found, or ``None`` if all operands agree. Helper dims + (``_term``, ``_factor``) and the concat dim itself are excluded — + those legitimately vary across the operands being merged. Compares + bare dimension indexes (``d.indexes[k]``) so non-dim (auxiliary) + coords are ignored — those are §11's job. + """ + skip = set(HELPER_DIMS) | {concat_dim} + per_ds = [ + {k: d.indexes[k] for k in d.dims if k not in skip and k in d.indexes} + for d in datasets + ] + shared = set.intersection(*(set(p.keys()) for p in per_ds)) if per_ds else set() + for d_name in shared: + ref = per_ds[0][d_name] + for p in per_ds[1:]: + if not ref.equals(p[d_name]): + return str(d_name), ref.values, p[d_name].values + return None + + +def conflicting_aux_coord( + datasets: Sequence[Any], +) -> tuple[str, Any, Any, str] | None: + """ + Find an auxiliary (non-dim) coord that two or more operands carry with + disagreeing values. + + Returns ``(name, left_values, right_values, kind)`` for the first + conflict found, or ``None`` if every shared aux coord agrees. ``kind`` + is ``"shape"`` if the two operands carry differently-shaped values for + the coord (e.g. one is a vector, the other a scalar), or ``"value"`` + if shapes agree but the values themselves disagree. The two failure + modes get different error text downstream. + + Per §11, an aux coord either propagates (values agree across operands) + or surfaces as an error; xarray's default silently drops the conflict + and is what this check intercepts under v1. When only one operand + carries the coord (``len(present) < 2``), it propagates from that + operand unchanged. + """ + if not datasets: + return None + all_names: set[str] = set() + for d in datasets: + all_names.update(d.coords) + for name in all_names: + present = [ + d.coords[name].values + for d in datasets + if name in d.coords and name not in d.dims + ] + # §11 asymmetric-presence: when only one operand carries the coord, + # it propagates unchanged — no conflict to surface. + if len(present) < 2: + continue + ref = present[0] + for vals in present[1:]: + if ref.shape != vals.shape: + return str(name), ref, vals, "shape" + if not _aux_values_equal(ref, vals): + return str(name), ref, vals, "value" + return None + + +def _aux_values_equal(a: np.ndarray, b: np.ndarray) -> bool: + """ + Equality for aux-coord value arrays with NaN-equal-NaN semantics + on every dtype. + + ``np.array_equal(..., equal_nan=True)`` only works on float dtypes + (it calls ``isnan`` which crashes on object/string). Aux coords on + object dtype can embed ``np.nan`` placeholders (e.g. ragged category + labels), and we want two operands with identical NaN placement to + compare equal — pandas' element-equality already treats NaN as + self-equal for object dtypes, so route through ``pd.Series.equals``. + """ + if np.issubdtype(a.dtype, np.floating): + return bool(np.array_equal(a, b, equal_nan=True)) + return bool(pd.Series(a.ravel()).equals(pd.Series(b.ravel()))) + + +def absorb_absence(ds: Dataset) -> Dataset: + """ + Enforce the v1 dead-term invariant on a merged dataset. + + ``const.isnull()`` at a slot ⇒ every term at that slot must have + ``coeffs = NaN`` and ``vars = -1``. After ``merge`` concatenates two + expressions along ``_term``, a slot that's absent in one operand + still carries the *other* operand's valid term in its row; this + helper masks those away so the §1/§2 storage invariant holds. + """ + if "const" not in ds or "coeffs" not in ds or "vars" not in ds: + return ds + mask = ds["const"].isnull() + if not bool(mask.any()): + return ds + coeffs = ds["coeffs"].where(~mask, np.nan) + vars_ = ds["vars"].where(~mask, -1) + return ds.assign(coeffs=coeffs, vars=vars_) diff --git a/linopy/sos_reformulation.py b/linopy/sos_reformulation.py index 1f17ee92..0615f8ed 100644 --- a/linopy/sos_reformulation.py +++ b/linopy/sos_reformulation.py @@ -184,8 +184,13 @@ def reformulate_sos2( added_constraints = [first_name] + # Scalar isel keeps ``sos_dim`` as a leftover non-dim coord whose value + # differs between ``x``/``M`` (indexed at ``n-1``) and ``z`` (indexed at + # ``n-2``). v1 §11 rejects that aux-coord conflict, so we ``drop=True`` + # to remove ``sos_dim`` from the comparison entirely. model.add_constraints( - x_expr.isel({sos_dim: 0}) <= M.isel({sos_dim: 0}) * z_expr.isel({sos_dim: 0}), + x_expr.isel({sos_dim: 0}, drop=True) + <= M.isel({sos_dim: 0}, drop=True) * z_expr.isel({sos_dim: 0}, drop=True), name=first_name, ) @@ -211,8 +216,9 @@ def reformulate_sos2( added_constraints.append(mid_name) model.add_constraints( - x_expr.isel({sos_dim: n - 1}) - <= M.isel({sos_dim: n - 1}) * z_expr.isel({sos_dim: n - 2}), + x_expr.isel({sos_dim: n - 1}, drop=True) + <= M.isel({sos_dim: n - 1}, drop=True) + * z_expr.isel({sos_dim: n - 2}, drop=True), name=last_name, ) added_constraints.extend([last_name, card_name]) diff --git a/linopy/variables.py b/linopy/variables.py index cbf2fb87..47e8de9a 100644 --- a/linopy/variables.py +++ b/linopy/variables.py @@ -62,6 +62,12 @@ SOS_TYPE_ATTR, TERM_DIM, ) +from linopy.semantics import ( + _legacy_masked_variable_message, + check_user_nan, + is_v1, + warn_legacy, +) from linopy.types import ( ConstantLike, DimsLike, @@ -328,11 +334,36 @@ def to_linexpr( Linear expression with the variables and coefficients. """ coefficient = as_dataarray(coefficient, coords=self.coords, dims=self.dims) - coefficient = coefficient.reindex_like(self.labels, fill_value=0) - coefficient = coefficient.fillna(0) + # §5: user-supplied NaN in the coefficient must raise (v1) / warn + # (legacy) — it's the multiplicative analogue of ``x + nan_data`` + # and otherwise enters the expression silently. The default + # coefficient ``1`` carries no NaN, so the check is a no-op there. + if coefficient.isnull().any(): + check_user_nan(op_kind="mul") + if is_v1(): + # Under v1 the LinearExpression must carry absence (NaN at + # `labels == -1`) so §6 propagation through downstream + # arithmetic works. + coefficient = coefficient.reindex_like(self.labels, fill_value=np.nan) + absent = self.labels == -1 + coefficient = coefficient.where(~absent) + else: + # LEGACY: warn if the variable carries absent slots — those + # silently contribute 0 here, but v1 will propagate the + # absence and produce a different result downstream. This is + # the origin of the most common legacy↔v1 divergence (masked + # variables in arithmetic) that no other warn-site catches. + has_absence = bool((self.labels == -1).any()) + if has_absence: + warn_legacy(_legacy_masked_variable_message(self.name)) + coefficient = coefficient.reindex_like(self.labels, fill_value=0) + coefficient = coefficient.fillna(0) ds = Dataset({"coeffs": coefficient, "vars": self.labels}).expand_dims( TERM_DIM, -1 ) + if is_v1(): + const = DataArray(np.where(absent, np.nan, 0.0), coords=self.labels.coords) + ds = ds.assign(const=const) return expressions.LinearExpression(ds, self.model) def __repr__(self) -> str: @@ -414,8 +445,14 @@ def __mul__(self, other: SideLike) -> ExpressionLike: try: if isinstance(other, Variable | ScalarVariable): return self.to_linexpr() * other - - return self.to_linexpr(other) + # Scalars can take the fast path; for arrays / expressions go + # through the LinearExpression operator so semantics-aware + # alignment and NaN checks apply. + if np.isscalar(other): + if isinstance(other, float) and np.isnan(other): + return self.to_linexpr() * other + return self.to_linexpr(other) + return self.to_linexpr() * other except TypeError: return NotImplemented @@ -1143,19 +1180,28 @@ def where( def fillna( self, - fill_value: ScalarVariable | dict[str, str | float | int] | Variable | Dataset, - ) -> Variable: + fill_value: int + | float + | ScalarVariable + | dict[str, str | float | int] + | Variable + | Dataset, + ) -> Variable | expressions.LinearExpression: """ - Fill missing values with a variable. + Fill missing (absent) slots. - This operation call ``xarray.DataArray.fillna`` but ensures preserving - the linopy.Variable type. + A numeric ``fill_value`` substitutes a *constant* for the absent + variable slots, so the result is a :class:`LinearExpression` (a + constant is not a variable). A Variable / ScalarVariable + ``fill_value`` keeps the result a Variable. Parameters ---------- - fill_value : Variable/ScalarVariable - Variable to use for filling. + fill_value : numeric, Variable, or ScalarVariable + Value to fill the absent slots with. """ + if isinstance(fill_value, int | float | np.integer | np.floating): + return self.to_linexpr().fillna(fill_value) return self.where(~self.isnull(), fill_value) def ffill(self, dim: str, limit: None = None) -> Variable: @@ -1267,6 +1313,48 @@ def equals(self, other: Variable) -> bool: shift = varwrap(Dataset.shift, fill_value=_fill_value) + def reindex( + self, + indexers: Mapping[Any, Any] | None = None, + **indexers_kwargs: Any, + ) -> Variable: + """ + Reindex the variable to a new set of coordinates. + + New positions are marked absent (``labels = -1``, + ``lower = upper = NaN``); existing positions are preserved. This + is one of the named mechanisms in convention.md §4 for creating + absence. + """ + return self.__class__( + self.data.reindex(indexers, fill_value=self._fill_value, **indexers_kwargs), + self.model, + self.name, + ) + + def reindex_like( + self, + other: Any, + **kwargs: Any, + ) -> Variable: + """ + Reindex the variable to another object's coordinates. + + New positions are marked absent with the sentinel fill values + (see :meth:`reindex`). + """ + if isinstance(other, DataArray): + ref = other.to_dataset(name="__tmp__") + elif isinstance(other, Dataset): + ref = other + else: + ref = other.data + return self.__class__( + self.data.reindex_like(ref, fill_value=self._fill_value, **kwargs), + self.model, + self.name, + ) + swap_dims = varwrap(Dataset.swap_dims) set_index = varwrap(Dataset.set_index) @@ -1277,7 +1365,10 @@ def equals(self, other: Variable) -> bool: stack = varwrap(Dataset.stack) - unstack = varwrap(Dataset.unstack) + # ``fill_value=_fill_value`` so missing (region, year) combinations end up + # as the absent-slot sentinel (labels=-1, lower=upper=NaN) instead of as + # NaN labels — §2 storage invariant + §4 absence-creation guarantee. + unstack = varwrap(Dataset.unstack, fill_value=_fill_value) iterate_slices = iterate_slices diff --git a/test/conftest.py b/test/conftest.py index 067452d2..6439168f 100644 --- a/test/conftest.py +++ b/test/conftest.py @@ -3,6 +3,8 @@ from __future__ import annotations import os +import warnings +from collections.abc import Generator from typing import TYPE_CHECKING import pandas as pd @@ -11,6 +13,16 @@ if TYPE_CHECKING: from linopy import Model, Variable +# ``linopy`` is intentionally NOT imported at module level — doing so +# loads it from site-packages before pytest's ``--doctest-modules`` +# collection walks the ``linopy/`` source tree, and the resulting +# __file__ mismatch breaks the whole run on Windows CI (and elsewhere). +# Same reasoning as the ``filterwarnings`` comment in ``pyproject.toml``. +# Values mirror ``linopy.config.LEGACY_SEMANTICS`` / ``V1_SEMANTICS``. +_LEGACY_SEMANTICS = "legacy" +_V1_SEMANTICS = "v1" +_VALID_SEMANTICS = {_LEGACY_SEMANTICS, _V1_SEMANTICS} + def pytest_addoption(parser: pytest.Parser) -> None: """Add custom command line options.""" @@ -25,6 +37,10 @@ def pytest_addoption(parser: pytest.Parser) -> None: def pytest_configure(config: pytest.Config) -> None: """Configure pytest with custom markers and behavior.""" config.addinivalue_line("markers", "gpu: marks tests as requiring GPU hardware") + for sem in sorted(_VALID_SEMANTICS): + config.addinivalue_line( + "markers", f"{sem}: run this test only under the {sem} semantics" + ) # Set environment variable so test modules can check if GPU tests are enabled # This is needed because parametrize happens at import time @@ -63,6 +79,35 @@ def pytest_collection_modifyitems( item.add_marker(pytest.mark.gpu) +@pytest.fixture(autouse=True, params=[_LEGACY_SEMANTICS, _V1_SEMANTICS]) +def semantics(request: pytest.FixtureRequest) -> Generator[str, None, None]: + """ + Run every test under both arithmetic semantics by default. + + A test marked with a semantics name (``@pytest.mark.legacy`` or + ``@pytest.mark.v1``) runs only under that semantics. Under ``legacy``, + ``LinopySemanticsWarning`` is suppressed so test output stays clean; + ``test_convention.py`` verifies the warnings are actually emitted. + """ + # Deferred import (see top-of-file comment). + from linopy.config import LinopySemanticsWarning, options + + item = request.node + for sem in _VALID_SEMANTICS: + if item.get_closest_marker(sem) and request.param != sem: + pytest.skip(f"{sem}-only test") + + old = options["semantics"] + options["semantics"] = request.param + if request.param == _LEGACY_SEMANTICS: + with warnings.catch_warnings(): + warnings.simplefilter("ignore", LinopySemanticsWarning) + yield request.param + else: + yield request.param + options["semantics"] = old + + @pytest.fixture def m() -> Model: from linopy import Model diff --git a/test/test_constraints.py b/test/test_constraints.py index 1667bfec..0fc4ba74 100644 --- a/test/test_constraints.py +++ b/test/test_constraints.py @@ -346,7 +346,13 @@ def test_sanitize_infinities() -> None: m.add_constraints(y <= -np.inf, name="con_wrong_neg_inf") +@pytest.mark.legacy class TestConstraintCoordinateAlignment: + """ + Constraint-side counterpart of TestCoordinateAlignment — legacy-only; + v1 raises on subset/superset/NaN-RHS (see convention.md §5/§8/§12). + """ + @pytest.fixture(params=["xarray", "pandas_series"], ids=["da", "series"]) def subset(self, request: Any) -> xr.DataArray | pd.Series: if request.param == "xarray": diff --git a/test/test_convention.py b/test/test_convention.py new file mode 100644 index 00000000..b4f2c3e5 --- /dev/null +++ b/test/test_convention.py @@ -0,0 +1,29 @@ +"""Tests for the v1 semantics option and the test harness.""" + +from __future__ import annotations + +import pytest + +import linopy + + +class TestSemanticsOption: + def test_default_is_legacy(self) -> None: + linopy.options.reset() + assert linopy.options["semantics"] == "legacy" + + def test_invalid_value_raises(self) -> None: + with pytest.raises(ValueError, match="Invalid semantics"): + linopy.options["semantics"] = "exact" + + +class TestHarness: + """The autouse ``semantics`` fixture and the legacy / v1 markers.""" + + @pytest.mark.legacy + def test_legacy_marker(self) -> None: + assert linopy.options["semantics"] == "legacy" + + @pytest.mark.v1 + def test_v1_marker(self) -> None: + assert linopy.options["semantics"] == "v1" diff --git a/test/test_legacy_violations.py b/test/test_legacy_violations.py new file mode 100644 index 00000000..82759ebb --- /dev/null +++ b/test/test_legacy_violations.py @@ -0,0 +1,1700 @@ +""" +Legacy convention violations and v1 fixes. + +Pairs ``@pytest.mark.legacy`` tests that document the surprising legacy +behaviour against ``@pytest.mark.v1`` tests that pin the v1 fix. Each +class corresponds to a section of ``arithmetics-design/convention.md`` +and to one or more linopy bug reports. + +Slice A — constant operand path (§5, §8, §9): + §8 Shared dimensions must match exactly → #708 / #586 / #550 + §5 User-supplied NaN raises → #713 / PyPSA #1683 + §9 Non-shared dimensions broadcast → (positive regression guard) + +Slice B — expression-OP-expression / variable-OP-variable (§8 via `merge`): + §8 Shared dimensions must match exactly → #708 / #570 (expr+expr branch) + +Slice C — absence propagation (§3, §6, §7): + §6 Absence propagates through every operator → #712 (absent-as-zero) + §3 isnull() is the unifying predicate → #711 + §7 fillna()/.where() resolve absence → (positive coverage) + +Slice D — Variable.reindex / .reindex_like (§4 absence creation): + §4 Reindexing extends coords and marks new slots absent + +Slice E — named-method join= + constraint RHS (§10, §12): + §10 .add/.sub/.mul/.div/.le/.ge/.eq accept explicit join= + §12 NaN in constraint RHS raises (v1) → PyPSA #1683 + §12 Coord mismatch in RHS raises (v1) → #707 + +Slice G — reductions skip absent slots (§13): + §13 sum / groupby.sum skip absent, sum of none is the zero expression + +Slice F — auxiliary-coordinate conflicts (§11): + §11 Non-dim coord conflict raises (v1) → #295 + §11 Non-conflicting aux coords propagate through arithmetic +""" + +from __future__ import annotations + +import operator +import warnings +from collections.abc import Generator + +import numpy as np +import pandas as pd +import pytest +import xarray as xr + +from linopy import Model +from linopy.config import LinopySemanticsWarning +from linopy.variables import Variable + + +@pytest.fixture +def m() -> Model: + return Model() + + +@pytest.fixture +def time() -> pd.RangeIndex: + return pd.RangeIndex(5, name="time") + + +@pytest.fixture +def x(m: Model, time: pd.RangeIndex) -> Variable: + return m.add_variables(lower=0, coords=[time], name="x") + + +@pytest.fixture +def unsilenced() -> Generator[None, None, None]: + """Drop the autouse fixture's LinopySemanticsWarning filter for one test.""" + with warnings.catch_warnings(): + warnings.simplefilter("always", LinopySemanticsWarning) + yield + + +# ===================================================================== +# §8 — Shared dimensions must match exactly (constant operand) +# ===================================================================== + + +_OPS = { + "add": operator.add, + "sub": operator.sub, + "mul": operator.mul, + "div": operator.truediv, +} + + +class TestExactAlignmentConstant: + @pytest.mark.v1 + @pytest.mark.parametrize("op", ["add", "sub", "mul", "div"]) + def test_same_size_different_labels_raises(self, x: Variable, op: str) -> None: + """ + #708 / #550 — same shape, different labels: legacy aligns by + position; v1 raises. Holds for every binary operator. + """ + other = xr.DataArray( + [1.0, 2.0, 3.0, 4.0, 5.0], + dims=["time"], + coords={"time": pd.Index([10, 11, 12, 13, 14], name="time")}, + ) + with pytest.raises(ValueError, match="Coordinate mismatch on shared dimension"): + _OPS[op](x, other) + + @pytest.mark.v1 + @pytest.mark.parametrize("op", ["add", "sub", "mul", "div"]) + def test_subset_constant_raises(self, x: Variable, op: str) -> None: + """ + #711 / #708 — constant covers only some of the variable's + coords. Legacy left-joins (silently drops the gap); v1 raises. + """ + subset = xr.DataArray( + [10.0, 20.0], dims=["time"], coords={"time": pd.Index([1, 3], name="time")} + ) + with pytest.raises(ValueError, match="Coordinate mismatch on shared dimension"): + _OPS[op](x, subset) + + @pytest.mark.legacy + def test_add_same_size_different_labels_silent(self, x: Variable) -> None: + """Document the legacy behaviour: silent positional alignment.""" + other = xr.DataArray( + [1.0, 2.0, 3.0, 4.0, 5.0], + dims=["time"], + coords={"time": pd.Index([10, 11, 12, 13, 14], name="time")}, + ) + # Legacy keeps left coords; the user's intended pairing by label is lost. + result = x + other + assert list(result.coords["time"].values) == [0, 1, 2, 3, 4] + assert result.const.values.tolist() == [1.0, 2.0, 3.0, 4.0, 5.0] + + @pytest.mark.legacy + def test_add_subset_constant_silent(self, x: Variable) -> None: + """Document the legacy behaviour: silent left-join (gaps → 0).""" + subset = xr.DataArray( + [10.0, 20.0], dims=["time"], coords={"time": pd.Index([1, 3], name="time")} + ) + result = x + subset + # Legacy reindex_like fills the missing positions with 0 (additive fill). + assert result.const.sel(time=0).item() == 0.0 + assert result.const.sel(time=1).item() == 10.0 + assert result.const.sel(time=3).item() == 20.0 + + +class TestBroadcastNonSharedDim: + """ + §9 — a dimension that exists only in one operand broadcasts freely. + Runs under both semantics: this is unchanged behaviour. + """ + + def test_add_broadcast_introduces_new_dim(self, x: Variable) -> None: + bcast = xr.DataArray( + [10.0, 20.0], dims=["scenario"], coords={"scenario": [0, 1]} + ) + result = x + bcast + assert set(result.const.dims) == {"time", "scenario"} + assert result.const.sizes == {"time": 5, "scenario": 2} + + def test_mul_broadcast_introduces_new_dim(self, x: Variable) -> None: + bcast = xr.DataArray([2.0, 3.0], dims=["scenario"], coords={"scenario": [0, 1]}) + result = x * bcast + assert set(result.coeffs.dims) == {"time", "scenario", "_term"} + + +# ===================================================================== +# §5 — User-supplied NaN raises (covers #713 and PyPSA #1683) +# ===================================================================== + + +class TestUserNaNRaises: + @pytest.mark.v1 + @pytest.mark.parametrize("op", ["add", "sub", "mul", "div"]) + def test_nan_dataarray_raises( + self, x: Variable, time: pd.RangeIndex, op: str + ) -> None: + # Use [2, NaN, 3, 4, 5] so div doesn't trip on a 0 divisor at slot 0. + nan_data = xr.DataArray( + [2.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + with pytest.raises(ValueError, match="NaN"): + _OPS[op](x, nan_data) + + @pytest.mark.v1 + @pytest.mark.parametrize("op", ["add", "sub", "mul", "div"]) + def test_nan_scalar_raises(self, x: Variable, op: str) -> None: + with pytest.raises(ValueError, match="NaN"): + _OPS[op](x, float("nan")) + + @pytest.mark.v1 + def test_pypsa_1683_inf_times_zero_raises( + self, x: Variable, time: pd.RangeIndex + ) -> None: + """ + PyPSA #1683 — ``min_pu * nominal_fix`` with ``p_nom=inf`` and + ``p_min_pu=0`` yields a NaN bound. v1 surfaces this at construction, + not as a downstream solve failure. + """ + nominal_fix = xr.DataArray( + [np.inf, np.inf, np.inf, np.inf, np.inf], + dims=["time"], + coords={"time": time}, + ) + min_pu = xr.DataArray( + [1.0, 0.0, 1.0, 1.0, 1.0], dims=["time"], coords={"time": time} + ) + bound = min_pu * nominal_fix # 0 * inf = NaN at time=1 + assert np.isnan(bound.values[1]) + with pytest.raises(ValueError, match="NaN"): + x * bound + + @pytest.mark.v1 + def test_to_linexpr_coefficient_nan_raises( + self, x: Variable, time: pd.RangeIndex + ) -> None: + """ + §5 on the direct ``Variable.to_linexpr(coefficient)`` entry — + callers can construct an expression bypassing the operator + overloads. NaN in the explicit coefficient is still user data + and must raise (otherwise the NaN flows into ``coeffs`` and + §6 silently propagates absence from what was actually a + data error). + """ + nan_coeff = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + with pytest.raises(ValueError, match="NaN"): + x.to_linexpr(nan_coeff) + + @pytest.mark.legacy + def test_add_nan_dataarray_silently_fills_with_zero( + self, x: Variable, time: pd.RangeIndex + ) -> None: + """Document legacy: NaN in addend silently becomes 0 (#713).""" + nan_data = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + result = x + nan_data + assert result.const.sel(time=1).item() == 0.0 # NaN → 0 + + @pytest.mark.legacy + def test_mul_nan_dataarray_silently_fills_with_zero( + self, x: Variable, time: pd.RangeIndex + ) -> None: + """ + Document legacy: NaN in multiplier silently becomes 0 — variable + zeroed out at that slot (#713). + """ + nan_data = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + result = x * nan_data + assert result.coeffs.squeeze().sel(time=1).item() == 0.0 + + +# ===================================================================== +# Legacy emits LinopySemanticsWarning where v1 would diverge +# ===================================================================== + + +def _one_legacy_warning(*ops) -> str: # type: ignore[no-untyped-def] + """ + Run ``ops`` (a series of callables) under fresh warning capture + and return the first ``LinopySemanticsWarning``'s text. Test + helper — keeps each test focused on the message, not the plumbing. + """ + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + for op in ops: + op() + legacy = [w for w in ws if issubclass(w.category, LinopySemanticsWarning)] + assert legacy, "expected at least one LinopySemanticsWarning" + return str(legacy[0].message) + + +# Common tail shared by every legacy warning — separated so each +# expected-message test can focus on the part that's specific to the +# rule, without 4 lines of boilerplate per test. +_OPT_IN_HINT = ( + "\n Opt in: linopy.options['semantics'] = 'v1'" + "\n Silence: warnings.filterwarnings('ignore', " + "category=LinopySemanticsWarning)" +) + + +class TestLegacyWarning: + """ + Asserts the *full text* of each legacy warning. The point: the + test reads like a spec — a reviewer judges the message's helpfulness + by reading the test, and any change to a message surfaces as a diff + in the test. Goal #2 (actionable warnings) lives or dies here. + """ + + @pytest.mark.legacy + def test_coord_mismatch_const_operand_same_size( + self, x: Variable, unsilenced: None + ) -> None: + other = xr.DataArray( + [1.0, 2.0, 3.0, 4.0, 5.0], + dims=["time"], + coords={"time": pd.Index([10, 11, 12, 13, 14], name="time")}, + ) + msg = _one_legacy_warning(lambda: x + other) + assert msg == ( + "Coordinate mismatch in this operator's constant operand " + "silently aligned by legacy (positional when sizes match, " + "otherwise left-join). Under v1 this raises ValueError." + "\n Dim: 'time': left=[0, 1, 2, 3, 4], " + "right=[10, 11, 12, 13, 14]" + "\n Resolve: `.sel(...)` / `.reindex(...)` to align" + "\n `.assign_coords(...)` to relabel one side" + "\n `linopy.align(...)` to pre-align several operands" + "\n or pass an explicit `join=` argument." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_coord_mismatch_const_operand_subset( + self, x: Variable, unsilenced: None + ) -> None: + subset = xr.DataArray( + [10.0, 20.0], dims=["time"], coords={"time": pd.Index([1, 3], name="time")} + ) + msg = _one_legacy_warning(lambda: x + subset) + assert msg == ( + "Coordinate mismatch in this operator's constant operand " + "silently aligned by legacy (positional when sizes match, " + "otherwise left-join). Under v1 this raises ValueError." + "\n Dim: 'time': left=[0, 1, 2, 3, 4], right=[1, 3]" + "\n Resolve: `.sel(...)` / `.reindex(...)` to align" + "\n `.assign_coords(...)` to relabel one side" + "\n `linopy.align(...)` to pre-align several operands" + "\n or pass an explicit `join=` argument." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_coord_mismatch_merge_path( + self, m: Model, time: pd.RangeIndex, unsilenced: None + ) -> None: + x_local = m.add_variables(lower=0, coords=[time], name="x_local") + other = m.add_variables( + lower=0, coords=[pd.Index([10, 11, 12, 13, 14], name="time")], name="other" + ) + msg = _one_legacy_warning(lambda: x_local + other) + assert msg == ( + "Coordinate mismatch in merge along dim '_term' silently " + "aligned by legacy (positional when sizes match, otherwise " + "left-join). Under v1 this raises ValueError." + "\n Dim: 'time': left=[0, 1, 2, 3, 4], " + "right=[10, 11, 12, 13, 14]" + "\n Resolve: `.sel(...)` / `.reindex(...)` to align" + "\n `.assign_coords(...)` to relabel one side" + "\n `linopy.align(...)` to pre-align several operands" + "\n or pass an explicit `join=` argument." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_nan_addend( + self, x: Variable, time: pd.RangeIndex, unsilenced: None + ) -> None: + nan_data = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + msg = _one_legacy_warning(lambda: x + nan_data) + assert msg == ( + "NaN in the constant operand was silently treated as 0 by " + "legacy (additive identity). Under v1 this raises ValueError." + "\n Resolve: `.fillna(value)` (data error)" + "\n or `mask=` / `.where(cond)` / `.reindex(...)` " + "on the variable (intended absence)." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_nan_multiplier( + self, x: Variable, time: pd.RangeIndex, unsilenced: None + ) -> None: + nan_factor = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + msg = _one_legacy_warning(lambda: x * nan_factor) + assert msg == ( + "NaN in the multiplicative factor was silently treated as 0 " + "by legacy (so the variable was zeroed out at that slot). " + "Under v1 this raises ValueError." + "\n Resolve: `.fillna(value)` (data error)" + "\n or `mask=` / `.where(cond)` / `.reindex(...)` " + "on the variable (intended absence)." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_nan_divisor( + self, x: Variable, time: pd.RangeIndex, unsilenced: None + ) -> None: + nan_divisor = xr.DataArray( + [2.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + msg = _one_legacy_warning(lambda: x / nan_divisor) + assert msg == ( + "NaN in the divisor was silently treated as 1 by legacy (a " + "different fill from `+`/`*` which use 0). Under v1 this " + "raises ValueError." + "\n Resolve: `.fillna(value)` (data error)" + "\n or `mask=` / `.where(cond)` / `.reindex(...)` " + "on the variable (intended absence)." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_aux_conflict(self, m: Model, unsilenced: None) -> None: + A = pd.Index([1, 2, 3], name="A") + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + msg = _one_legacy_warning(lambda: v + const) + assert msg == ( + "Auxiliary coordinate 'B' was conflicting across operands " + "and silently dropped by legacy (xarray's default). Under v1 " + "this raises ValueError." + "\n Values: left=[311, 311, 322], right=[400, 400, 500]" + "\n Resolve: `.drop_vars('B')`" + "\n `.assign_coords(B=...)` to relabel one side" + "\n or `.isel(..., drop=True)` if a scalar isel " + "introduced it." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_nan_constraint_rhs( + self, x: Variable, time: pd.RangeIndex, unsilenced: None + ) -> None: + nan_rhs = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + msg = _one_legacy_warning(lambda: x <= nan_rhs) + assert msg == ( + "NaN in the constraint RHS was silently kept as 'no " + "constraint at this row' by legacy auto-mask. Under v1 this " + "raises ValueError." + "\n Resolve: `mask=` on the variable for explicit per-row " + "masking" + "\n or `.fillna(value)` if the NaN was a data error." + + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_masked_variable_in_arithmetic( + self, time: pd.RangeIndex, unsilenced: None + ) -> None: + """ + The masked-variable warning is the one that catches the + ``2 * x + y`` (no fillna) divergence — no other site fires for + this case. Message names the variable and the fillna(0) fix. + """ + m = Model() + x = m.add_variables(lower=0, coords=[time], name="x") + mask = xr.DataArray( + [True, True, True, True, False], dims=["time"], coords={"time": time} + ) + m.add_variables(lower=0, coords=[time], name="y", mask=mask) + y = m.variables["y"] + msg = _one_legacy_warning(lambda: 2 * x + y) + assert msg == ( + "Variable 'y' has absent slots (from `mask=` / `.where()` / " + "`.shift()` / `.reindex()`). Under legacy each absent slot " + "contributes 0 to the resulting expression's terms (so `x + " + "y >= 10` reduces to `x >= 10` there). Under v1 the absence " + "propagates through arithmetic instead (`x + y` becomes " + "absent at the slot and the constraint drops)." + "\n Resolve: wrap with `y.fillna(0)` for the legacy " + "behaviour under v1" + "\n (no fix needed if you only use the variable " + "in a constraint LHS alone — `y >= 0` drops the same way in " + "both)." + _OPT_IN_HINT + ) + + @pytest.mark.legacy + def test_warning_stacklevel_points_to_user_call( + self, time: pd.RangeIndex, unsilenced: None + ) -> None: + """ + The warning's source frame must be the user's call site, not + a linopy internal — IDE jump-to-source on the warning depends + on it, and the rollout-warning is useless if it points at + ``linopy/expressions.py`` instead of the user's source. + """ + m = Model() + x = m.add_variables(lower=0, coords=[time], name="x") + mask = xr.DataArray( + [True, True, True, True, False], dims=["time"], coords={"time": time} + ) + y = m.add_variables(lower=0, coords=[time], name="y", mask=mask) + with warnings.catch_warnings(record=True) as ws: + warnings.simplefilter("always") + _ = 2 * x + y # this is the user's call site + relevant = [w for w in ws if issubclass(w.category, LinopySemanticsWarning)] + assert relevant, "expected the masked-variable warning to fire" + assert relevant[0].filename == __file__, ( + f"warning frame is {relevant[0].filename!r}, " + "should be the user's source file" + ) + + +# ===================================================================== +# §8 — Shared dimensions must match exactly (expr+expr / var+var, merge path) +# ===================================================================== + + +class TestExactAlignmentMerge: + @pytest.fixture + def x_other(self, m: Model) -> Variable: + # Same shape, different labels — legacy uses positional override. + return m.add_variables( + lower=0, + coords=[pd.Index([10, 11, 12, 13, 14], name="time")], + name="x_other", + ) + + @pytest.fixture + def x_subset(self, m: Model) -> Variable: + # Subset coords on the same dim — legacy outer-joins (and pads). + return m.add_variables( + lower=0, + coords=[pd.Index([1, 3], name="time")], + name="x_subset", + ) + + @pytest.mark.v1 + def test_var_plus_var_different_labels_raises( + self, x: Variable, x_other: Variable + ) -> None: + with pytest.raises(ValueError, match="Coordinate mismatch"): + x + x_other + + @pytest.mark.v1 + def test_expr_plus_expr_different_labels_raises( + self, x: Variable, x_other: Variable + ) -> None: + with pytest.raises(ValueError, match="Coordinate mismatch"): + (1 * x) + (1 * x_other) + + @pytest.mark.v1 + def test_var_plus_var_subset_raises(self, x: Variable, x_subset: Variable) -> None: + with pytest.raises(ValueError, match="Coordinate mismatch"): + x + x_subset + + @pytest.mark.v1 + def test_var_minus_var_different_labels_raises( + self, x: Variable, x_other: Variable + ) -> None: + with pytest.raises(ValueError, match="Coordinate mismatch"): + x - x_other + + @pytest.mark.v1 + def test_var_plus_var_same_coords_works( + self, m: Model, time: pd.RangeIndex + ) -> None: + """Same coords on a shared dim is fine — regression guard.""" + a = m.add_variables(lower=0, coords=[time], name="a") + b = m.add_variables(lower=0, coords=[time], name="b") + result = a + b + assert result.sizes["time"] == 5 + + @pytest.mark.v1 + def test_var_plus_var_broadcast_non_shared_dim_works( + self, m: Model, time: pd.RangeIndex + ) -> None: + """§9 regression guard for the merge path: non-shared dims broadcast.""" + a = m.add_variables(lower=0, coords=[time], name="a") + b = m.add_variables( + lower=0, coords=[pd.Index([0, 1], name="scenario")], name="b" + ) + result = a + b + assert set(result.coord_dims) == {"time", "scenario"} + + @pytest.mark.legacy + def test_var_plus_var_different_labels_silent( + self, x: Variable, x_other: Variable + ) -> None: + """ + Document legacy: same-shape var+var aligns by position via + override; the right-hand labels are silently dropped. + """ + result = x + x_other + # Left wins via override → time coords are x's [0..4], even though + # x_other was time=[10..14]. The two terms are paired by position. + assert list(result.coords["time"].values) == [0, 1, 2, 3, 4] + + @pytest.mark.legacy + def test_warn_on_var_plus_var_different_labels( + self, x: Variable, x_other: Variable, unsilenced: None + ) -> None: + with pytest.warns(LinopySemanticsWarning, match="merge along dim"): + x + x_other + + +# ===================================================================== +# §6 — Absence propagates through every operator +# §3 — isnull() reports absent slots (covers #712 absent-as-zero) +# ===================================================================== + + +class TestAbsencePropagation: + @pytest.fixture + def xs(self, x: Variable) -> Variable: + # x.shift(time=1) → absent at time=0, present elsewhere. + return x.shift(time=1) + + @pytest.mark.v1 + def test_to_linexpr_marks_absent_with_nan_const(self, xs: Variable) -> None: + """ + Variable.to_linexpr() encodes absence as NaN const + NaN + coeff + vars=-1, so §6 has something to propagate. + """ + expr = xs.to_linexpr() + assert np.isnan(expr.const.values[0]) + assert np.isnan(expr.coeffs.values[0, 0]) + assert int(expr.vars.values[0, 0]) == -1 + assert not np.isnan(expr.const.values[1:]).any() + + @pytest.mark.v1 + def test_isnull_reports_absent_slot(self, xs: Variable) -> None: + """§3: isnull() reports the absent slot on a LinearExpression.""" + expr = xs.to_linexpr() + assert bool(expr.isnull().values[0]) + assert not bool(expr.isnull().values[1:].any()) + + @pytest.mark.v1 + @pytest.mark.parametrize("op", ["add", "sub", "mul", "div"]) + def test_scalar_op_preserves_absence(self, xs: Variable, op: str) -> None: + """ + #712 — `shifted OP scalar` stays absent at the shifted slot. + Holds for every binary operator: const and coeffs both NaN. + """ + result = _OPS[op](xs, 3) + assert np.isnan(result.const.values[0]) + assert np.isnan(result.coeffs.values[0, 0]) + assert bool(result.isnull().values[0]) + # And the present slots carry the expected per-op value. + expected_const = {"add": 3.0, "sub": -3.0, "mul": 0.0, "div": 0.0}[op] + assert (result.const.values[1:] == expected_const).all() + + @pytest.mark.v1 + def test_add_present_variable_propagates_absence( + self, xs: Variable, x: Variable + ) -> None: + """`x + xs` is absent wherever xs is, even though x is fine there.""" + result = xs + x + assert np.isnan(result.const.values[0]) + assert bool(result.isnull().values[0]) + assert not bool(result.isnull().values[1:].any()) + + @pytest.mark.v1 + def test_merge_absorbs_dead_terms_at_absent_slot( + self, xs: Variable, x: Variable + ) -> None: + """ + §1/§2 storage invariant — ``const.isnull()`` at a slot implies + every term at that slot has ``coeffs = NaN`` and ``vars = -1``. + ``xs + x`` merges xs's absent slot with x's live term; the live + term must be absorbed, not silently kept alongside a NaN const. + Regression guard for ``_absorb_absence`` (commit 4d87a05). + """ + result = xs + x + assert np.isnan(result.coeffs.values[0]).all() + assert (result.vars.values[0] == -1).all() + + @pytest.mark.v1 + def test_merge_absorbs_dead_terms_multi_operand( + self, m: Model, time: pd.RangeIndex + ) -> None: + """ + Same invariant on a 3-operand merge: a regression that absorbs + only on the binary path would still leave one live term at the + absent slot here. + """ + x = m.add_variables(lower=0, coords=[time], name="x") + y = m.add_variables(lower=0, coords=[time], name="y") + xs = x.shift(time=1) + result = (1 * x) + (1 * y) + xs + assert np.isnan(result.coeffs.values[0]).all() + assert (result.vars.values[0] == -1).all() + # And the present rows still carry all three live terms. + assert (~np.isnan(result.coeffs.values[1:])).all() + + @pytest.mark.v1 + def test_absent_distinguishable_from_zero(self, x: Variable, xs: Variable) -> None: + """ + #712 — under v1, ``x.shift(time=1) * 3`` and ``x * 0`` are + distinct: the first is absent, the second is a present zero. + """ + absent = xs * 3 + zero = x * 0 + assert bool(absent.isnull().values[0]) + assert not bool(zero.isnull().values[0]) + + @pytest.mark.v1 + @pytest.mark.parametrize( + "build", + [ + "var_mul_var", + "var_pow_2", + "expr_mul_var", + "expr_mul_expr", + "quad_plus_linexpr", + "quad_times_scalar", + ], + ) + def test_quadratic_absence_propagates( + self, xs: Variable, x: Variable, build: str + ) -> None: + """ + §6 on the quadratic build paths — every entry point that ends + in a QuadraticExpression must keep an absent factor absent. + + Regression for ``prod(skipna=True)`` on the FACTOR_DIM branch + and the cross-term ``self.const * other.reset_const()`` path, + plus the downstream operators on the resulting quadratic. + """ + builders = { + "var_mul_var": lambda: xs * x, + "var_pow_2": lambda: xs**2, + "expr_mul_var": lambda: (1 * xs) * x, + "expr_mul_expr": lambda: (1 * xs) * (1 * x), + "quad_plus_linexpr": lambda: (xs * x) + (2 * x), + "quad_times_scalar": lambda: (xs * x) * 3, + } + quad = builders[build]() + # absent slot stays absent in the resulting quadratic + assert bool(quad.isnull().values[0]) + # and §1/§2: every term at the absent slot has coeffs NaN and vars -1. + assert np.isnan(quad.coeffs.values[0]).all() + assert (quad.vars.values[0] == -1).all() + # And the present slots stay present (cross-term storage may carry + # vars=-1 as the "no second factor" sentinel inside a term — that's + # not absence, so check the slot-level isnull predicate, not vars). + assert not bool(quad.isnull().values[1:].any()) + + @pytest.mark.legacy + def test_legacy_collapses_absent_to_zero(self, xs: Variable) -> None: + """ + Document the #712 bug: legacy treats absent as 0 after `* 3`. + + The term ends up as ``coeffs=3 * vars=-1 + const=0`` — a + ``coeff*sentinel`` term that evaluates to 0 at the solver layer. + There is no NaN signal anywhere, so ``isnull()`` returns False and + downstream code can't tell ``xs * 3`` apart from ``x * 0``. + """ + result = xs * 3 + assert not np.isnan(result.const.values[0]) + assert not np.isnan(result.coeffs.values[0, 0]) + assert not bool(result.isnull().values[0]) + + +class TestFillnaResolves: + """§7 — fillna()/.where() are how the caller resolves an absent slot.""" + + @pytest.fixture + def xs(self, x: Variable) -> Variable: + return x.shift(time=1) + + @pytest.mark.v1 + def test_expr_fillna_replaces_absent_const(self, xs: Variable) -> None: + result = xs.to_linexpr().fillna(42) + assert result.const.values[0] == 42.0 + assert result.const.values[1:].tolist() == [0.0, 0.0, 0.0, 0.0] + assert not bool(result.isnull().values.any()) + + @pytest.mark.v1 + def test_variable_fillna_numeric_returns_expression(self, xs: Variable) -> None: + """ + A constant fill is not a variable, so the return type is a + LinearExpression. + """ + from linopy import LinearExpression + + result = xs.fillna(42) + assert isinstance(result, LinearExpression) + assert result.const.values[0] == 42.0 + + @pytest.mark.v1 + def test_variable_fillna_zero_revives_slot_as_present_zero( + self, xs: Variable + ) -> None: + from linopy import LinearExpression + + result = xs.fillna(0) + assert isinstance(result, LinearExpression) # numeric fill → expression + assert not bool(result.isnull().values[0]) + assert result.const.values[0] == 0.0 + + @pytest.mark.v1 + def test_outer_fillna_then_add_collapses_to_just_added( + self, m: Model, time: pd.RangeIndex + ) -> None: + """ + Interpretation A — once `(x + y.shift())` is absent at slot 0, + ``.fillna(0)`` revives the slot as the constant 0 (dead terms + stay dead), and a subsequent ``+ x`` re-introduces only ``x[0]``. + Compare ``x + y.shift().fillna(0) + x`` which would double-count + ``x`` at slot 0 — the placement of fillna is load-bearing. + """ + x = m.add_variables(lower=0, coords=[time], name="x") + y = m.add_variables(lower=0, coords=[time], name="y") + expr = (x + y.shift(time=1)).fillna(0) + x + + # At slot 0 the only live term is 1·x[0]; const is 0 → result == x[0]. + coeffs0 = expr.coeffs.values[0] + vars0 = expr.vars.values[0] + live = ~np.isnan(coeffs0) + assert int(live.sum()) == 1 + assert float(coeffs0[live][0]) == 1.0 + assert int(vars0[live][0]) == int(x.labels.values[0]) + assert float(expr.const.values[0]) == 0.0 + + # At slots 1+ all three terms are live (x[i] + y[i-1] + x[i]) — the + # outer ``+ x`` is genuinely additive where y.shift was present. + assert int((~np.isnan(expr.coeffs.values[1])).sum()) == 3 + + @pytest.mark.v1 + def test_masked_variable_constraint_via_fillna(self) -> None: + """ + v1 counterpart of ``test_masked_variable_model`` — under §6 the + constraint ``x + y >= 10`` drops at the masked y slots, so the + caller must say ``y.fillna(0)`` to keep ``x >= 10`` there. + """ + m = Model() + lower = pd.Series(0, range(10)) + x = m.add_variables(lower, name="x") + mask = pd.Series([True] * 8 + [False, False]) + y = m.add_variables(lower, name="y", mask=mask) + m.add_constraints(x + y.fillna(0), ">=", 10) + m.add_constraints(y, ">=", 0) + m.add_objective(2 * x + y) + + # The constraint x + y.fillna(0) >= 10 binds at every slot. + rhs = m.constraints["con0"].rhs.values + assert not np.isnan(rhs).any() + + +# ===================================================================== +# §4 — Variable.reindex / .reindex_like create absence +# ===================================================================== + + +class TestVariableReindex: + """ + Reindexing past the original coords marks the new positions + absent (labels=-1, lower/upper=NaN); §4 lists this as one of the + named mechanisms for creating absence. Runs under both semantics: + this is a new API that didn't exist on master. + """ + + def test_reindex_extends_with_absent( + self, x: Variable, time: pd.RangeIndex + ) -> None: + extended = pd.RangeIndex(8, name="time") + result = x.reindex(time=extended) + assert result.sizes["time"] == 8 + # Original slots 0..4 are preserved + assert int(result.labels.values[0]) == int(x.labels.values[0]) + # New slots 5..7 are absent + assert (result.labels.values[5:] == -1).all() + assert np.isnan(result.lower.values[5:]).all() + assert np.isnan(result.upper.values[5:]).all() + + def test_reindex_subset_drops_coords(self, x: Variable) -> None: + """ + Reindex to a strict subset shrinks the variable (no absence + introduced — those slots are just gone). + """ + result = x.reindex(time=pd.RangeIndex(3, name="time")) + assert result.sizes["time"] == 3 + assert not (result.labels.values == -1).any() + + def test_reindex_like_extends_with_absent(self, m: Model, x: Variable) -> None: + wider = m.add_variables( + lower=0, coords=[pd.RangeIndex(7, name="time")], name="wider" + ) + result = x.reindex_like(wider) + assert result.sizes["time"] == 7 + assert (result.labels.values[5:] == -1).all() + + @pytest.mark.v1 + def test_reindexed_variable_propagates_absence_in_arithmetic( + self, x: Variable, time: pd.RangeIndex + ) -> None: + """ + §4 + §6 hand-off: a reindex-introduced absence flows through + the next operator and is visible via isnull(). + """ + wider = x.reindex(time=pd.RangeIndex(7, name="time")) + expr = wider * 3 + assert bool(expr.isnull().values[5:].all()) + assert not bool(expr.isnull().values[:5].any()) + + def test_where_creates_absence(self, x: Variable) -> None: + """§4 — ``.where(cond)`` marks slots absent in place.""" + cond = xr.DataArray( + [True, True, False, False, False], + dims=["time"], + coords={"time": x.coords["time"]}, + ) + masked = x.where(cond) + assert (masked.labels.values[2:] == -1).all() + assert not (masked.labels.values[:2] == -1).any() + + def test_unstack_creates_absence_at_missing_combinations(self, m: Model) -> None: + """ + §4 — ``.unstack`` of a non-rectangular MultiIndex leaves the + missing combinations as absent slots. + """ + # Three (region, year) observations that don't form a full grid: + # (DE, 2030) and (DE, 2040) exist but (FR, 2030) only — so + # unstacking (FR, 2040) becomes absent. + idx = pd.MultiIndex.from_tuples( + [("DE", 2030), ("DE", 2040), ("FR", 2030)], + names=("region", "year"), + ) + v = m.add_variables(coords=[idx], name="v") + unstacked = v.unstack("dim_0") + assert unstacked.sizes == {"region": 2, "year": 2} + # (FR, 2040) missing → absent + assert int(unstacked.labels.sel(region="FR", year=2040).item()) == -1 + # The three present cells stay present + assert int(unstacked.labels.sel(region="DE", year=2030).item()) != -1 + + @pytest.mark.parametrize("method", ["roll", "sel", "isel"]) + def test_data_preserving_methods_do_not_create_absence( + self, x: Variable, method: str + ) -> None: + """ + §4 negative — operations that *move or select* existing data + never introduce absent slots. Pins the spec's contrast against + the absence-creating mechanisms. + """ + results = { + "roll": lambda: x.roll(time=2), + "sel": lambda: x.sel(time=[0, 2, 4]), + "isel": lambda: x.isel(time=[0, 2, 4]), + } + result = results[method]() + assert not (result.labels.values == -1).any() + + +# ===================================================================== +# §10 — named-method join= argument (opt-in alignment) +# ===================================================================== + + +class TestNamedMethodJoin: + """ + Under v1 the bare operators raise on coord mismatch (§8). The + named methods let the caller opt in to a specific join mode. + """ + + @pytest.fixture + def subset(self, time: pd.RangeIndex) -> xr.DataArray: + return xr.DataArray( + [10.0, 30.0], dims=["time"], coords={"time": pd.Index([1, 3], name="time")} + ) + + @pytest.mark.v1 + def test_add_join_inner_intersects(self, x: Variable, subset: xr.DataArray) -> None: + """`.add(other, join="inner")` picks the intersection of coords.""" + result = x.add(subset, join="inner") + assert list(result.coords["time"].values) == [1, 3] + + @pytest.mark.v1 + def test_add_join_outer_fills(self, x: Variable, subset: xr.DataArray) -> None: + """`.add(other, join="outer")` unions coords (gaps are filled).""" + result = x.add(subset, join="outer") + assert list(result.coords["time"].values) == [0, 1, 2, 3, 4] + + @pytest.mark.v1 + def test_mul_join_inner(self, x: Variable, subset: xr.DataArray) -> None: + result = x.mul(subset, join="inner") + assert list(result.coords["time"].values) == [1, 3] + + @pytest.mark.v1 + def test_le_join_inner_on_subset_rhs( + self, x: Variable, subset: xr.DataArray + ) -> None: + """`.le(rhs, join="inner")` lets a subset RHS through cleanly.""" + result = x.le(subset, join="inner") + assert list(result.coords["time"].values) == [1, 3] + + @pytest.mark.v1 + def test_bare_op_still_raises_on_mismatch( + self, x: Variable, subset: xr.DataArray + ) -> None: + """`x + subset` (no `join=`) still raises — opt-in is required.""" + with pytest.raises(ValueError, match="Coordinate mismatch on shared dimension"): + x + subset + + @pytest.mark.v1 + def test_add_join_override_aligns_positionally(self, x: Variable) -> None: + """ + ``join="override"`` is the explicit-positional mode — the right + operand's labels are dropped and the left's are reused. The mode + is opt-in precisely because it can silently mis-pair if the user + didn't mean it. + """ + relabelled = xr.DataArray( + [1.0, 2.0, 3.0, 4.0, 5.0], + dims=["time"], + coords={"time": pd.Index([10, 11, 12, 13, 14], name="time")}, + ) + result = x.add(relabelled, join="override") + # Override keeps the left operand's labels — and silently re-uses + # the right's values at those positions. + assert list(result.coords["time"].values) == [0, 1, 2, 3, 4] + assert result.const.values.tolist() == [1.0, 2.0, 3.0, 4.0, 5.0] + + @pytest.mark.v1 + def test_add_join_override_size_mismatch_raises(self, x: Variable) -> None: + """ + §10 / ``override`` documentation says "positional alignment, made + explicit". Positional pairing is only well-defined when the + shared-dim sizes match; with mismatched sizes ``override`` would + silently mis-pair (or raise opaquely from xarray) instead of + producing a clear error. Regression for the dropped legacy + ``other.sizes == self.const.sizes`` gate. + """ + shorter = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["time"], + coords={"time": pd.Index([0, 1, 2], name="time")}, + ) + with pytest.raises(ValueError, match="join='override' requires matching"): + x.add(shorter, join="override") + + @pytest.mark.v1 + def test_reindex_like_resolves_mismatch_before_bare_op(self, x: Variable) -> None: + """ + §10 names ``.reindex(...)`` / ``.reindex_like(...)`` as + canonical resolutions — pre-aligning lets the bare operator + accept the once-mismatched operand without ``join=``. + """ + other = xr.DataArray( + [1.0, 2.0, 3.0, 4.0, 5.0], + dims=["time"], + coords={"time": pd.Index([10, 11, 12, 13, 14], name="time")}, + ) + aligned = other.reindex_like(x.labels, fill_value=0) + result = x + aligned # bare + succeeds because coords now match + assert list(result.coords["time"].values) == [0, 1, 2, 3, 4] + + @pytest.mark.v1 + def test_assign_coords_resolves_mismatch_before_bare_op(self, x: Variable) -> None: + """ + ``.assign_coords(...)`` is the explicit-positional escape — + relabels one side outright so the bare operator's exact-join + check passes. + """ + other = xr.DataArray( + [1.0, 2.0, 3.0, 4.0, 5.0], + dims=["time"], + coords={"time": pd.Index([10, 11, 12, 13, 14], name="time")}, + ) + relabelled = other.assign_coords(time=x.coords["time"]) + result = x + relabelled # bare + succeeds after relabel + assert list(result.coords["time"].values) == [0, 1, 2, 3, 4] + + +# ===================================================================== +# §12 — constraints follow the same rules +# ===================================================================== + + +_SIGNS = { + "le": operator.le, + "ge": operator.ge, + "eq": operator.eq, +} + + +class TestConstraintRHS: + @pytest.mark.v1 + @pytest.mark.parametrize("sign", ["le", "ge", "eq"]) + def test_subset_rhs_raises(self, x: Variable, sign: str) -> None: + """§12 — all three comparison signs align by §8 the same way.""" + subset = xr.DataArray( + [10.0, 20.0], + dims=["time"], + coords={"time": pd.Index([1, 3], name="time")}, + ) + with pytest.raises(ValueError, match="Coordinate mismatch on shared dimension"): + _SIGNS[sign](x, subset) + + @pytest.mark.v1 + @pytest.mark.parametrize("sign", ["le", "ge", "eq"]) + def test_nan_rhs_raises(self, x: Variable, time: pd.RangeIndex, sign: str) -> None: + """ + §5/§12 — a NaN in a user-supplied RHS raises for every sign, + never silently becomes "no constraint" the way legacy auto_mask + treats it. + """ + nan_rhs = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + with pytest.raises(ValueError, match="NaN"): + _SIGNS[sign](x, nan_rhs) + + @pytest.mark.v1 + @pytest.mark.parametrize("sign", ["le", "ge", "eq"]) + def test_absence_propagates_to_rhs_drops_constraint( + self, x: Variable, sign: str + ) -> None: + """ + §6 → §12 for every sign: a constraint over an absent LHS slot + yields NaN RHS, which downstream auto-mask interprets as "no + constraint here". + """ + xs = x.shift(time=1) + # xs is absent at time=0; the constraint's RHS at that slot + # should be NaN (no constraint), not 10. + constraint = _SIGNS[sign](xs, 10) + rhs = constraint.rhs.values + assert np.isnan(rhs[0]) + assert (rhs[1:] == 10).all() + + @pytest.mark.v1 + def test_pypsa_1683_nan_rhs_raises(self, x: Variable, time: pd.RangeIndex) -> None: + """ + PyPSA #1683 on the constraint side — ``min_pu * nominal_fix`` + with ``p_nom=inf`` and ``p_min_pu=0`` yields NaN at the bad slot; + v1 raises at construction instead of silently passing NaN to + the solver. + """ + nominal = xr.DataArray([np.inf] * 5, dims=["time"], coords={"time": time}) + min_pu = xr.DataArray( + [1.0, 0.0, 1.0, 1.0, 1.0], dims=["time"], coords={"time": time} + ) + bound = min_pu * nominal # 0*inf = NaN at time=1 + with pytest.raises(ValueError, match="NaN"): + x >= bound + + @pytest.mark.legacy + def test_nan_rhs_silently_treated_as_unconstrained( + self, x: Variable, time: pd.RangeIndex + ) -> None: + """ + Document the legacy auto_mask path: a NaN RHS is silently + kept as NaN and the constraint at that row is later dropped. + """ + nan_rhs = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + constraint = x <= nan_rhs + assert np.isnan(constraint.rhs.values[1]) + + @pytest.mark.legacy + def test_warn_on_nan_rhs( + self, x: Variable, time: pd.RangeIndex, unsilenced: None + ) -> None: + nan_rhs = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + with pytest.warns(LinopySemanticsWarning, match="no constraint at this row"): + x <= nan_rhs + + @pytest.mark.legacy + def test_warn_on_coord_mismatch_rhs_distinguishes_from_nan( + self, x: Variable, unsilenced: None + ) -> None: + """ + A subset RHS has no user NaN — legacy's ``reindex_like`` is what + introduces the NaN at the unmatched positions. The warning should + diagnose the *coord mismatch* (fix: ``.sel`` / ``.reindex``), not + the NaN-RHS auto-mask (fix: ``mask=`` / ``.fillna``). Regression + for the conflated warn text where both causes used the same + ``_legacy_nan_rhs_constraint_message``. + """ + subset = xr.DataArray( + [10.0, 20.0], + dims=["time"], + coords={"time": pd.Index([1, 3], name="time")}, + ) + with pytest.warns( + LinopySemanticsWarning, match="Coordinate mismatch in constraint RHS" + ): + x <= subset + + @pytest.mark.legacy + def test_both_warnings_fire_when_rhs_has_user_nan_and_mismatch( + self, x: Variable, unsilenced: None + ) -> None: + """ + Independent causes — when the RHS is both subset (mismatch) and + carries a user NaN, both fix-hints should surface so the caller + sees each problem with its own resolution. + """ + both = xr.DataArray( + [10.0, np.nan], + dims=["time"], + coords={"time": pd.Index([1, 3], name="time")}, + ) + with warnings.catch_warnings(record=True) as caught: + warnings.simplefilter("always", LinopySemanticsWarning) + x <= both + messages = [str(w.message) for w in caught] + assert any("Coordinate mismatch in constraint RHS" in m for m in messages) + assert any("no constraint at this row" in m for m in messages) + + +# ===================================================================== +# §13 — reductions skip absent slots (not propagate) +# ===================================================================== + + +class TestReductionsSkipAbsent: + """ + Per §13, ``sum`` / ``groupby.sum`` skip absent slots rather than + propagating them — the only asymmetry against §6's binary-operator + rule. The expected behaviour falls out of xarray's ``skipna=True`` + default; these tests pin it under v1 so future changes don't drift. + + Scope: §13 also names ``mean``, ``resample``, and ``coarsen``, but + those are not yet exposed on ``LinearExpression`` (see #703). The + spec text is the rule they will follow when implemented; tests + belong with the implementation PR. + """ + + @pytest.fixture + def xs(self, x: Variable) -> Variable: + return x.shift(time=1) + + @pytest.mark.v1 + def test_sum_over_dim_skips_absent(self, xs: Variable) -> None: + """ + ``(xs + 5).sum('time')`` skips the absent slot at t=0 and + sums the four present 5s → 20. + """ + result = (xs + 5).sum("time") + assert float(result.const) == 20.0 + + @pytest.mark.v1 + def test_sum_no_dim_skips_absent(self, xs: Variable) -> None: + result = (xs + 5).sum() + assert float(result.const) == 20.0 + + @pytest.mark.v1 + def test_sum_of_all_absent_is_zero(self, x: Variable) -> None: + """§13 — "the sum of none is the zero expression.""" "" + all_absent = x.shift(time=10).to_linexpr() + assert bool(all_absent.isnull().all().item()) + result = all_absent.sum("time") + assert float(result.const) == 0.0 + + @pytest.mark.v1 + def test_groupby_sum_skips_absent(self, xs: Variable) -> None: + """Each group's sum drops absent members, just like ``.sum``.""" + groups = xr.DataArray( + [0, 0, 1, 1, 1], dims=["time"], coords={"time": xs.coords["time"]} + ) + result = (xs + 5).groupby(groups).sum() + # group 0: [NaN, 5] → 5; group 1: [5, 5, 5] → 15 + assert result.const.values.tolist() == [5.0, 15.0] + + +# ===================================================================== +# §11 — auxiliary (non-dim) coordinate conflicts raise (covers #295) +# ===================================================================== + + +class TestAuxCoordConflict: + """ + Per §11, an auxiliary (non-dim) coord that two operands carry + with disagreeing values must raise — xarray silently drops the + conflict in arithmetic, which is the #295 bug. + """ + + @pytest.fixture + def A(self) -> pd.Index: + return pd.Index([1, 2, 3], name="A") + + @pytest.mark.v1 + def test_expr_plus_dataarray_aux_conflict_raises( + self, m: Model, A: pd.Index + ) -> None: + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + with pytest.raises(ValueError, match="Auxiliary coordinate"): + v + const + + @pytest.mark.v1 + def test_var_plus_var_aux_conflict_raises(self, m: Model, A: pd.Index) -> None: + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + w = m.add_variables(lower=0, coords=[A], name="w").assign_coords( + B=("A", [400, 400, 500]) + ) + with pytest.raises(ValueError, match="Auxiliary coordinate"): + v + w + + @pytest.mark.v1 + def test_mul_constant_aux_conflict_raises(self, m: Model, A: pd.Index) -> None: + """Same rule on the multiplication path — not just ``+``.""" + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [2.0, 3.0, 4.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + with pytest.raises(ValueError, match="Auxiliary coordinate"): + v * const + + @pytest.mark.v1 + def test_constraint_aux_conflict_raises(self, m: Model, A: pd.Index) -> None: + """§11 reaches constraint construction via the same machinery.""" + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + with pytest.raises(ValueError, match="Auxiliary coordinate"): + v == const + + @pytest.mark.v1 + def test_scalar_isel_aux_conflict_raises(self, m: Model, A: pd.Index) -> None: + """ + Scalar isels leave the indexed dim as a non-dim coord whose + value differs between operands picked at different positions. + """ + v = m.add_variables(lower=0, coords=[A], name="v") + a0 = (1 * v).isel({"A": 0}) # scalar A=1 + a1 = (1 * v).isel({"A": 1}) # scalar A=2 + with pytest.raises(ValueError, match="Auxiliary coordinate"): + a0 + a1 + + @pytest.mark.v1 + def test_isel_with_drop_true_avoids_conflict(self, m: Model, A: pd.Index) -> None: + """ + The §11 escape hatch the convention recommends: drop the + leftover scalar coord with ``isel(..., drop=True)``. + """ + v = m.add_variables(lower=0, coords=[A], name="v") + a0 = (1 * v).isel({"A": 0}, drop=True) + a1 = (1 * v).isel({"A": 1}, drop=True) + result = a0 + a1 # no aux coord → no conflict + assert "A" not in result.coords + + @pytest.mark.v1 + def test_assign_coords_resolves_conflict(self, m: Model, A: pd.Index) -> None: + """ + §11's third escape hatch: relabel one side with + ``.assign_coords`` so the coord values agree across operands. + """ + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + relabelled = const.assign_coords(B=v.coords["B"]) + result = v + relabelled + assert result.coords["B"].values.tolist() == [311, 311, 322] + + @pytest.mark.v1 + def test_multi_operand_merge_aux_conflict_raises( + self, m: Model, A: pd.Index + ) -> None: + """ + The merge-path check inspects all operands, not just two — + a 3-way ``sum(...)`` where the third disagrees still raises. + """ + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + w = m.add_variables(lower=0, coords=[A], name="w").assign_coords( + B=("A", [311, 311, 322]) + ) + u = m.add_variables(lower=0, coords=[A], name="u").assign_coords( + B=("A", [999, 999, 999]) + ) + with pytest.raises(ValueError, match="Auxiliary coordinate"): + v + w + u + + @pytest.mark.v1 + def test_aux_conflict_raises_under_explicit_join_constant( + self, m: Model, A: pd.Index + ) -> None: + """ + §11 is independent of §8 — an explicit ``join=`` must not + silence the aux-coord raise. Regression for the + ``if join is None:`` gating bug where ``.add(const, join="override")`` + would silently drop the conflicting coord. + """ + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + for join in ("override", "inner", "outer", "left", "right", "exact"): + with pytest.raises(ValueError, match="Auxiliary coordinate"): + v.add(const, join=join) + + @pytest.mark.v1 + def test_aux_conflict_raises_under_explicit_join_merge( + self, m: Model, A: pd.Index + ) -> None: + """ + Same rule on the merge path: ``linopy.merge([v, w], join="override")`` + with a conflicting aux coord must raise. + """ + import linopy + + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + w = m.add_variables(lower=0, coords=[A], name="w").assign_coords( + B=("A", [400, 400, 500]) + ) + for join in ("override", "inner", "outer", "left", "right", "exact"): + with pytest.raises(ValueError, match="Auxiliary coordinate"): + linopy.merge([1 * v, 1 * w], join=join) + + @pytest.mark.legacy + def test_aux_conflict_silently_keeps_left(self, m: Model, A: pd.Index) -> None: + """ + Document legacy: a conflict is silently resolved by keeping + the left operand's aux coord — the right operand's [400,400,500] + disappears with no signal to the caller. + """ + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + result = v + const + assert result.coords["B"].values.tolist() == [311, 311, 322] + + @pytest.mark.legacy + def test_warn_on_aux_conflict( + self, m: Model, A: pd.Index, unsilenced: None + ) -> None: + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + with pytest.warns(LinopySemanticsWarning, match=r"(?s)'B'.*silently dropped"): + v + const + + +class TestAuxCoordPropagation: + """ + Non-conflicting aux coords must propagate through arithmetic and + into constraints — the positive half of §11. + """ + + @pytest.fixture + def A(self) -> pd.Index: + return pd.Index([1, 2, 3], name="A") + + def test_aux_coord_survives_scalar_mul(self, m: Model, A: pd.Index) -> None: + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + assert "B" in (3 * v).coords + + def test_aux_coord_survives_scalar_add(self, m: Model, A: pd.Index) -> None: + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + assert "B" in (v + 5).coords + + def test_aux_coord_propagates_through_var_plus_var( + self, m: Model, A: pd.Index + ) -> None: + B = ("A", [311, 311, 322]) + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords(B=B) + w = m.add_variables(lower=0, coords=[A], name="w").assign_coords(B=B) + result = v + w + assert "B" in result.coords + assert result.coords["B"].values.tolist() == [311, 311, 322] + + def test_aux_coord_propagates_into_constraint(self, m: Model, A: pd.Index) -> None: + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + c = v <= 10 + assert "B" in c.coords + + def test_aux_coord_only_on_dataarray_propagates( + self, m: Model, A: pd.Index + ) -> None: + """ + ``x * a`` where ``a`` carries an aux coord and ``x`` doesn't — + the coord propagates through every binary operator and into the + constraint. Hits the `_align_constant` path (var-OP-DataArray) + distinct from the `merge` path tested below. + """ + x = m.add_variables(lower=0, coords=[A], name="x") + a = xr.DataArray( + [2.0, 3.0, 4.0], dims=["A"], coords={"A": A, "B": ("A", [10, 20, 30])} + ) + for expr in (x * a, x + a, x / a): + assert "B" in expr.coords + assert expr.coords["B"].values.tolist() == [10, 20, 30] + # And into the constraint + c = x <= a + assert "B" in c.coords + + def test_aux_coord_only_on_one_side_propagates(self, m: Model, A: pd.Index) -> None: + """Var+var counterpart of the above — hits the `merge` path.""" + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + w = m.add_variables(lower=0, coords=[A], name="w") # no B + result = v + w + assert "B" in result.coords + + def test_aux_coord_object_dtype_with_nan_compares_equal( + self, m: Model, A: pd.Index + ) -> None: + """ + Aux coords with object dtype can embed NaN placeholders (e.g. + ragged category labels). Two operands with identical NaN + placement must compare equal — `np.array_equal` alone treats + NaN as self-unequal on object dtype, so the §11 raise would + false-positive without the pandas-equals fallback. + """ + B = np.array([311, np.nan, 322], dtype=object) + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords(B=("A", B)) + w = m.add_variables(lower=0, coords=[A], name="w").assign_coords(B=("A", B)) + # Same B on both sides, NaN at the same slot — should propagate, not raise. + result = v + w + assert "B" in result.coords + + +# ===================================================================== +# Error-message content (raise self-description) +# ===================================================================== + + +class TestErrorMessageContent: + """ + The three v1 raises must be self-describing: name the dim or + coord and show the disagreeing values so the user can act on the + message without re-running with extra prints. Substring assertions + elsewhere don't cover this — these tests pin the rich content. + """ + + @pytest.fixture + def A(self) -> pd.Index: + return pd.Index([1, 2, 3], name="A") + + @pytest.mark.v1 + def test_user_nan_message_separates_intents(self, x: Variable) -> None: + """ + The §5 raise must not collapse `data error` and `absence` into a + single suggestion — they need different fixes. + """ + with pytest.raises(ValueError) as exc: + x + float("nan") + msg = str(exc.value) + assert "data error" in msg and ".fillna(value)" in msg + assert "absent" in msg and "mask=" in msg + assert ".reindex" in msg or ".where(cond)" in msg + + @pytest.mark.v1 + def test_shared_dim_message_names_dim_and_values( + self, m: Model, time: pd.RangeIndex + ) -> None: + """ + The merge-path §8 raise must name the offending dim and show + both sides' labels — otherwise the user can't tell which dim + out of many. + """ + other = m.add_variables( + lower=0, coords=[pd.Index([10, 11, 12, 13, 14], name="time")], name="other" + ) + x_local = m.add_variables(lower=0, coords=[time], name="x_local") + with pytest.raises(ValueError) as exc: + x_local + other + msg = str(exc.value) + assert "'time'" in msg + assert "[0, 1, 2, 3, 4]" in msg + assert "[10, 11, 12, 13, 14]" in msg + + @pytest.mark.v1 + def test_aux_conflict_message_names_coord_and_values( + self, m: Model, A: pd.Index + ) -> None: + """ + The §11 raise must name the conflicting coord and show both + sides' values — and mention `.assign_coords` as a fix, not only + `.drop_vars` and `isel(drop=True)`. + """ + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + const = xr.DataArray( + [10.0, 20.0, 30.0], + dims=["A"], + coords={"A": A, "B": ("A", [400, 400, 500])}, + ) + with pytest.raises(ValueError) as exc: + v + const + msg = str(exc.value) + assert "'B'" in msg + assert "conflicting values" in msg # value-mismatch failure mode + assert "[311, 311, 322]" in msg + assert "[400, 400, 500]" in msg + # All three resolution paths from §11 should be listed. + assert ".drop_vars" in msg + assert ".assign_coords" in msg + assert ".isel" in msg + + @pytest.mark.v1 + def test_aux_conflict_message_distinguishes_shape_vs_value( + self, m: Model, A: pd.Index + ) -> None: + """ + Shape mismatch and value disagreement are different failure + modes — surface that in the message text so the caller can + diagnose without re-reading both arrays. + """ + # scalar-isel leaves a 0-d aux coord on one side; the full vector + # on the other has a different shape, not a different value. + v = m.add_variables(lower=0, coords=[A], name="v").assign_coords( + B=("A", [311, 311, 322]) + ) + scalar_side = (1 * v).isel({"A": 0}) # B becomes a 0-d scalar coord + full_side = 1 * v + with pytest.raises(ValueError, match="differing shapes") as exc: + scalar_side + full_side + msg = str(exc.value) + assert "'B'" in msg + assert "shape" in msg + + +# ===================================================================== +# Rough edges — catches NaN that slips past the operator-level check +# ===================================================================== + + +class TestUserNaNEdgeCases: + """ + Regression guards for three NaN-entry routes that were untested. + The first two are already caught upstream (at the operator that + constructs the expression); the third needed an ``add_objective`` + boundary check because a hand-built expression with NaN const + skips the operator path entirely. + """ + + @pytest.mark.v1 + def test_nan_in_expression_used_in_objective_raises( + self, m: Model, time: pd.RangeIndex + ) -> None: + """ + ``add_objective((x * nan_costs).sum())`` raises at the ``*`` + before the objective even sees the expression — guard against + a regression that lets NaN-cost objectives slip through. + """ + x = m.add_variables(lower=0, coords=[time], name="x") + nan_costs = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + with pytest.raises(ValueError, match="NaN"): + m.add_objective((x * nan_costs).sum()) + + @pytest.mark.v1 + def test_nan_in_constraint_lhs_raises(self, m: Model, time: pd.RangeIndex) -> None: + """ + ``(x + nan_da) <= 5`` raises at the ``+`` on the LHS — the + RHS path is tested elsewhere; this guards the symmetric LHS + case. + """ + x = m.add_variables(lower=0, coords=[time], name="x") + nan_da = xr.DataArray( + [1.0, np.nan, 3.0, 4.0, 5.0], dims=["time"], coords={"time": time} + ) + with pytest.raises(ValueError, match="NaN"): + (x + nan_da) <= 5 diff --git a/test/test_linear_expression.py b/test/test_linear_expression.py index e9535ad6..e66e107c 100644 --- a/test/test_linear_expression.py +++ b/test/test_linear_expression.py @@ -388,9 +388,12 @@ def test_linear_expression_substraction( assert res.data.notnull().all().to_array().all() +@pytest.mark.legacy def test_linear_expression_sum( x: Variable, y: Variable, z: Variable, v: Variable ) -> None: + # Legacy-only: ``v.loc[:9] + v.loc[10:]`` merges disjoint coords + # (forbidden by v1 §8). expr = 10 * x + y + z res = expr.sum("dim_0") @@ -410,9 +413,12 @@ def test_linear_expression_sum( assert len(expr.coords["dim_2"]) == 10 +@pytest.mark.legacy def test_linear_expression_sum_with_const( x: Variable, y: Variable, z: Variable, v: Variable ) -> None: + # Legacy-only: ``v.loc[:9] + v.loc[10:]`` merges disjoint coords + # (forbidden by v1 §8). expr = 10 * x + y + z + 10 res = expr.sum("dim_0") @@ -538,7 +544,17 @@ def test_linear_expression_multiplication_invalid( expr / x +@pytest.mark.legacy class TestCoordinateAlignment: + """ + Documents legacy positional/left-join alignment and silent NaN fill. + + The whole block is legacy-only: v1 raises on these cases (see + `test_legacy_violations.py` and convention.md §5/§8). Once later slices + flesh out v1's equivalent coverage, individual tests here can be + reclassified or removed. + """ + @pytest.fixture(params=["da", "series"]) def subset(self, request: Any) -> xr.DataArray | pd.Series: if request.param == "da": @@ -1631,15 +1647,18 @@ def test_merge(x: Variable, y: Variable, z: Variable) -> None: assert isinstance(res, LinearExpression) # now concat with same length of terms - expr1 = z.sel(dim_0=0).sum("dim_1") - expr2 = z.sel(dim_0=1).sum("dim_1") + # ``drop=True`` so the scalar ``dim_0`` coord doesn't survive each .sel + # and trip §11's aux-coord-conflict check (the two picks pin dim_0=0 + # vs dim_0=1). + expr1 = z.sel(dim_0=0, drop=True).sum("dim_1") + expr2 = z.sel(dim_0=1, drop=True).sum("dim_1") res = merge([expr1, expr2], dim="dim_1", cls=LinearExpression) assert res.nterm == 3 # now with different length of terms - expr1 = z.sel(dim_0=0, dim_1=slice(0, 1)).sum("dim_1") - expr2 = z.sel(dim_0=1).sum("dim_1") + expr1 = z.sel(dim_0=0, dim_1=slice(0, 1), drop=True).sum("dim_1") + expr2 = z.sel(dim_0=1, drop=True).sum("dim_1") res = merge([expr1, expr2], dim="dim_1", cls=LinearExpression) assert res.nterm == 3 @@ -1878,9 +1897,12 @@ def c(self, m2: Model) -> Variable: return m2.variables["c"] class TestAddition: + @pytest.mark.legacy def test_add_join_none_preserves_default( self, a: Variable, b: Variable ) -> None: + # Legacy-only: a and b have different coords on dim ``i``; + # under v1 both arithmetic forms raise (see convention.md §8). result_default = a.to_linexpr() + b.to_linexpr() result_none = a.to_linexpr().add(b.to_linexpr(), join=None) assert_linequal(result_default, result_none) @@ -2139,9 +2161,11 @@ def test_div_constant_outer_fill_values(self, a: Variable) -> None: assert result.coeffs.squeeze().sel(i=0).item() == pytest.approx(1.0) class TestQuadratic: + @pytest.mark.legacy def test_quadratic_add_constant_join_inner( self, a: Variable, b: Variable ) -> None: + # Legacy-only: a*b has misaligned coords on ``i`` (§8 raises in v1). quad = a.to_linexpr() * b.to_linexpr() const = xr.DataArray([10, 20, 30], dims=["i"], coords={"i": [1, 2, 3]}) result = quad.add(const, join="inner") @@ -2153,9 +2177,11 @@ def test_quadratic_add_expr_join_inner(self, a: Variable) -> None: result = quad.add(const, join="inner") assert list(result.data.indexes["i"]) == [0, 1] + @pytest.mark.legacy def test_quadratic_mul_constant_join_inner( self, a: Variable, b: Variable ) -> None: + # Legacy-only: a*b has misaligned coords on ``i`` (§8 raises in v1). quad = a.to_linexpr() * b.to_linexpr() const = xr.DataArray([2, 3, 4], dims=["i"], coords={"i": [1, 2, 3]}) result = quad.mul(const, join="inner") diff --git a/test/test_optimization.py b/test/test_optimization.py index a2912c6f..99a06b6a 100644 --- a/test/test_optimization.py +++ b/test/test_optimization.py @@ -597,6 +597,7 @@ def test_duplicated_variables( assert all(np.isclose(model_with_duplicated_variables.solution["x"], 5, rtol=tol)) +@pytest.mark.legacy @pytest.mark.parametrize("solver,io_api,explicit_coordinate_names", params) def test_non_aligned_variables( model_with_non_aligned_variables: Model, @@ -604,6 +605,10 @@ def test_non_aligned_variables( io_api: str, explicit_coordinate_names: bool, ) -> None: + """ + Legacy-only: var+var on the same dim with different coords (see + convention.md §8). Under v1, the model construction itself raises. + """ status, condition = model_with_non_aligned_variables.solve( solver, io_api=io_api, explicit_coordinate_names=explicit_coordinate_names ) @@ -941,6 +946,7 @@ def test_modified_model( assert (modified_model.solution.y == 10).all() +@pytest.mark.legacy @pytest.mark.parametrize("solver,io_api,explicit_coordinate_names", params) def test_masked_variable_model( masked_variable_model: Model, @@ -948,6 +954,14 @@ def test_masked_variable_model( io_api: str, explicit_coordinate_names: bool, ) -> None: + """ + Legacy-only: asserts that ``x + y >= 10`` with ``y`` masked still + binds ``x >= 10`` at the masked slots — which only works because + legacy collapses the absent ``y`` to 0. Under v1 §6 the absence in + ``y`` propagates into the constraint and the constraint is dropped + at the masked slots, so ``x`` is free to be 0 there. The v1 way to + express the legacy intent is ``x + y.fillna(0) >= 10``. + """ masked_variable_model.solve( solver, io_api=io_api, explicit_coordinate_names=explicit_coordinate_names ) @@ -962,6 +976,70 @@ def test_masked_variable_model( assert_equal(x.add(y).solution, x.solution + y.solution.fillna(0)) +@pytest.mark.v1 +@pytest.mark.parametrize("solver,io_api,explicit_coordinate_names", params) +def test_masked_variable_model_v1_drops_constraint( + masked_variable_model: Model, + solver: str, + io_api: str, + explicit_coordinate_names: bool, +) -> None: + """ + v1 counterpart of ``test_masked_variable_model``. Under §6 the + absence of ``y`` at the last two slots propagates into ``x + y`` + and from there into the constraint, so the constraint drops at + those slots — ``x`` is no longer pinned to 10 there and the + objective ``2x + y`` drives it to 0 where it's still bound. + + Pin two things together: + 1. Model structure: con0 is masked at the absent slots (its label + is -1, no row emitted to the solver). This is the v1 invariant + that distinguishes us from legacy and is solver-independent. + 2. Solver outcome on the bound slots: ``x[:8]`` solves to 0 (the + constraint binds via ``y[:8] = 10``). ``x[-2:]`` is solver- + dependent — some solvers presolve away free variables and the + solution comes back as NaN — so we don't pin it here. + """ + con = masked_variable_model.constraints["con0"] + assert (con.labels.values[-2:] == -1).all() + assert (con.labels.values[:-2] != -1).all() + + masked_variable_model.solve( + solver, io_api=io_api, explicit_coordinate_names=explicit_coordinate_names + ) + x = masked_variable_model.variables.x + y = masked_variable_model.variables.y + tol = GPU_SOL_TOL if solver in gpu_solvers else CPU_SOL_TOL + assert y.solution[-2:].isnull().all() + assert (np.isclose(x.solution[:-2], 0, atol=tol)).all() + + +@pytest.mark.v1 +@pytest.mark.parametrize("solver,io_api,explicit_coordinate_names", params) +def test_masked_variable_model_v1_fillna_binds( + solver: str, + io_api: str, + explicit_coordinate_names: bool, +) -> None: + """ + §7 escape hatch under v1: ``x + y.fillna(0) >= 10`` revives the + masked slots as a present zero, so the constraint binds and the + legacy outcome (``x[-2:] == 10``) is recovered. The placement of + ``fillna`` is the caller's explicit statement of intent. + """ + m = Model() + lower = pd.Series(0, range(10)) + x = m.add_variables(lower, name="x") + mask = pd.Series([True] * 8 + [False, False]) + y = m.add_variables(lower, name="y", mask=mask) + m.add_constraints(x + y.fillna(0), GREATER_EQUAL, 10) + m.add_constraints(y, GREATER_EQUAL, 0) + m.add_objective(2 * x + y) + m.solve(solver, io_api=io_api, explicit_coordinate_names=explicit_coordinate_names) + tol = GPU_SOL_TOL if solver in gpu_solvers else CPU_SOL_TOL + assert (np.isclose(m.variables.x.solution[-2:], 10, rtol=tol)).all() + + @pytest.mark.parametrize("solver,io_api,explicit_coordinate_names", params) def test_masked_constraint_model( masked_constraint_model: Model, @@ -1229,6 +1307,7 @@ def test_auto_mask_variable_model( assert y.solution[:-2].notnull().all() +@pytest.mark.legacy @pytest.mark.parametrize("solver,io_api,explicit_coordinate_names", params) def test_auto_mask_constraint_model( auto_mask_constraint_model: Model, @@ -1236,7 +1315,11 @@ def test_auto_mask_constraint_model( io_api: str, explicit_coordinate_names: bool, ) -> None: - """Test that auto_mask=True correctly masks constraints with NaN RHS.""" + """ + Test that auto_mask=True correctly masks constraints with NaN RHS. + + Legacy-only: v1 forbids NaN constraint RHS (see convention.md §5/§12). + """ auto_mask_constraint_model.solve( solver, io_api=io_api, explicit_coordinate_names=explicit_coordinate_names ) diff --git a/test/test_piecewise_constraints.py b/test/test_piecewise_constraints.py index c44af394..050150d7 100644 --- a/test/test_piecewise_constraints.py +++ b/test/test_piecewise_constraints.py @@ -1389,7 +1389,13 @@ def test_broadcast_over_extra_dims(self) -> None: # =========================================================================== +@pytest.mark.legacy class TestNaNMasking: + """ + NaN-as-masking patterns — legacy-only; v1 requires explicit ``mask=`` + or ``.where()`` (see convention.md §4, §5). + """ + def test_nan_masks_lambda_labels(self) -> None: """NaN in y_points produces masked labels in SOS2 formulation.""" m = Model() @@ -2039,8 +2045,13 @@ def test_expression_name_fallback(self) -> None: ) assert f"pwl0{PWL_LAMBDA_SUFFIX}" in m.variables + @pytest.mark.legacy def test_incremental_with_nan_mask(self) -> None: - """Incremental method with trailing NaN creates masked delta vars.""" + """ + Incremental method with trailing NaN creates masked delta vars. + + Legacy-only: NaN-as-mask in user input (see convention.md §5). + """ m = Model() gens = pd.Index(["a", "b"], name="gen") x = m.add_variables(coords=[gens], name="x") @@ -2317,6 +2328,7 @@ def test_convexity_invariant_to_x_direction(self) -> None: assert f_asc.method != "lp" assert f_desc.method != "lp" + @pytest.mark.legacy def test_lp_per_entity_nan_padding( self, nan_padded_pwl_model: Callable[[Method], Model] ) -> None: @@ -2324,12 +2336,15 @@ def test_lp_per_entity_nan_padding( Per-entity NaN-padded breakpoints with method='lp': padded segments must be masked out so they don't create spurious ``y ≤ 0`` constraints (bug-2 regression). + + Legacy-only: NaN-as-mask in user input (see convention.md §5). """ m = nan_padded_pwl_model("lp") m.solve() # f_b(10) on chord (5,10)→(15,15) is 12.5 assert abs(float(m.solution.sel({"entity": "b"})["y"]) - 12.5) < 1e-3 + @pytest.mark.legacy @pytest.mark.skipif(not _SOS_PATHS, reason="No SOS-capable solver installed") @pytest.mark.parametrize(("solver", "io_api"), _SOS_PATHS) def test_sos2_per_entity_nan_padding( @@ -2511,9 +2526,14 @@ def test_lp_domain_bound_infeasible_when_x_out_of_range(self) -> None: status, _ = m.solve() assert status != "ok" + @pytest.mark.legacy @pytest.mark.skipif(not _any_solvers, reason="no solver available") def test_lp_domain_uses_paired_valid_breakpoints(self) -> None: - """A trailing NaN in y must also shrink the LP x-domain.""" + """ + A trailing NaN in y must also shrink the LP x-domain. + + Legacy-only: NaN-as-mask in user input (see convention.md §5). + """ m = Model() x = m.add_variables(lower=0, upper=2, name="x") y = m.add_variables(lower=0, upper=10, name="y")