Skip to content

Conversation

@FBumann
Copy link
Contributor

@FBumann FBumann commented Jan 20, 2026

Summary

Fixes silent data corruption when adding expressions whose coordinates have the same values but in different order (e.g. ['costs', 'Penalty'] vs ['Penalty', 'costs']). Previously, merge() used positional concatenation via join='override', ignoring labels entirely.

What changed

When merge() detects that two datasets share the same coordinate values in a different order, it now reindexes to the first dataset's order before concatenation. This only triggers when the coordinate value sets are identical — different subsets or different sizes are unaffected.

What is NOT affected

  • Expressions with identical coordinates (same values, same order) — fast override path, unchanged
  • Expressions with different dimension sizes — outer join path, unchanged
  • Positional patterns like v.loc[:9] + v.loc[10:] — different value sets, override preserved
Bug reproduction script
import pandas as pd
import linopy

m = linopy.Model()
var = m.add_variables(coords=[pd.Index(['costs', 'Penalty'], name='effect')], name='var')
factors = pd.Series([2.0, 1.0], index=pd.Index(['Penalty', 'costs'], name='effect'))

result = var - 10 * factors

costs = float(result.const.sel(effect='costs'))
penalty = float(result.const.sel(effect='Penalty'))
print(f"costs: {costs} (expected -10), penalty: {penalty} (expected -20)")
assert costs == -10.0 and penalty == -20.0, "BUG: values misaligned!"
Benchmark script (dev-scripts/benchmark_merge_alignment.py)
"""Benchmark merge performance: compare override (same coords) vs outer join paths.

Run from repo root:
    python dev-scripts/benchmark_merge_alignment.py
"""

import statistics
import timeit

import pandas as pd

import linopy

ROUNDS = 10
CALLS_PER_ROUND = 10

m = linopy.Model()

time = pd.RangeIndex(8760, name="time")  # hourly year
space = pd.Index([f"node_{i}" for i in range(100)], name="space")
tech = pd.Index([f"tech_{i}" for i in range(20)], name="tech")

gen = m.add_variables(coords=[time, space, tech], name="gen")
cap = m.add_variables(coords=[space, tech], name="cap")

# Warm up
_ = 1 * gen


def bench(label: str, func, rounds: int = ROUNDS, number: int = CALLS_PER_ROUND):
    times = [
        timeit.timeit(func, number=number) / number * 1000 for _ in range(rounds)
    ]
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    print(f"{label}: {mean:.1f} ± {stdev:.1f} ms  (n={rounds}x{number})")


# 1) Same-shape expressions (override path) — common in energy models
bench(
    "4-term sum (same shape, 8760x100x20)",
    lambda: 3.5 * gen + 1.2 * gen + 0.8 * gen + 2.0 * gen,
)

# 2) Broadcasting dimensions (gen has time, cap doesn't)
bench(
    "2-term sum (broadcasting, gen+cap)",
    lambda: 1 * gen + 1 * cap,
)

# 3) Many small additions
vars_list = [m.add_variables(coords=[time], name=f"v{i}") for i in range(50)]
bench(
    "50-term sum (1D, 8760)",
    lambda: sum(1 * v for v in vars_list),
    number=5,
)

Checklist

  • Code changes are sufficiently documented
  • Unit tests for new features were added
  • A note for the release notes doc/release_notes.rst is included
  • I consent to the release of this PR's code under the MIT license

@FBumann FBumann changed the title First attempt Fix coordinate misalignment in expression merge with join='override' Jan 20, 2026
…ing logic

   - Mark defensive code paths with pragma: no cover (unreachable in practice)
@FBumann
Copy link
Contributor Author

FBumann commented Jan 23, 2026

Im not entirely sure if this affects performance, but its at least worth raising i thought

…ion-based approach. For each non-helper dimension, it computes the union of coordinate values across all

  datasets. If any mismatch is found, all datasets are reindexed to the union coordinates with FILL_VALUE defaults. This handles both the reordered-coords case and the
  overlapping-subset case.

  test/test_linear_expression.py: Added test_merge_with_overlapping_coords that creates variables with ["alice", "bob"] and ["bob", "charlie"], merges them, and verifies
  correct alignment — bob gets both terms, alice and charlie get only their respective term with fill values for the missing one.
…es) and then patching up mismatches with reindexing, _check_coords_match now

  checks that actual coordinate values and order are identical. When they're not, override is False and xarray's join='outer' handles alignment correctly. The entire reindex
  block is gone.
@FBumann FBumann changed the title Fix coordinate misalignment in expression merge with join='override' Fix coordinate misalignment in expression merge Jan 30, 2026
@FBumann FBumann marked this pull request as ready for review January 30, 2026 11:10
@FBumann FBumann marked this pull request as draft January 30, 2026 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant