refactor!: migrate clustering from tsam to tsam_xarray #654

Open — FBumann wants to merge 14 commits into main from refactor/tsam-xarray
Conversation

@FBumann (Member) commented Mar 31, 2026

Summary

Complete refactoring of clustering from tsam to tsam_xarray. base.py went from 1633 → 393 lines (-76%).

What changed

  • Single tsam_xarray.aggregate() call replaces per-slice tsam.aggregate() loops
  • ClusteringResults class removed — Clustering delegates directly to tsam_xarray.ClusteringResult
  • ClusteringPlotAccessor removed — users plot with tsam_xarray's DataArrays directly
  • expand_data()/timestep_mapping replaced with tsam_xarray.ClusteringResult.disaggregate() + xarray's ffill/interpolate_na/fillna
  • data_vars parameter removed — use ClusterConfig(weights={var: 0}) instead
  • clustering_group/clustering_weight removed from TimeSeriesData
  • original_data/aggregated_data/_metrics removed from Clustering
  • include_original_data parameter removed from to_dataset()/to_netcdf()
  • Legacy _variable_categories fallback removed from _Expander
  • Serialization uses ClusteringResult.to_dict()/from_dict() (tsam_xarray ≥ 0.5.1)
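The disaggregate-then-fill flow boils down to replaying each original period's representative profile via the cluster assignments. A minimal standalone sketch with plain numpy — the names `assignments` and `representatives` are illustrative, not the tsam_xarray API:

```python
import numpy as np

# Hypothetical inputs, for illustration only (not the tsam_xarray API):
# 4 original periods were clustered into 2 representatives of 3 timesteps each.
assignments = np.array([0, 1, 1, 0])           # cluster id per original period
representatives = np.array([[1.0, 2.0, 3.0],   # cluster 0 profile
                            [4.0, 5.0, 6.0]])  # cluster 1 profile

# Disaggregation: index the representatives by assignment, then flatten
# back to the full original timeline (4 periods x 3 timesteps = 12 values).
expanded = representatives[assignments].reshape(-1)
# -> [1. 2. 3. 4. 5. 6. 4. 5. 6. 1. 2. 3.]
```

Any gaps a segmented clustering leaves on the time axis would then be closed with xarray's own `ffill`, `interpolate_na`, or `fillna`, as the bullet above describes.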

What remains on Clustering

  • Properties: n_clusters, timesteps_per_cluster, n_original_clusters, n_segments, is_segmented, dim_names, cluster_assignments, cluster_occurrences, segment_assignments, segment_durations
  • Methods: disaggregate(), apply(), to_json()/from_json()
  • Access: clustering_result (tsam_xarray), aggregation_result (pre-IO only)

Dependencies

  • tsam_xarray >= 0.5.1 (was tsam >= 3.1.2)
  • tsam >= 3.1.2 (explicit, for ClusterConfig/ExtremeConfig/SegmentConfig)

Test plan

  • All clustering tests pass (213 + 20 regression)
  • Regression test: non-segmented and segmented cluster → optimize → expand matches reference values
  • Full test suite passes (1717 tests)
  • CI

🤖 Generated with Claude Code

FBumann and others added 2 commits March 31, 2026 14:26
…ate() call

tsam_xarray handles (period, scenario) slicing internally, eliminating
manual iteration, expand_dims/combine helpers, and per-key DataFrame
conversion. Simplifies _ReducedFlowSystemBuilder, ClusteringResults.apply(),
and the main cluster/apply_clustering methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ray migration

- Remove broken Clustering.sel() (ignored args, use aggregation_result instead)
- Remove unused _from_aggregation_result classmethod
- Fix no-op rename({'cluster': 'cluster'}) in build_typical_periods
- Update docstrings: tsam 3.0 → tsam_xarray, DataFrame → DataArray

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaces the per-slice tsam-based clustering API with a single tsam_xarray-backed model: Clustering now wraps tsam_xarray.ClusteringInfo (optionally holding a tsam_xarray.AggregationResult), removes ClusteringResults/AggregationResults exports and .sel()-style access, and updates TransformAccessor, serialization, and tests to the new model.

Changes

  • Clustering exports & module — flixopt/clustering/__init__.py: Removed AggregationResults and ClusteringResults from the public exports; the module now exposes only Clustering, with the docstring/example updated to use clustering.clustering_info / clustering.aggregation_result.
  • Clustering core implementation — flixopt/clustering/base.py: Replaced the multi-slice ClusteringResults with a single Clustering backed by tsam_xarray.ClusteringInfo (plus an optional AggregationResult); removed .sel()/per-slice APIs; updated the constructor, apply(), JSON I/O, and cluster/segment properties to delegate to tsam_xarray outputs (dim renames, coord alignment, _unrename_map); added an aggregation_result property and removed legacy aliases.
  • Transform accessor & reduced-flow builder — flixopt/transform_accessor.py: Refactored to aggregate once into a stacked DataArray and consume a single tsam_xarray.AggregationResult; _ReducedFlowSystemBuilder now reads cluster weights, representatives, segment durations, and metrics from tsam_xarray outputs (explicit dim renames/unrename map and reshaping); removed per-(period, scenario) loops and pandas metric conversion; cluster()/apply_clustering() now call tsam_xarray.aggregate() / ClusteringInfo.apply() and use a weights mapping rather than data_vars.
  • Dependency metadata — pyproject.toml: Added tsam_xarray >= 0.5.1, < 1 to the full extra and replaced tsam==3.1.2 in dev with tsam_xarray>=0.5.1.
  • Core TimeSeriesData API — flixopt/core.py: Removed the clustering_group and clustering_weight parameters/properties from TimeSeriesData (constructor, from_dataarray, fit_to_coords, and repr no longer include clustering metadata).
  • IO restoration — flixopt/io.py: Restores flow_system.cluster_weight from clustering.cluster_occurrences (renamed to cluster_weight) instead of using representative_weights.
  • Tests & fixtures — tests/test_clustering/*, tests/test_clustering/test_clustering_io.py, tests/conftest.py, tests/deprecated/**, tests/utilities/*: Updated tests to construct Clustering from tsam_xarray.ClusteringInfo, removed ClusteringResults-centric tests and data_vars-parameter tests, adapted NetCDF roundtrip expectations to clustering_info, and removed clustering metadata from TimeSeriesData usages in fixtures/examples.
  • Removed/deleted tests — tests/test_clustering/test_cluster_reduce_expand.py: Deleted the TestDataVarsParameter test class and its five tests that exercised the data_vars parameter.

Sequence Diagram(s)

sequenceDiagram
  participant FlowSystem as FlowSystem
  participant Transform as TransformAccessor
  participant TSamX as tsam_xarray
  participant Clustering as flixopt.Clustering

  FlowSystem->>Transform: request .cluster(...) or apply clustering
  Transform->>TSamX: stack variables (rename reserved dims) -> tsam_xarray.aggregate()
  TSamX-->>Transform: AggregationResult
  Transform->>Clustering: construct with ClusteringInfo + AggregationResult + _unrename_map
  Clustering->>TSamX: clustering_info.apply(stacked DataArray) (with unrename_map)
  TSamX-->>Clustering: AggregationResult (assignments, durations, metrics)
  Clustering-->>Transform: expose cluster_assignments, segment_durations, representatives, metrics
  Transform->>FlowSystem: build reduced FlowSystem using clustering properties

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped through dims and stitched them into one,

Stacked variables, renamed — no more slice-by-slice run,
ClusteringInfo hummed, AggregationResult sang,
Assignments and reps in a tidy new gang,
A carrot-coded change — neat, compact, and fun.

🚥 Pre-merge checks — ✅ 3 passed

  • Docstring Coverage — ✅ Passed: Docstring coverage is 90.22%, above the required threshold of 80.00%.
  • Title check — ✅ Passed: The PR title clearly and concisely summarizes the main refactoring effort: migrating clustering from tsam to tsam_xarray, the dominant theme across all changed files.
  • Description check — ✅ Passed: The PR description comprehensively documents changes, type, testing status, and dependencies, following the repository template with clear sections and checklist completion.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)

1590-1608: ⚠️ Potential issue | 🟠 Major

Add temporal_resolution to reserved_tsam_keys.

The guard blocks timestep_duration, but the code sets temporal_resolution at line 1662 and then merges **tsam_kwargs afterwards at line 1671. A caller can silently override the computed timestep size by passing temporal_resolution in tsam_kwargs, breaking the validation intended to prevent parameter conflicts.

🛡️ Minimal fix
         reserved_tsam_keys = {
             'n_clusters',
             'period_duration',  # exposed as cluster_duration
+            'temporal_resolution',  # computed automatically
             'timestep_duration',  # computed automatically
             'cluster',
             'segments',
             'extremes',
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1590 - 1608, The
reserved_tsam_keys set currently blocks keys like 'timestep_duration' but not
'temporal_resolution', allowing callers to override the computed
temporal_resolution when **tsam_kwargs is merged; update the reserved_tsam_keys
set (the set literal defined near reserved_tsam_keys) to include the string
'temporal_resolution' so that the conflicts = reserved_tsam_keys &
set(tsam_kwargs.keys()) check will catch and raise the ValueError, preventing
silent overrides of the computed temporal_resolution before the code that sets
temporal_resolution and later merges **tsam_kwargs.
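The guard pattern this comment describes is ordinary set arithmetic over keyword arguments; a self-contained sketch (function and key names illustrative, not the flixopt signature) shows why a computed key must be in the reserved set:

```python
def aggregate_with_guard(**tsam_kwargs):
    # Keys this wrapper computes itself; callers must not override them.
    reserved_tsam_keys = {
        'n_clusters',
        'period_duration',
        'temporal_resolution',  # computed automatically — the proposed fix
        'timestep_duration',    # computed automatically
    }
    conflicts = reserved_tsam_keys & set(tsam_kwargs)
    if conflicts:
        raise ValueError(f'Reserved tsam kwargs passed: {sorted(conflicts)}')
    # Without 'temporal_resolution' in the reserved set, this merge would let
    # a caller silegally... silently override the computed value:
    return {'temporal_resolution': 0.25, **tsam_kwargs}

merged = aggregate_with_guard(extremes=None)       # fine
# aggregate_with_guard(temporal_resolution=1.0)    # raises ValueError
```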
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flixopt/clustering/base.py`:
- Around line 1129-1151: The current clustering_info property returns
self._aggregation_result.clustering when _aggregation_result is present,
exposing internal renamed dimension names; change clustering_info to always
reconstruct a ClusteringInfo from self.results (use from tsam_xarray import
ClusteringInfo as ClusteringInfoClass) so dimensions are mapped back to original
names via rename_map = {v: k for k, v in self._unrename_map.items()}, compute
slice_dims = [rename_map.get(d, d) for d in results.dim_names], and return
ClusteringInfoClass(time_dim='time', cluster_dim=['variable'],
slice_dims=slice_dims, clusterings=dict(results._results)) instead of returning
_aggregation_result.clustering.

In `@pyproject.toml`:
- Line 66: The project imports symbols directly from the tsam package (e.g.,
ClusterConfig in flixopt/transform_accessor.py, ExtremeConfig in
flixopt/clustering/__init__.py, and ClusteringResult in
flixopt/clustering/base.py) but only declares tsam_xarray as a dependency; add
an explicit dependency entry for tsam in pyproject.toml (matching the same
compatible version range as tsam_xarray, e.g. "tsam >= 0.1.1, < 1") so that tsam
is a declared runtime dependency and not relied on transitively.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6b5ea3f2-8cd3-4bf8-a970-b7e96ccc6813

📥 Commits

Reviewing files that changed from the base of the PR and between 0580751 and 0324b59.

📒 Files selected for processing (4)
  • flixopt/clustering/__init__.py
  • flixopt/clustering/base.py
  • flixopt/transform_accessor.py
  • pyproject.toml

…ngInfo

BREAKING: ClusteringResults class removed. Clustering now delegates all
structure properties (cluster_assignments, cluster_occurrences, segment_*,
dims, coords, etc.) directly to tsam_xarray.ClusteringInfo.

- Remove ClusteringResults class (~450 lines) and all _build_property_array machinery
- Remove AggregationResults backwards-compat alias
- Remove get_result(), cluster_start_positions, timesteps_per_period
- Serialization uses ClusteringInfo.to_dict()/from_dict() (tsam_xarray>=0.5.1)
- Auto-detect _unrename_map from slice_dims for deserialization
- Update tests: remove TestClusteringResults, simplify test fixtures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
flixopt/transform_accessor.py (2)

1776-1777: Clarify error message reference to avoid confusion.

The error message says "Ensure self._fs.timesteps matches..." which exposes internal implementation details. Consider rewording to be more user-friendly.

📝 Suggested improvement
-                f'Ensure self._fs.timesteps matches the original data used for clustering.'
+                f'Ensure this FlowSystem has the same timesteps as the original data used for clustering.'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1776 - 1777, The error message
exposes internal details by referencing self._fs.timesteps; update the string in
transform_accessor.py where that message is raised to a user-friendly message
that avoids internal attribute names (e.g., "Ensure the time-step index/length
used for clustering matches the input data" or similar), optionally include
expected vs actual counts for clarity; locate the message near the raise/check
that references self._fs.timesteps and replace the literal with the new wording
while keeping the validation logic intact.

1703-1710: Weight clearing workaround for tsam 3.2 KeyError.

The use of object.__setattr__(cr, 'weights', {}) is a hack to bypass what appears to be a frozen/immutable ClusteringResult object. This avoids a KeyError when applying clustering to data with different columns than the original.

Consider adding a comment explaining this is a workaround for a specific tsam version issue, and whether this can be removed when upgrading tsam.

📝 Suggested comment addition
             # Stack full dataset and apply existing clustering
             da_full = ds.to_dataarray(dim='variable')
             # Clear weight dict to avoid tsam 3.2 KeyError when applying to
             # data with different columns than original clustering
+            # TODO: Check if this workaround can be removed in future tsam versions
             info = agg_result.clustering
             for cr in info.clusterings.values():
                 object.__setattr__(cr, 'weights', {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1703 - 1710, Add a concise
explanatory comment above the loop that clears weights describing that the
object.__setattr__(cr, 'weights', {}) is an intentional workaround for a known
tsam 3.2 KeyError when applying a ClusteringResult to data with different
columns; mention that this bypasses immutability on the ClusteringResult (cr)
and note the specific tsam version and that the code can be removed once tsam is
upgraded/fixed. Also, ensure the comment references agg_result, info, and cr so
future maintainers can easily find and revisit the workaround.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9795502e-8122-416e-85c4-b44b64e4b85a

📥 Commits

Reviewing files that changed from the base of the PR and between 0324b59 and 5e3abb9.

📒 Files selected for processing (6)
  • flixopt/clustering/__init__.py
  • flixopt/clustering/base.py
  • flixopt/transform_accessor.py
  • pyproject.toml
  • tests/test_clustering/test_base.py
  • tests/test_clustering/test_clustering_io.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • flixopt/clustering/__init__.py

FBumann and others added 2 commits March 31, 2026 16:08
- Add TODO comments for object.__setattr__ weight-clearing workaround
- Add tsam as explicit dependency (directly imported for ClusterConfig etc.)
- Fix dev version pin: tsam_xarray>=0.5.1 (was >=0.1.1)
- Add assertions for variable dim detection in build_typical_periods/metrics
- Inline _build_property_array into timestep_mapping (single-use helper)
- Document when legacy 'results' kwarg can be removed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When data_vars is specified, instead of clustering a subset and re-applying
to full data, now all time-varying data goes to tsam_xarray with weight=0
for excluded variables. This removes the object.__setattr__ hack in cluster().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
flixopt/transform_accessor.py (3)

1783-1785: Minor: Internal reference in user-facing error message.

The error message mentions self._fs.timesteps which exposes implementation details. Consider rephrasing:

Suggested improvement
-                f'Ensure self._fs.timesteps matches the original data used for clustering.'
+                f'Ensure the FlowSystem timesteps match the original data used for clustering.'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1783 - 1785, The error message
currently exposes an internal attribute name (self._fs.timesteps) — update the
message constructed near where clustering.timesteps_per_cluster is referenced to
avoid implementation details; replace the explicit attribute mention with a
user-facing phrase like "the dataset's timesteps" or "the filesystem's recorded
timesteps" (or include expected vs actual counts) so the check still informs the
user to ensure the timesteps match the data used for clustering without
referencing self._fs.timesteps directly.

160-163: Same assertion pattern in build_metrics.

Apply the same refactor here for consistency.

Suggested improvement
                 unknown_dims = [d for d in metric_da.dims if d not in known_metric_dims]
-                assert len(unknown_dims) == 1, f'Expected 1 variable dim in {metric_name}, got {unknown_dims}'
+                if len(unknown_dims) != 1:
+                    raise ValueError(f'Expected 1 variable dim in {metric_name}, got {unknown_dims}')
                 variable_dim = unknown_dims[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 160 - 163, Replace the bare
assert in this block with the same explicit validation used in build_metrics:
after computing known_metric_dims and unknown_dims from metric_da.dims, check if
len(unknown_dims) != 1 and raise a clear exception (e.g. ValueError) that
includes metric_name and the unexpected unknown_dims; then safely set
variable_dim = unknown_dims[0]. This mirrors the build_metrics refactor and
removes the assertion while keeping the same variable names (known_metric_dims,
unknown_dims, metric_da, metric_name, variable_dim).

110-114: Consider softer handling for variable dimension detection.

The assertion assert len(unknown_dims) == 1 will crash with an AssertionError in production if tsam_xarray's output structure changes. Consider raising a ValueError with the descriptive message instead, or adding a fallback.

Suggested improvement
         unknown_dims = [d for d in representatives.dims if d not in known_dims]
-        assert len(unknown_dims) == 1, (
-            f'Expected exactly 1 variable dim, got {unknown_dims} (known: {known_dims}, all: {representatives.dims})'
-        )
+        if len(unknown_dims) != 1:
+            raise ValueError(
+                f'Expected exactly 1 variable dim, got {unknown_dims} (known: {known_dims}, all: {representatives.dims})'
+            )
         variable_dim = unknown_dims[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 110 - 114, Replace the hard
assertion that checks variable-dim detection with a safer runtime error or
fallback: instead of using assert len(unknown_dims) == 1 in the block that
builds known_dims and unknown_dims (references: known_dims, unknown_dims,
representatives.dims, self._unrename_map), raise a ValueError with the same
descriptive message when the condition fails (or implement a fallback branch
that handles 0 or >1 unknown dims by choosing a default dimension or logging and
returning an empty/validated result). Ensure the raised error contains the
original diagnostic text f'Expected exactly 1 variable dim, got {unknown_dims}
(known: {known_dims}, all: {representatives.dims})' so callers can debug easily.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flixopt/transform_accessor.py`:
- Around line 1711-1718: The TODO uses a placeholder issue link and the
workaround mutates frozen dataclass fields via object.__setattr__, so replace
"https://github.com/FZJ-IEK3-VSA/tsam/issues/XXX" with the actual tsam_xarray
issue number (or PR) that tracks mismatched weights in ClusteringInfo.apply(),
and update the comment to reference that real issue; keep the existing
workaround (setting agg_result.clustering -> for cr in
info.clusterings.values(): object.__setattr__(cr, 'weights', {})) but add a
short note in the comment that this is a temporary hack and must be removed once
the referenced issue is resolved, referring to ClusteringInfo.apply,
agg_result.clustering, and the cr.weights mutation so future maintainers can
find and revert it.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9cd80fc4-e0c9-4529-ac63-7ed7085daa96

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3abb9 and d0cb2cf.

📒 Files selected for processing (3)
  • flixopt/clustering/base.py
  • flixopt/transform_accessor.py
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml

…t from API

BREAKING CHANGES:
- Remove `data_vars` parameter from `cluster()` — use `ClusterConfig(weights={var: 0})`
  to exclude variables from influencing clustering instead
- Remove `clustering_group` and `clustering_weight` from `TimeSeriesData` — use
  `ClusterConfig(weights={...})` directly
- Remove `_calculate_clustering_weights` and `_build_cluster_config_with_weights`
- Stop dropping constant arrays before clustering — pass all time-dim data to tsam_xarray
- Weight validation now handled by tsam_xarray directly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)

1487-1498: ⚠️ Potential issue | 🟠 Major

Reserve temporal_resolution, weights, time_dim, and cluster_dim in the tsam_kwargs guard.

Lines 1554-1559 build tsam_kwargs_full with explicit temporal_resolution=dt, then merge **tsam_kwargs. Because this merged dict is unpacked in the tsam_xarray.aggregate() call (lines 1599-1604), a caller passing any of these four keys can override the method's own constraints: temporal_resolution silently alters weighting/duration logic, while weights, time_dim, and cluster_dim cause duplicate-key TypeErrors.

Proposed fix
         reserved_tsam_keys = {
             'n_clusters',
             'period_duration',  # exposed as cluster_duration
+            'temporal_resolution',
             'timestep_duration',  # computed automatically
             'cluster',
             'segments',
             'extremes',
             'preserve_column_means',
             'rescale_exclude_columns',
             'round_decimals',
             'numerical_tolerance',
+            'weights',
+            'time_dim',
+            'cluster_dim',
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1487 - 1498, The
reserved_tsam_keys set used to guard tsam_kwargs must include the four keys that
the method itself sets or passes to tsam_xarray.aggregate to prevent callers
from overriding them; update the reserved_tsam_keys (currently defined near the
top of transform_accessor.py) to also contain 'temporal_resolution', 'weights',
'time_dim', and 'cluster_dim' so that the merge building tsam_kwargs_full (where
temporal_resolution=dt is set and then **tsam_kwargs is merged) cannot be
overridden and will not produce duplicate-key errors when calling
tsam_xarray.aggregate.
♻️ Duplicate comments (1)
flixopt/transform_accessor.py (1)

1676-1693: ⚠️ Potential issue | 🟠 Major

apply_clustering() still drops invariant scenario slices.

Unlike the cluster() path at Lines 1534-1547, Lines 1677-1681 only re-add dims that were renamed into rename_map. On multi-scenario systems where no variable explicitly carries scenario, ds.to_dataarray(dim='variable') can lose that dimension while slice_dims still expects it, so info.apply() no longer applies the stored per-scenario clustering reliably.

💡 Proposed fix
-            # Ensure extra dims are present in DataArray
-            for _orig_name, renamed in rename_map.items():
-                if renamed not in da_full.dims and renamed in ds.dims:
-                    if renamed in da_full.coords:
-                        da_full = da_full.drop_vars(renamed)
-                    da_full = da_full.expand_dims({renamed: ds.coords[renamed].values})
+            # Ensure period/scenario slice dims are present even if no variable varies across them
+            extra_dims = []
+            if self._fs.periods is not None:
+                extra_dims.append(rename_map.get('period', 'period'))
+            if self._fs.scenarios is not None:
+                extra_dims.append(rename_map.get('scenario', 'scenario'))
+            for dim_name in extra_dims:
+                if dim_name not in da_full.dims and dim_name in ds.dims:
+                    if dim_name in da_full.coords:
+                        da_full = da_full.drop_vars(dim_name)
+                    da_full = da_full.expand_dims({dim_name: ds.coords[dim_name].values})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/transform_accessor.py` around lines 1676 - 1693, apply_clustering()
currently only re-adds dims found in rename_map, so invariant dims like
"scenario" can be lost before building ClusteringInfo and calling info.apply();
modify the loop that ensures extra dims on da_full (the block that iterates
rename_map) to instead iterate clustering.dim_names and for each dim compute the
renamed name via rename_map.get(dim, dim), then if that renamed dim is missing
from da_full but present in ds.dims add it back (drop coord first if present)
using da_full.expand_dims({renamed: ds.coords[dim].values}); this mirrors the
behavior in the cluster() path and guarantees that dims expected by
ClusteringInfoClass and info.apply() (e.g., scenario) are preserved even when
not explicitly renamed.
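The failure mode is easy to reproduce with plain xarray: a dataset can carry a dimension purely through a coordinate, and to_dataarray() then returns an array without it. A minimal sketch of the dim-restoring pattern from the proposed fix, using illustrative names (scenario, demand) rather than flixopt's actual data:

```python
import numpy as np
import xarray as xr

# A dataset whose 'scenario' dim exists only via a coordinate; no data
# variable varies across it.
ds = xr.Dataset(
    {'demand': ('time', np.arange(4.0))},
    coords={'time': np.arange(4), 'scenario': ('scenario', ['base', 'high'])},
)

da = ds.to_dataarray(dim='variable')
assert 'scenario' not in da.dims  # the invariant dim was dropped

# Re-add it by broadcasting, mirroring the proposed fix:
if 'scenario' in da.coords:
    da = da.drop_vars('scenario')
da = da.expand_dims({'scenario': ds.coords['scenario'].values})
assert 'scenario' in da.dims
```

Driving the loop from clustering.dim_names (as the prompt suggests) rather than rename_map means every dim the stored clustering expects gets this treatment, renamed or not.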
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flixopt/core.py`:
- Around line 67-72: The docstring for from_dataarray is outdated — it claims
metadata is extracted from attrs but the current implementation just wraps the
incoming xr.DataArray; update the from_dataarray(cls, da: xr.DataArray)
docstring to accurately describe its behavior (e.g., "Return a TimeSeriesData
wrapping the given DataArray without extracting or modifying attrs") and remove
any mention of extracting metadata from attrs so the API documentation matches
the implementation in the from_dataarray method.

---

Outside diff comments:
In `@flixopt/transform_accessor.py`:
- Around line 1487-1498: The reserved_tsam_keys set used to guard tsam_kwargs
must include the four keys that the method itself sets or passes to
tsam_xarray.aggregate to prevent callers from overriding them; update the
reserved_tsam_keys (currently defined near the top of transform_accessor.py) to
also contain 'temporal_resolution', 'weights', 'time_dim', and 'cluster_dim' so
that the merge building tsam_kwargs_full (where temporal_resolution=dt is set
and then **tsam_kwargs is merged) cannot be overridden and will not produce
duplicate-key errors when calling tsam_xarray.aggregate.
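The guard this comment asks for can be sketched in isolation; RESERVED_TSAM_KEYS and build_tsam_kwargs are hypothetical stand-ins for the merge logic in transform_accessor.py, not its actual identifiers:

```python
# Keys the accessor computes itself and must not let callers shadow.
RESERVED_TSAM_KEYS = {'temporal_resolution', 'weights', 'time_dim', 'cluster_dim'}

def build_tsam_kwargs(dt: float, weights: dict, tsam_kwargs: dict) -> dict:
    """Merge user kwargs, refusing to let them override computed values."""
    clashes = RESERVED_TSAM_KEYS & tsam_kwargs.keys()
    if clashes:
        raise ValueError(f'tsam_kwargs may not override reserved keys: {sorted(clashes)}')
    # Computed values first; user kwargs cannot collide after the check above.
    return {'temporal_resolution': dt, 'weights': weights, **tsam_kwargs}
```

Raising early gives the caller a clear error instead of a silently ignored override or a duplicate-keyword TypeError deep inside tsam_xarray.aggregate.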

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 621a5a54-6457-40b7-baae-69f81924b804

📥 Commits

Reviewing files that changed from the base of the PR and between d0cb2cf and e803041.

📒 Files selected for processing (8)
  • flixopt/core.py
  • flixopt/transform_accessor.py
  • tests/conftest.py
  • tests/deprecated/conftest.py
  • tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py
  • tests/test_clustering/test_cluster_reduce_expand.py
  • tests/test_clustering/test_integration.py
  • tests/utilities/test_dataconverter.py
💤 Files with no reviewable changes (1)
  • tests/test_clustering/test_cluster_reduce_expand.py

Comment on lines 67 to +72
def from_dataarray(
cls,
da: xr.DataArray,
clustering_group: str | None = None,
clustering_weight: float | None = None,
):
"""Create TimeSeriesData from DataArray, extracting metadata from attrs."""
final_clustering_group = clustering_group if clustering_group is not None else da.attrs.get('clustering_group')
final_clustering_weight = (
clustering_weight if clustering_weight is not None else da.attrs.get('clustering_weight')
)

return cls(da, clustering_group=final_clustering_group, clustering_weight=final_clustering_weight)
return cls(da)

⚠️ Potential issue | 🟡 Minor

Update the from_dataarray() docstring.

Line 71 still says this path extracts metadata from attrs, but the new implementation just wraps the incoming DataArray. Leaving that wording in place makes the post-refactor API behavior misleading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/core.py` around lines 67 - 72, The docstring for from_dataarray is
outdated — it claims metadata is extracted from attrs but the current
implementation just wraps the incoming xr.DataArray; update the
from_dataarray(cls, da: xr.DataArray) docstring to accurately describe its
behavior (e.g., "Return a TimeSeriesData wrapping the given DataArray without
extracting or modifying attrs") and remove any mention of extracting metadata
from attrs so the API documentation matches the implementation in the
from_dataarray method.
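A sketch of what the corrected docstring could look like, using a minimal stand-in class rather than flixopt's real TimeSeriesData:

```python
import xarray as xr

class TimeSeriesData:  # minimal stand-in for flixopt.core.TimeSeriesData
    def __init__(self, da: xr.DataArray):
        self.da = da

    @classmethod
    def from_dataarray(cls, da: xr.DataArray) -> 'TimeSeriesData':
        """Return a TimeSeriesData wrapping the given DataArray.

        The DataArray is stored as-is; attrs are neither read nor modified.
        """
        return cls(da)
```

The docstring now matches the post-refactor behavior: no clustering_group/clustering_weight extraction, just a thin wrapper.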

FBumann and others added 2 commits April 1, 2026 07:46
…ering_data, etc.

BREAKING CHANGES:
- Remove from Clustering: dims, coords, cluster_centers, segment_centers,
  n_representatives, representative_weights, metrics, position_within_segment
- Remove clustering_data() from TransformAccessor
- Move position_within_segment computation into _Expander (only consumer)
- Use clustering_info directly for advanced access (dims, coords, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rray

Aligns with tsam_xarray's ClusteringResult class name. Renames property,
internal attribute, parameters, and local variables throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 3

♻️ Duplicate comments (1)
flixopt/clustering/base.py (1)

228-230: ⚠️ Potential issue | 🟠 Major

Public clustering_info / apply() still expose internal _period naming.

TransformAccessor.cluster() stores reserved dims as _period / _cluster inside _clustering_info. Returning that object verbatim here and calling it directly on Line 481 means multi-period or multi-scenario callers still have to rename their input arrays manually, even though the rest of Clustering presents the public period / cluster names.

Also applies to: 469-481

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@flixopt/clustering/base.py` around lines 228 - 230, clustering_info currently
returns the internal _clustering_info object which exposes reserved dim names as
_period/_cluster; change clustering_info to return a copy with those internal
names mapped to the public names (period/cluster) so callers and
apply()/apply_clustering() receive public-facing dim names. Locate the
clustering_info method and _clustering_info attribute, create/return a shallow
copy or new ClusteringInfo with reserved dims renamed from "_period" -> "period"
and "_cluster" -> "cluster" (preserving other fields), and ensure any code using
apply()/apply_clustering() consumes this renamed copy so
multi-period/multi-scenario callers no longer need manual renaming.
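The rename step the prompt describes boils down to xarray's DataArray.rename; a minimal sketch with an illustrative rename_map and array shape:

```python
import numpy as np
import xarray as xr

# Illustrative mapping from internal reserved names to public ones.
rename_map = {'_period': 'period', '_cluster': 'cluster'}

da = xr.DataArray(np.zeros((2, 3)), dims=('_period', '_cluster'))

# Only rename dims actually present on this array, so arrays without the
# reserved dims pass through unchanged.
public = da.rename({k: v for k, v in rename_map.items() if k in da.dims})
```

Applying the same mapping when copying the ClusteringInfo object would let callers see only period/cluster, matching the rest of Clustering's public surface.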
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@flixopt/clustering/base.py`:
- Around line 135-146: The code treats any dict passed to clustering_info or
results as a new ClusteringInfo payload, skipping the legacy adapter; update the
dict-handling branches so that when a dict resembles the old schema (the form
produced by from_json()/legacy "results" payloads or ClusteringResults), you
call the legacy adapter (e.g. _clustering_info_from_legacy_dict) instead of
_clustering_info_from_dict. Specifically, in the blocks that inspect
clustering_info and results (the isinstance(..., dict) paths and the from_json
flow), detect legacy keys/shape and route to _clustering_info_from_legacy_dict
(fall back to _clustering_info_from_dict only for current-schema dicts) so old
JSON/NetCDF payloads are handled correctly while preserving the new-path
behavior.

In `@flixopt/transform_accessor.py`:
- Around line 1645-1658: The current code builds a shallow copy via
dict(info.clusterings) which still references the same clustering result
objects, so the subsequent loop using object.__setattr__(cr, 'weights', {})
mutates the caller's clustering; fix by creating an independent deep copy of the
clustering result objects before mutating them (e.g., replace
dict(info.clusterings) with a deep copy of
clustering.clustering_info.clusterings or construct a new dict that clones each
clustering result object) so modifications to weights affect only the local
ClusteringInfoClass instance used before calling info.apply(da_full).
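Why dict() alone is not enough can be shown in isolation; ClusteringResultStub is a hypothetical stand-in for tsam_xarray's (frozen) ClusteringResult:

```python
import copy
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ClusteringResultStub:
    """Hypothetical stand-in for tsam_xarray's frozen ClusteringResult."""
    weights: dict = field(default_factory=lambda: {'demand': 1.0})

originals = {'slice_a': ClusteringResultStub()}

# dict() is a shallow copy: the new dict still points at the same objects,
# so mutating them leaks back into the caller's clustering.
shallow = dict(originals)
object.__setattr__(shallow['slice_a'], 'weights', {})
assert originals['slice_a'].weights == {}  # caller's object was mutated

# deepcopy yields independent objects, as the fix requires.
originals = {'slice_a': ClusteringResultStub()}
deep = copy.deepcopy(originals)
object.__setattr__(deep['slice_a'], 'weights', {})
assert originals['slice_a'].weights == {'demand': 1.0}  # caller intact
```

object.__setattr__ is used here, as in the reviewed code, because the dataclass is frozen; deepcopy bypasses the frozen guard via the default pickle-based reconstruction.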

In `@tests/test_clustering/test_multiperiod_extremes.py`:
- Around line 525-527: The current assertion uses n_clusters * n_segments ==
n_clusters * 4 which is tautological after n_segments == 4; replace it with a
concrete shape/length check on the clustered time axis to ensure the result size
is correct. For example, assert the length/shape of the clustered time array
equals n_clusters * 4 (e.g. len(fs_clustered.time) == n_clusters * 4 or
fs_clustered.time.shape[0] == n_clusters * 4) or use the clustering object's
time array if available (e.g. fs_clustered.clustering.time.shape[0] ==
n_clusters * 4) instead of the current tautological comparison.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 215af09e-63c9-45f5-a4a1-bb6a7e8b9bc1

📥 Commits

Reviewing files that changed from the base of the PR and between e803041 and 1ce6d87.

📒 Files selected for processing (8)
  • flixopt/clustering/__init__.py
  • flixopt/clustering/base.py
  • flixopt/io.py
  • flixopt/transform_accessor.py
  • tests/test_clustering/test_base.py
  • tests/test_clustering/test_cluster_reduce_expand.py
  • tests/test_clustering/test_integration.py
  • tests/test_clustering/test_multiperiod_extremes.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • flixopt/clustering/__init__.py
  • tests/test_clustering/test_cluster_reduce_expand.py
  • tests/test_clustering/test_base.py

Comment on lines +525 to +527
# n_clusters * n_segments
n_clusters = fs_clustered.clustering.n_clusters
assert fs_clustered.clustering.n_representatives == n_clusters * 4
assert n_clusters * fs_clustered.clustering.n_segments == n_clusters * 4

⚠️ Potential issue | 🟡 Minor

This assertion is tautological now.

Once Line 523 has already established n_segments == 4, this check can never fail. Assert a concrete reduced shape instead, e.g. the clustered time axis length.

Suggested fix
-        # n_clusters * n_segments
-        n_clusters = fs_clustered.clustering.n_clusters
-        assert n_clusters * fs_clustered.clustering.n_segments == n_clusters * 4
+        assert len(fs_clustered.timesteps) == 4
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_clustering/test_multiperiod_extremes.py` around lines 525 - 527,
The current assertion uses n_clusters * n_segments == n_clusters * 4 which is
tautological after n_segments == 4; replace it with a concrete shape/length
check on the clustered time axis to ensure the result size is correct. For
example, assert the length/shape of the clustered time array equals n_clusters *
4 (e.g. len(fs_clustered.time) == n_clusters * 4 or fs_clustered.time.shape[0]
== n_clusters * 4) or use the clustering object's time array if available (e.g.
fs_clustered.clustering.time.shape[0] == n_clusters * 4) instead of the current
tautological comparison.

FBumann and others added 2 commits April 1, 2026 08:14
…ggregate()

Replace the entire custom expansion machinery with tsam_xarray's
ClusteringResult.disaggregate(), combined with xarray's built-in
ffill/interpolate_na/fillna for post-processing segmented data.

Removed:
- expand_data(), build_expansion_divisor(), timestep_mapping (Clustering)
- _build_timestep_mapping() helper function
- _interpolate_charge_state_segmented(), _expand_first_timestep_only() (_Expander)
- _compute_position_within_segment() (_Expander)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
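The post-processing this commit describes maps onto stock xarray/pandas operations. A small sketch with illustrative segment data (note that xarray's own .ffill requires bottleneck or numbagg, so the piecewise-constant fill below routes through pandas instead):

```python
import numpy as np
import xarray as xr

# Illustrative disaggregated result: each segment's value sits at its
# first timestep, the rest of the segment is NaN.
expanded = xr.DataArray(
    [4.0, np.nan, 6.0, np.nan],
    dims='time',
    coords={'time': np.arange(4)},
)

# Piecewise-linear profiles (e.g. a storage charge state): fill interior
# gaps by linear interpolation (NumPy-based, no optional dependencies).
charge = expanded.interpolate_na('time')

# Piecewise-constant profiles (e.g. flow rates): forward-fill each segment.
# xarray's .ffill('time') does the same but needs bottleneck/numbagg;
# pandas' ffill avoids the optional dependency here.
rates = xr.DataArray(
    expanded.to_series().ffill().to_numpy(),
    dims='time',
    coords=expanded.coords,
)
```

With these built-ins, the bespoke _interpolate_charge_state_segmented / _expand_first_timestep_only helpers removed by this commit are no longer needed.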
Remove backwards-compat pattern matching (_is_state_variable,
_is_first_timestep_variable, _build_segment_total_varnames) that was
only needed for FlowSystems saved before _variable_categories was added
in v6.0. Now requires _variable_categories to always be present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lude_original_data (#655)

BREAKING CHANGES:
- Remove ClusteringPlotAccessor (compare, heatmap, clusters methods)
- Remove original_data and aggregated_data from Clustering
- Remove _metrics from Clustering (use aggregation_result.accuracy instead)
- Remove include_original_data parameter from to_dataset/to_netcdf
- Simplify _create_reference_structure to just serialize ClusteringResult dict
- Remove build_metrics from _ReducedFlowSystemBuilder

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@FBumann FBumann changed the title refactor: migrate clustering from tsam to tsam_xarray refactor!: migrate clustering from tsam to tsam_xarray Apr 1, 2026
FBumann and others added 3 commits April 1, 2026 14:09
…pipeline

Verifies numerical equivalence of expanded solution values for both
non-segmented and segmented clustering, including flow rates, costs,
objectives, shapes, and NaN-free expansion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BREAKING: Remove 3 clustering notebooks, update 1:
- Remove 08d-clustering-multiperiod.ipynb (tsam_xarray handles this)
- Remove 08e-clustering-internals.ipynb (internals no longer exist)
- Remove 08f-clustering-segmentation.ipynb (just a parameter, covered in 08c)
- Update 08c-clustering.ipynb: remove clustering_data, data_vars, plot.compare,
  metrics sections; add tsam_xarray links; update API reference

5 clustering notebooks → 2 (08c + 08c2-storage-modes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents users from silently overriding the computed timestep size
via tsam_kwargs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>