refactor!: migrate clustering from tsam to tsam_xarray #654
Conversation
…ate() call. tsam_xarray handles (period, scenario) slicing internally, eliminating manual iteration, expand_dims/combine helpers, and per-key DataFrame conversion. Simplifies _ReducedFlowSystemBuilder, ClusteringResults.apply(), and the main cluster/apply_clustering methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ray migration
- Remove broken Clustering.sel() (ignored args, use aggregation_result instead)
- Remove unused _from_aggregation_result classmethod
- Fix no-op rename({'cluster': 'cluster'}) in build_typical_periods
- Update docstrings: tsam 3.0 → tsam_xarray, DataFrame → DataArray
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Replaces the per-slice tsam-based clustering API with a single tsam_xarray-backed model.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant FlowSystem as FlowSystem
    participant Transform as TransformAccessor
    participant TSamX as tsam_xarray
    participant Clustering as flixopt.Clustering
    FlowSystem->>Transform: request .cluster(...) or apply clustering
    Transform->>TSamX: stack variables (rename reserved dims) -> tsam_xarray.aggregate()
    TSamX-->>Transform: AggregationResult
    Transform->>Clustering: construct with ClusteringInfo + AggregationResult + _unrename_map
    Clustering->>TSamX: clustering_info.apply(stacked DataArray) (with unrename_map)
    TSamX-->>Clustering: AggregationResult (assignments, durations, metrics)
    Clustering-->>Transform: expose cluster_assignments, segment_durations, representatives, metrics
    Transform->>FlowSystem: build reduced FlowSystem using clustering properties
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)
1590-1608: ⚠️ Potential issue | 🟠 Major — Add `temporal_resolution` to `reserved_tsam_keys`.

The guard blocks `timestep_duration`, but the code sets `temporal_resolution` at line 1662 and then merges `**tsam_kwargs` afterwards at line 1671. A caller can silently override the computed timestep size by passing `temporal_resolution` in `tsam_kwargs`, breaking the validation intended to prevent parameter conflicts.

🛡️ Minimal fix

```diff
 reserved_tsam_keys = {
     'n_clusters',
     'period_duration',  # exposed as cluster_duration
+    'temporal_resolution',  # computed automatically
     'timestep_duration',  # computed automatically
     'cluster',
     'segments',
     'extremes',
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 1590 - 1608, The reserved_tsam_keys set currently blocks keys like 'timestep_duration' but not 'temporal_resolution', allowing callers to override the computed temporal_resolution when **tsam_kwargs is merged; update the reserved_tsam_keys set (the set literal defined near reserved_tsam_keys) to include the string 'temporal_resolution' so that the conflicts = reserved_tsam_keys & set(tsam_kwargs.keys()) check will catch and raise the ValueError, preventing silent overrides of the computed temporal_resolution before the code that sets temporal_resolution and later merges **tsam_kwargs.
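The guard described in this finding boils down to a reserved-key check before the kwargs merge. The sketch below is a minimal stand-in for that pattern, not the flixopt implementation; the key names mirror the review, and `merge_tsam_kwargs` is a hypothetical helper:

```python
def merge_tsam_kwargs(computed: dict, tsam_kwargs: dict) -> dict:
    """Reject caller-supplied keys that would silently override computed values."""
    # Keys the cluster() method computes or fixes itself; callers must not pass them.
    reserved_tsam_keys = {
        'n_clusters',
        'period_duration',      # exposed as cluster_duration
        'temporal_resolution',  # computed automatically (the key the fix adds)
        'timestep_duration',    # computed automatically
    }
    conflicts = reserved_tsam_keys & set(tsam_kwargs)
    if conflicts:
        raise ValueError(f'tsam_kwargs may not contain reserved keys: {sorted(conflicts)}')
    # Reserved keys validated, so the merge cannot clobber computed values
    return {**computed, **tsam_kwargs}
```

Without `'temporal_resolution'` in the set, the final `{**computed, **tsam_kwargs}` merge would let a caller's value win silently, which is exactly the bug the review flags.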
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flixopt/clustering/base.py`:
- Around line 1129-1151: The current clustering_info property returns
self._aggregation_result.clustering when _aggregation_result is present,
exposing internal renamed dimension names; change clustering_info to always
reconstruct a ClusteringInfo from self.results (use from tsam_xarray import
ClusteringInfo as ClusteringInfoClass) so dimensions are mapped back to original
names via rename_map = {v: k for k, v in self._unrename_map.items()}, compute
slice_dims = [rename_map.get(d, d) for d in results.dim_names], and return
ClusteringInfoClass(time_dim='time', cluster_dim=['variable'],
slice_dims=slice_dims, clusterings=dict(results._results)) instead of returning
_aggregation_result.clustering.
In `@pyproject.toml`:
- Line 66: The project imports symbols directly from the tsam package (e.g.,
ClusterConfig in flixopt/transform_accessor.py, ExtremeConfig in
flixopt/clustering/__init__.py, and ClusteringResult in
flixopt/clustering/base.py) but only declares tsam_xarray as a dependency; add
an explicit dependency entry for tsam in pyproject.toml (matching the same
compatible version range as tsam_xarray, e.g. "tsam >= 0.1.1, < 1") so that tsam
is a declared runtime dependency and not relied on transitively.
---
Outside diff comments:
In `@flixopt/transform_accessor.py`:
- Around line 1590-1608: The reserved_tsam_keys set currently blocks keys like
'timestep_duration' but not 'temporal_resolution', allowing callers to override
the computed temporal_resolution when **tsam_kwargs is merged; update the
reserved_tsam_keys set (the set literal defined near reserved_tsam_keys) to
include the string 'temporal_resolution' so that the conflicts =
reserved_tsam_keys & set(tsam_kwargs.keys()) check will catch and raise the
ValueError, preventing silent overrides of the computed temporal_resolution
before the code that sets temporal_resolution and later merges **tsam_kwargs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6b5ea3f2-8cd3-4bf8-a970-b7e96ccc6813
📒 Files selected for processing (4)
- flixopt/clustering/__init__.py
- flixopt/clustering/base.py
- flixopt/transform_accessor.py
- pyproject.toml
…ngInfo

BREAKING: ClusteringResults class removed. Clustering now delegates all structure properties (cluster_assignments, cluster_occurrences, segment_*, dims, coords, etc.) directly to tsam_xarray.ClusteringInfo.

- Remove ClusteringResults class (~450 lines) and all _build_property_array machinery
- Remove AggregationResults backwards-compat alias
- Remove get_result(), cluster_start_positions, timesteps_per_period
- Serialization uses ClusteringInfo.to_dict()/from_dict() (tsam_xarray>=0.5.1)
- Auto-detect _unrename_map from slice_dims for deserialization
- Update tests: remove TestClusteringResults, simplify test fixtures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (2)
flixopt/transform_accessor.py (2)
1776-1777: Clarify error message reference to avoid confusion.

The error message says "Ensure self._fs.timesteps matches...", which exposes internal implementation details. Consider rewording to be more user-friendly.

📝 Suggested improvement

```diff
-    f'Ensure self._fs.timesteps matches the original data used for clustering.'
+    f'Ensure this FlowSystem has the same timesteps as the original data used for clustering.'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 1776 - 1777, The error message exposes internal details by referencing self._fs.timesteps; update the string in transform_accessor.py where that message is raised to a user-friendly message that avoids internal attribute names (e.g., "Ensure the time-step index/length used for clustering matches the input data" or similar), optionally include expected vs actual counts for clarity; locate the message near the raise/check that references self._fs.timesteps and replace the literal with the new wording while keeping the validation logic intact.
1703-1710: Weight clearing workaround for tsam 3.2 KeyError.

The use of `object.__setattr__(cr, 'weights', {})` is a hack to bypass what appears to be a frozen/immutable `ClusteringResult` object. This avoids a KeyError when applying clustering to data with different columns than the original.

Consider adding a comment explaining this is a workaround for a specific tsam version issue, and whether this can be removed when upgrading tsam.

📝 Suggested comment addition

```diff
 # Stack full dataset and apply existing clustering
 da_full = ds.to_dataarray(dim='variable')
 # Clear weight dict to avoid tsam 3.2 KeyError when applying to
 # data with different columns than original clustering
+# TODO: Check if this workaround can be removed in future tsam versions
 info = agg_result.clustering
 for cr in info.clusterings.values():
     object.__setattr__(cr, 'weights', {})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 1703 - 1710, Add a concise explanatory comment above the loop that clears weights describing that the object.__setattr__(cr, 'weights', {}) is an intentional workaround for a known tsam 3.2 KeyError when applying a ClusteringResult to data with different columns; mention that this bypasses immutability on the ClusteringResult (cr) and note the specific tsam version and that the code can be removed once tsam is upgraded/fixed. Also, ensure the comment references agg_result, info, and cr so future maintainers can easily find and revisit the workaround.
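The `object.__setattr__` hack flagged above works because `@dataclass(frozen=True)` only overrides `__setattr__` on the class, which `object.__setattr__` goes around. A minimal demonstration of that mechanism — `ClusteringResultStub` is a hypothetical stand-in, not the tsam_xarray class:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ClusteringResultStub:
    """Stand-in for a frozen ClusteringResult; the field shape is assumed."""
    weights: dict = field(default_factory=lambda: {'heat_demand': 1.0})

cr = ClusteringResultStub()
# A normal assignment (cr.weights = {}) raises FrozenInstanceError, because
# frozen=True installs a raising __setattr__ on the class.
# object.__setattr__ bypasses that class-level hook, as in the workaround:
object.__setattr__(cr, 'weights', {})
```

This is why the review asks for a TODO comment: the bypass is invisible to readers who assume frozen dataclasses are actually immutable.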
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@flixopt/transform_accessor.py`:
- Around line 1776-1777: The error message exposes internal details by
referencing self._fs.timesteps; update the string in transform_accessor.py where
that message is raised to a user-friendly message that avoids internal attribute
names (e.g., "Ensure the time-step index/length used for clustering matches the
input data" or similar), optionally include expected vs actual counts for
clarity; locate the message near the raise/check that references
self._fs.timesteps and replace the literal with the new wording while keeping
the validation logic intact.
- Around line 1703-1710: Add a concise explanatory comment above the loop that
clears weights describing that the object.__setattr__(cr, 'weights', {}) is an
intentional workaround for a known tsam 3.2 KeyError when applying a
ClusteringResult to data with different columns; mention that this bypasses
immutability on the ClusteringResult (cr) and note the specific tsam version and
that the code can be removed once tsam is upgraded/fixed. Also, ensure the
comment references agg_result, info, and cr so future maintainers can easily
find and revisit the workaround.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9795502e-8122-416e-85c4-b44b64e4b85a
📒 Files selected for processing (6)
- flixopt/clustering/__init__.py
- flixopt/clustering/base.py
- flixopt/transform_accessor.py
- pyproject.toml
- tests/test_clustering/test_base.py
- tests/test_clustering/test_clustering_io.py
🚧 Files skipped from review as they are similar to previous changes (2)
- pyproject.toml
- flixopt/clustering/__init__.py
- Add TODO comments for object.__setattr__ weight-clearing workaround
- Add tsam as explicit dependency (directly imported for ClusterConfig etc.)
- Fix dev version pin: tsam_xarray>=0.5.1 (was >=0.1.1)
- Add assertions for variable dim detection in build_typical_periods/metrics
- Inline _build_property_array into timestep_mapping (single-use helper)
- Document when legacy 'results' kwarg can be removed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When data_vars is specified, instead of clustering a subset and re-applying to full data, now all time-varying data goes to tsam_xarray with weight=0 for excluded variables. This removes the object.__setattr__ hack in cluster(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (3)
flixopt/transform_accessor.py (3)
1783-1785: Minor: Internal reference in user-facing error message.

The error message mentions `self._fs.timesteps`, which exposes implementation details. Consider rephrasing:

Suggested improvement

```diff
-    f'Ensure self._fs.timesteps matches the original data used for clustering.'
+    f'Ensure the FlowSystem timesteps match the original data used for clustering.'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 1783 - 1785, The error message currently exposes an internal attribute name (self._fs.timesteps) — update the message constructed near where clustering.timesteps_per_cluster is referenced to avoid implementation details; replace the explicit attribute mention with a user-facing phrase like "the dataset's timesteps" or "the filesystem's recorded timesteps" (or include expected vs actual counts) so the check still informs the user to ensure the timesteps match the data used for clustering without referencing self._fs.timesteps directly.
160-163: Same assertion pattern in build_metrics.

Apply the same refactor here for consistency.

Suggested improvement

```diff
 unknown_dims = [d for d in metric_da.dims if d not in known_metric_dims]
-assert len(unknown_dims) == 1, f'Expected 1 variable dim in {metric_name}, got {unknown_dims}'
+if len(unknown_dims) != 1:
+    raise ValueError(f'Expected 1 variable dim in {metric_name}, got {unknown_dims}')
 variable_dim = unknown_dims[0]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 160 - 163, Replace the bare assert in this block with the same explicit validation used in build_metrics: after computing known_metric_dims and unknown_dims from metric_da.dims, check if len(unknown_dims) != 1 and raise a clear exception (e.g. ValueError) that includes metric_name and the unexpected unknown_dims; then safely set variable_dim = unknown_dims[0]. This mirrors the build_metrics refactor and removes the assertion while keeping the same variable names (known_metric_dims, unknown_dims, metric_da, metric_name, variable_dim).
110-114: Consider softer handling for variable dimension detection.

The assertion `assert len(unknown_dims) == 1` will crash with an `AssertionError` in production if tsam_xarray's output structure changes. Consider raising a `ValueError` with the descriptive message instead, or adding a fallback.

Suggested improvement

```diff
 unknown_dims = [d for d in representatives.dims if d not in known_dims]
-assert len(unknown_dims) == 1, (
-    f'Expected exactly 1 variable dim, got {unknown_dims} (known: {known_dims}, all: {representatives.dims})'
-)
+if len(unknown_dims) != 1:
+    raise ValueError(
+        f'Expected exactly 1 variable dim, got {unknown_dims} (known: {known_dims}, all: {representatives.dims})'
+    )
 variable_dim = unknown_dims[0]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 110 - 114, Replace the hard assertion that checks variable-dim detection with a safer runtime error or fallback: instead of using assert len(unknown_dims) == 1 in the block that builds known_dims and unknown_dims (references: known_dims, unknown_dims, representatives.dims, self._unrename_map), raise a ValueError with the same descriptive message when the condition fails (or implement a fallback branch that handles 0 or >1 unknown dims by choosing a default dimension or logging and returning an empty/validated result). Ensure the raised error contains the original diagnostic text f'Expected exactly 1 variable dim, got {unknown_dims} (known: {known_dims}, all: {representatives.dims})' so callers can debug easily.
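The assert-to-`ValueError` refactor requested in the comments above reduces to one reusable pattern. A self-contained sketch (the function name and dim names are illustrative, not flixopt code):

```python
def detect_variable_dim(dims: tuple, known_dims: set) -> str:
    """Return the single dimension not in known_dims, raising on surprises."""
    unknown_dims = [d for d in dims if d not in known_dims]
    if len(unknown_dims) != 1:
        # Unlike a bare assert, a ValueError still fires under `python -O`
        raise ValueError(
            f'Expected exactly 1 variable dim, got {unknown_dims} '
            f'(known: {sorted(known_dims)}, all: {dims})'
        )
    return unknown_dims[0]
```

Raising `ValueError` rather than asserting matters here because assertions are stripped when Python runs with optimizations, which would turn a loud failure into silent index errors downstream.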
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flixopt/transform_accessor.py`:
- Around line 1711-1718: The TODO uses a placeholder issue link and the
workaround mutates frozen dataclass fields via object.__setattr__, so replace
"https://github.com/FZJ-IEK3-VSA/tsam/issues/XXX" with the actual tsam_xarray
issue number (or PR) that tracks mismatched weights in ClusteringInfo.apply(),
and update the comment to reference that real issue; keep the existing
workaround (setting agg_result.clustering -> for cr in
info.clusterings.values(): object.__setattr__(cr, 'weights', {})) but add a
short note in the comment that this is a temporary hack and must be removed once
the referenced issue is resolved, referring to ClusteringInfo.apply,
agg_result.clustering, and the cr.weights mutation so future maintainers can
find and revert it.
---
Nitpick comments:
In `@flixopt/transform_accessor.py`:
- Around line 1783-1785: The error message currently exposes an internal
attribute name (self._fs.timesteps) — update the message constructed near where
clustering.timesteps_per_cluster is referenced to avoid implementation details;
replace the explicit attribute mention with a user-facing phrase like "the
dataset's timesteps" or "the filesystem's recorded timesteps" (or include
expected vs actual counts) so the check still informs the user to ensure the
timesteps match the data used for clustering without referencing
self._fs.timesteps directly.
- Around line 160-163: Replace the bare assert in this block with the same
explicit validation used in build_metrics: after computing known_metric_dims and
unknown_dims from metric_da.dims, check if len(unknown_dims) != 1 and raise a
clear exception (e.g. ValueError) that includes metric_name and the unexpected
unknown_dims; then safely set variable_dim = unknown_dims[0]. This mirrors the
build_metrics refactor and removes the assertion while keeping the same variable
names (known_metric_dims, unknown_dims, metric_da, metric_name, variable_dim).
- Around line 110-114: Replace the hard assertion that checks variable-dim
detection with a safer runtime error or fallback: instead of using assert
len(unknown_dims) == 1 in the block that builds known_dims and unknown_dims
(references: known_dims, unknown_dims, representatives.dims,
self._unrename_map), raise a ValueError with the same descriptive message when
the condition fails (or implement a fallback branch that handles 0 or >1 unknown
dims by choosing a default dimension or logging and returning an empty/validated
result). Ensure the raised error contains the original diagnostic text
f'Expected exactly 1 variable dim, got {unknown_dims} (known: {known_dims}, all:
{representatives.dims})' so callers can debug easily.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9cd80fc4-e0c9-4529-ac63-7ed7085daa96
📒 Files selected for processing (3)
- flixopt/clustering/base.py
- flixopt/transform_accessor.py
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
…t from API
BREAKING CHANGES:
- Remove `data_vars` parameter from `cluster()` — use `ClusterConfig(weights={var: 0})`
to exclude variables from influencing clustering instead
- Remove `clustering_group` and `clustering_weight` from `TimeSeriesData` — use
`ClusterConfig(weights={...})` directly
- Remove `_calculate_clustering_weights` and `_build_cluster_config_with_weights`
- Stop dropping constant arrays before clustering — pass all time-dim data to tsam_xarray
- Weight validation now handled by tsam_xarray directly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
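Under the new API, excluding a variable from influencing clustering means giving it a zero weight rather than filtering `data_vars`. A hypothetical migration helper sketching that shape — `exclusion_weights` is not part of flixopt, and the `ClusterConfig` call in the comment is an assumption based on the breaking-change note above:

```python
def exclusion_weights(all_vars, exclude):
    """Build a ClusterConfig-style weights dict: 0 for excluded vars, 1 otherwise."""
    return {v: 0.0 if v in exclude else 1.0 for v in all_vars}

# Old API (removed):  cluster(..., data_vars=['heat_demand', 'el_price'])
# New API (sketch):   cluster(..., config=ClusterConfig(weights=weights))
weights = exclusion_weights(['heat_demand', 'el_price', 'solar'], {'solar'})
```

Zero-weighted variables still travel through tsam_xarray and appear in the clustered output; they just no longer influence which timesteps become representatives.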
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flixopt/transform_accessor.py (1)
1487-1498: ⚠️ Potential issue | 🟠 Major — Reserve `temporal_resolution`, `weights`, `time_dim`, and `cluster_dim` in the `tsam_kwargs` guard.

Lines 1554-1559 build `tsam_kwargs_full` with explicit `temporal_resolution=dt`, then merge `**tsam_kwargs`. Because this merged dict is unpacked in the `tsam_xarray.aggregate()` call (lines 1599-1604), a caller passing any of these four keys can override the method's own constraints: `temporal_resolution` silently alters weighting/duration logic, while `weights`, `time_dim`, and `cluster_dim` cause duplicate-key TypeErrors.

Proposed fix

```diff
 reserved_tsam_keys = {
     'n_clusters',
     'period_duration',  # exposed as cluster_duration
+    'temporal_resolution',
     'timestep_duration',  # computed automatically
     'cluster',
     'segments',
     'extremes',
     'preserve_column_means',
     'rescale_exclude_columns',
     'round_decimals',
     'numerical_tolerance',
+    'weights',
+    'time_dim',
+    'cluster_dim',
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 1487 - 1498, The reserved_tsam_keys set used to guard tsam_kwargs must include the four keys that the method itself sets or passes to tsam_xarray.aggregate to prevent callers from overriding them; update the reserved_tsam_keys (currently defined near the top of transform_accessor.py) to also contain 'temporal_resolution', 'weights', 'time_dim', and 'cluster_dim' so that the merge building tsam_kwargs_full (where temporal_resolution=dt is set and then **tsam_kwargs is merged) cannot be overridden and will not produce duplicate-key errors when calling tsam_xarray.aggregate.
♻️ Duplicate comments (1)
flixopt/transform_accessor.py (1)
1676-1693: ⚠️ Potential issue | 🟠 Major — `apply_clustering()` still drops invariant `scenario` slices.

Unlike the `cluster()` path at lines 1534-1547, lines 1677-1681 only re-add dims that were renamed into `rename_map`. On multi-scenario systems where no variable explicitly carries `scenario`, `ds.to_dataarray(dim='variable')` can lose that dimension while `slice_dims` still expects it, so `info.apply()` no longer applies the stored per-scenario clustering reliably.

💡 Proposed fix

```diff
-# Ensure extra dims are present in DataArray
-for _orig_name, renamed in rename_map.items():
-    if renamed not in da_full.dims and renamed in ds.dims:
-        if renamed in da_full.coords:
-            da_full = da_full.drop_vars(renamed)
-        da_full = da_full.expand_dims({renamed: ds.coords[renamed].values})
+# Ensure period/scenario slice dims are present even if no variable varies across them
+extra_dims = []
+if self._fs.periods is not None:
+    extra_dims.append(rename_map.get('period', 'period'))
+if self._fs.scenarios is not None:
+    extra_dims.append(rename_map.get('scenario', 'scenario'))
+for dim_name in extra_dims:
+    if dim_name not in da_full.dims and dim_name in ds.dims:
+        if dim_name in da_full.coords:
+            da_full = da_full.drop_vars(dim_name)
+        da_full = da_full.expand_dims({dim_name: ds.coords[dim_name].values})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/transform_accessor.py` around lines 1676 - 1693, apply_clustering() currently only re-adds dims found in rename_map, so invariant dims like "scenario" can be lost before building ClusteringInfo and calling info.apply(); modify the loop that ensures extra dims on da_full (the block that iterates rename_map) to instead iterate clustering.dim_names and for each dim compute the renamed name via rename_map.get(dim, dim), then if that renamed dim is missing from da_full but present in ds.dims add it back (drop coord first if present) using da_full.expand_dims({renamed: ds.coords[dim].values}); this mirrors the behavior in the cluster() path and guarantees that dims expected by ClusteringInfoClass and info.apply() (e.g., scenario) are preserved even when not explicitly renamed.
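The core of this fix — deciding which slice dims must be re-added after `to_dataarray()` — can be exercised without xarray. A pure-Python stand-in for that selection logic (`dims_to_restore` is illustrative, not flixopt code; the actual fix then calls `expand_dims` for each returned dim):

```python
def dims_to_restore(slice_dims, rename_map, da_dims, ds_dims):
    """Slice dims the stored clustering expects but the stacked DataArray lost."""
    restore = []
    for dim in slice_dims:
        renamed = rename_map.get(dim, dim)  # e.g. 'period' -> '_period'
        # Missing from the stacked array, but still present on the dataset
        if renamed not in da_dims and renamed in ds_dims:
            restore.append(renamed)
    return restore
```

Iterating over the clustering's own `slice_dims` (rather than only over `rename_map` entries) is what makes invariant `scenario` dims survive, matching the fix prompt.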
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flixopt/core.py`:
- Around line 67-72: The docstring for from_dataarray is outdated — it claims
metadata is extracted from attrs but the current implementation just wraps the
incoming xr.DataArray; update the from_dataarray(cls, da: xr.DataArray)
docstring to accurately describe its behavior (e.g., "Return a TimeSeriesData
wrapping the given DataArray without extracting or modifying attrs") and remove
any mention of extracting metadata from attrs so the API documentation matches
the implementation in the from_dataarray method.
---
Outside diff comments:
In `@flixopt/transform_accessor.py`:
- Around line 1487-1498: The reserved_tsam_keys set used to guard tsam_kwargs
must include the four keys that the method itself sets or passes to
tsam_xarray.aggregate to prevent callers from overriding them; update the
reserved_tsam_keys (currently defined near the top of transform_accessor.py) to
also contain 'temporal_resolution', 'weights', 'time_dim', and 'cluster_dim' so
that the merge building tsam_kwargs_full (where temporal_resolution=dt is set
and then **tsam_kwargs is merged) cannot be overridden and will not produce
duplicate-key errors when calling tsam_xarray.aggregate.
---
Duplicate comments:
In `@flixopt/transform_accessor.py`:
- Around line 1676-1693: apply_clustering() currently only re-adds dims found in
rename_map, so invariant dims like "scenario" can be lost before building
ClusteringInfo and calling info.apply(); modify the loop that ensures extra dims
on da_full (the block that iterates rename_map) to instead iterate
clustering.dim_names and for each dim compute the renamed name via
rename_map.get(dim, dim), then if that renamed dim is missing from da_full but
present in ds.dims add it back (drop coord first if present) using
da_full.expand_dims({renamed: ds.coords[dim].values}); this mirrors the behavior
in the cluster() path and guarantees that dims expected by ClusteringInfoClass
and info.apply() (e.g., scenario) are preserved even when not explicitly
renamed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 621a5a54-6457-40b7-baae-69f81924b804
📒 Files selected for processing (8)
- flixopt/core.py
- flixopt/transform_accessor.py
- tests/conftest.py
- tests/deprecated/conftest.py
- tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py
- tests/test_clustering/test_cluster_reduce_expand.py
- tests/test_clustering/test_integration.py
- tests/utilities/test_dataconverter.py
💤 Files with no reviewable changes (1)
- tests/test_clustering/test_cluster_reduce_expand.py
```diff
 def from_dataarray(
     cls,
     da: xr.DataArray,
-    clustering_group: str | None = None,
-    clustering_weight: float | None = None,
 ):
     """Create TimeSeriesData from DataArray, extracting metadata from attrs."""
-    final_clustering_group = clustering_group if clustering_group is not None else da.attrs.get('clustering_group')
-    final_clustering_weight = (
-        clustering_weight if clustering_weight is not None else da.attrs.get('clustering_weight')
-    )
-    return cls(da, clustering_group=final_clustering_group, clustering_weight=final_clustering_weight)
+    return cls(da)
```
Update the from_dataarray() docstring.
Line 71 still says this path extracts metadata from attrs, but the new implementation just wraps the incoming DataArray. Leaving that wording in place makes the post-refactor API behavior misleading.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@flixopt/core.py` around lines 67 - 72, The docstring for from_dataarray is
outdated — it claims metadata is extracted from attrs but the current
implementation just wraps the incoming xr.DataArray; update the
from_dataarray(cls, da: xr.DataArray) docstring to accurately describe its
behavior (e.g., "Return a TimeSeriesData wrapping the given DataArray without
extracting or modifying attrs") and remove any mention of extracting metadata
from attrs so the API documentation matches the implementation in the
from_dataarray method.
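The docstring correction requested above, applied to a minimal stand-in class (`TimeSeriesDataStub` is a stub for illustration, not the flixopt class, and the dict argument stands in for an `xr.DataArray`):

```python
import dataclasses

@dataclasses.dataclass
class TimeSeriesDataStub:
    """Minimal stand-in for flixopt's TimeSeriesData, just to show the docstring fix."""
    data: object

    @classmethod
    def from_dataarray(cls, da):
        """Return a TimeSeriesData wrapping the given DataArray without
        extracting or modifying attrs."""
        # No attrs inspection anymore: the refactor made this a plain wrapper
        return cls(da)

ts = TimeSeriesDataStub.from_dataarray({'attrs': {'clustering_group': 'x'}})
```

The point of the fix is purely documentation hygiene: the body already ignores `attrs`, so the docstring must stop promising extraction.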
…ering_data, etc.

BREAKING CHANGES:
- Remove from Clustering: dims, coords, cluster_centers, segment_centers, n_representatives, representative_weights, metrics, position_within_segment
- Remove clustering_data() from TransformAccessor
- Move position_within_segment computation into _Expander (only consumer)
- Use clustering_info directly for advanced access (dims, coords, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rray

Aligns with tsam_xarray's ClusteringResult class name. Renames property, internal attribute, parameters, and local variables throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 3
♻️ Duplicate comments (1)
flixopt/clustering/base.py (1)
228-230: ⚠️ Potential issue | 🟠 Major — Public `clustering_info`/`apply()` still expose internal `_period` naming.

`TransformAccessor.cluster()` stores reserved dims as `_period`/`_cluster` inside `_clustering_info`. Returning that object verbatim here and calling it directly on line 481 means multi-period or multi-scenario callers still have to rename their input arrays manually, even though the rest of `Clustering` presents the public `period`/`cluster` names.

Also applies to: 469-481
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@flixopt/clustering/base.py` around lines 228 - 230, clustering_info currently returns the internal _clustering_info object which exposes reserved dim names as _period/_cluster; change clustering_info to return a copy with those internal names mapped to the public names (period/cluster) so callers and apply()/apply_clustering() receive public-facing dim names. Locate the clustering_info method and _clustering_info attribute, create/return a shallow copy or new ClusteringInfo with reserved dims renamed from "_period" -> "period" and "_cluster" -> "cluster" (preserving other fields), and ensure any code using apply()/apply_clustering() consumes this renamed copy so multi-period/multi-scenario callers no longer need manual renaming.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@flixopt/clustering/base.py`:
- Around line 135-146: The code treats any dict passed to clustering_info or
results as a new ClusteringInfo payload, skipping the legacy adapter; update the
dict-handling branches so that when a dict resembles the old schema (the form
produced by from_json()/legacy "results" payloads or ClusteringResults), you
call the legacy adapter (e.g. _clustering_info_from_legacy_dict) instead of
_clustering_info_from_dict. Specifically, in the blocks that inspect
clustering_info and results (the isinstance(..., dict) paths and the from_json
flow), detect legacy keys/shape and route to _clustering_info_from_legacy_dict
(fall back to _clustering_info_from_dict only for current-schema dicts) so old
JSON/NetCDF payloads are handled correctly while preserving the new-path
behavior.
In `@flixopt/transform_accessor.py`:
- Around line 1645-1658: The current code builds a shallow copy via
dict(info.clusterings) which still references the same clustering result
objects, so the subsequent loop using object.__setattr__(cr, 'weights', {})
mutates the caller's clustering; fix by creating an independent deep copy of the
clustering result objects before mutating them (e.g., replace
dict(info.clusterings) with a deep copy of
clustering.clustering_info.clusterings or construct a new dict that clones each
clustering result object) so modifications to weights affect only the local
ClusteringInfoClass instance used before calling info.apply(da_full).
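The shallow-copy pitfall here is plain Python: `dict(...)` copies the mapping but shares the value objects. A self-contained sketch with a frozen-dataclass stand-in (class and field names are illustrative, not the flixopt types):

```python
import copy
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ClusterResultStub:
    """Stand-in for a frozen clustering-result object with mutable weights."""
    weights: dict = field(default_factory=lambda: {'demand': 1.0})

original = {'ts': ClusterResultStub()}

# Shallow copy: dict(...) copies the mapping but shares the value objects,
# so mutating through the copy also mutates the caller's object.
shallow = dict(original)
object.__setattr__(shallow['ts'], 'weights', {})
print(original['ts'].weights)  # {} -- the caller's object was mutated!

# Deep copy: clone the objects first, then mutate only the local clones.
original = {'ts': ClusterResultStub()}
deep = copy.deepcopy(original)
object.__setattr__(deep['ts'], 'weights', {})
print(original['ts'].weights)  # {'demand': 1.0} -- caller untouched
```

`object.__setattr__` is used because the dataclass is frozen; the fix is simply to deep-copy before that mutation.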
In `@tests/test_clustering/test_multiperiod_extremes.py`:
- Around line 525-527: The current assertion uses n_clusters * n_segments ==
n_clusters * 4 which is tautological after n_segments == 4; replace it with a
concrete shape/length check on the clustered time axis to ensure the result size
is correct. For example, assert the length/shape of the clustered time array
equals n_clusters * 4 (e.g. len(fs_clustered.time) == n_clusters * 4 or
fs_clustered.time.shape[0] == n_clusters * 4) or use the clustering object's
time array if available (e.g. fs_clustered.clustering.time.shape[0] ==
n_clusters * 4) instead of the current tautological comparison.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 215af09e-63c9-45f5-a4a1-bb6a7e8b9bc1
📒 Files selected for processing (8)
- flixopt/clustering/__init__.py
- flixopt/clustering/base.py
- flixopt/io.py
- flixopt/transform_accessor.py
- tests/test_clustering/test_base.py
- tests/test_clustering/test_cluster_reduce_expand.py
- tests/test_clustering/test_integration.py
- tests/test_clustering/test_multiperiod_extremes.py
🚧 Files skipped from review as they are similar to previous changes (3)
- flixopt/clustering/__init__.py
- tests/test_clustering/test_cluster_reduce_expand.py
- tests/test_clustering/test_base.py
```python
# n_clusters * n_segments
n_clusters = fs_clustered.clustering.n_clusters
assert fs_clustered.clustering.n_representatives == n_clusters * 4
assert n_clusters * fs_clustered.clustering.n_segments == n_clusters * 4
```
This assertion is tautological now.
Once Line 523 has already established n_segments == 4, this check can never fail. Assert a concrete reduced shape instead, e.g. the clustered time axis length.
Suggested fix

```diff
- # n_clusters * n_segments
- n_clusters = fs_clustered.clustering.n_clusters
- assert n_clusters * fs_clustered.clustering.n_segments == n_clusters * 4
+ assert len(fs_clustered.timesteps) == 4
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_clustering/test_multiperiod_extremes.py` around lines 525 - 527,
The current assertion uses n_clusters * n_segments == n_clusters * 4 which is
tautological after n_segments == 4; replace it with a concrete shape/length
check on the clustered time axis to ensure the result size is correct. For
example, assert the length/shape of the clustered time array equals n_clusters *
4 (e.g. len(fs_clustered.time) == n_clusters * 4 or fs_clustered.time.shape[0]
== n_clusters * 4) or use the clustering object's time array if available (e.g.
fs_clustered.clustering.time.shape[0] == n_clusters * 4) instead of the current
tautological comparison.
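The distinction can be shown with plain numbers: once `n_segments` is pinned to 4, the product comparison holds by construction, while a length check on the reduced time axis actually exercises the result. Illustrative values only, not the flixopt test fixtures:

```python
# Illustrative numbers: with n_segments fixed at 4, the product check
# can never fail, while a direct length check on the reduced axis can.
n_clusters = 3
n_segments = 4
reduced_time = list(range(n_clusters * n_segments))  # stand-in for the clustered time axis

# Tautological: both sides equal n_clusters * 4 by construction.
assert n_clusters * n_segments == n_clusters * 4

# Concrete: ties the assertion to the actual reduced data shape.
assert len(reduced_time) == n_clusters * 4
```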
…ggregate()

Replace the entire custom expansion machinery with tsam_xarray's ClusteringResult.disaggregate(), combined with xarray's built-in ffill/interpolate_na/fillna for post-processing segmented data.

Removed:
- expand_data(), build_expansion_divisor(), timestep_mapping (Clustering)
- _build_timestep_mapping() helper function
- _interpolate_charge_state_segmented(), _expand_first_timestep_only() (_Expander)
- _compute_position_within_segment() (_Expander)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
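The xarray post-processing calls named in the commit (`ffill`, `interpolate_na`, `fillna`) are real `DataArray` methods; a toy example of how each treats gaps (the data and variable names are made up):

```python
import numpy as np
import xarray as xr

# Toy series with gaps, standing in for disaggregated segmented output.
da = xr.DataArray(
    [1.0, np.nan, np.nan, 4.0, np.nan],
    dims='time',
    coords={'time': np.arange(5)},
)

ffilled = da.ffill(dim='time')                # hold the last seen value
interpolated = da.interpolate_na(dim='time')  # linear fill; a trailing NaN stays NaN
zero_filled = da.fillna(0.0)                  # constant fill everywhere

print(ffilled.values)  # [1. 1. 1. 4. 4.]
```

Which fill suits a variable depends on its category — e.g. holding or interpolating state-like series versus zero-filling per-timestep totals.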
Remove backwards-compat pattern matching (_is_state_variable, _is_first_timestep_variable, _build_segment_total_varnames) that was only needed for FlowSystems saved before _variable_categories was added in v6.0. Now requires _variable_categories to always be present. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lude_original_data (#655)

BREAKING CHANGES:
- Remove ClusteringPlotAccessor (compare, heatmap, clusters methods)
- Remove original_data and aggregated_data from Clustering
- Remove _metrics from Clustering (use aggregation_result.accuracy instead)
- Remove include_original_data parameter from to_dataset/to_netcdf
- Simplify _create_reference_structure to just serialize ClusteringResult dict
- Remove build_metrics from _ReducedFlowSystemBuilder

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…pipeline Verifies numerical equivalence of expanded solution values for both non-segmented and segmented clustering, including flow rates, costs, objectives, shapes, and NaN-free expansion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BREAKING: Remove 3 clustering notebooks, update 1:
- Remove 08d-clustering-multiperiod.ipynb (tsam_xarray handles this)
- Remove 08e-clustering-internals.ipynb (internals no longer exist)
- Remove 08f-clustering-segmentation.ipynb (just a parameter, covered in 08c)
- Update 08c-clustering.ipynb: remove clustering_data, data_vars, plot.compare, metrics sections; add tsam_xarray links; update API reference

5 clustering notebooks → 2 (08c + 08c2-storage-modes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents users from silently overriding the computed timestep size via tsam_kwargs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Complete refactoring of clustering from tsam to tsam_xarray.
`base.py` went from 1633 → 393 lines (-76%).

What changed
- A single `tsam_xarray.aggregate()` call replaces per-slice `tsam.aggregate()` loops
- `ClusteringResults` class removed — `Clustering` delegates directly to `tsam_xarray.ClusteringResult`
- `ClusteringPlotAccessor` removed — users plot with tsam_xarray's DataArrays directly
- `expand_data()`/`timestep_mapping` replaced with `tsam_xarray.ClusteringResult.disaggregate()` + xarray's `ffill`/`interpolate_na`/`fillna`
- `data_vars` parameter removed — use `ClusterConfig(weights={var: 0})` instead
- `clustering_group`/`clustering_weight` removed from `TimeSeriesData`
- `original_data`/`aggregated_data`/`_metrics` removed from `Clustering`
- `include_original_data` parameter removed from `to_dataset()`/`to_netcdf()`
- `_variable_categories` fallback removed from `_Expander`
- Serialization via `ClusteringResult.to_dict()`/`from_dict()` (tsam_xarray ≥ 0.5.1)

What remains on `Clustering`

- Properties: `n_clusters`, `timesteps_per_cluster`, `n_original_clusters`, `n_segments`, `is_segmented`, `dim_names`, `cluster_assignments`, `cluster_occurrences`, `segment_assignments`, `segment_durations`
- Methods: `disaggregate()`, `apply()`, `to_json()`/`from_json()`
- `clustering_result` (tsam_xarray), `aggregation_result` (pre-IO only)

Dependencies
- `tsam_xarray >= 0.5.1` (was `tsam >= 3.1.2`)
- `tsam >= 3.1.2` (explicit, for `ClusterConfig`/`ExtremeConfig`/`SegmentConfig`)

Test plan
🤖 Generated with Claude Code