Skip to content

Fix slicing capabilities#206

Open
d33bs wants to merge 7 commits intocytomining:mainfrom
d33bs:dataframe-formatting-fixes
Open

Fix slicing capabilities#206
d33bs wants to merge 7 commits intocytomining:mainfrom
d33bs:dataframe-formatting-fixes

Conversation

@d33bs
Copy link
Copy Markdown
Member

@d33bs d33bs commented Mar 21, 2026

Description

This PR addresses an issue with CytoDataFrame when accessing slices (CDF[x:x:x], or CDF[x:x]).

Closes #204

What kind of change(s) are included?

  • Documentation (changes docs or other related content)
  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • These changes pass all pre-commit checks.
  • I have added comments to my code to help provide understanding
  • I have added a test which covers the code changes found within this PR
  • I have deleted all non-relevant text in this pull request template.

Summary by CodeRabbit

  • Bug Fixes

    • Slicing and transpose operations consistently preserve interactive/widget state and notebook HTML rendering; non-table (scalar) results are returned unchanged.
  • Chores

    • Updated pinned dev-tool versions and reorganized project tool configuration.
  • Tests

    • Added tests for slicing behavior, HTML rendering preservation, and passthrough of non-table results.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 21, 2026

📝 Walkthrough

Walkthrough

Introduces lifecycle flag and centralized result construction in CytoDataFrame to preserve widget/HTML state for pandas-derived results; adjusts transposition and wrapping behavior. Also pins two pre-commit hooks, reorders Bandit config in pyproject, and extends tests for slicing and passthrough behavior.

Changes

Cohort / File(s) Summary
Build & Configuration
.
\.pre-commit-config.yaml, pyproject.toml
Bumped pinned pre-commit hooks (tox-dev/pyproject-fmt v2.19.0→v2.20.0, astral-sh/ruff-pre-commit v0.15.6→v0.15.8). Moved [tool.bandit] exclude_dirs = ["tests"] earlier in pyproject.toml; no functional changes.
Core Frame Implementation
src/cytodataframe/frame.py
Added _custom_attrs["_initializing"] flag, changed varargs typing, introduced _constructor and _build_result_cdf() to centralize DataFrame→CytoDataFrame construction and preserve widget state (is_transposed, widget state, sliders, output). Updated __getitem__, _return_cytodataframe(), wrapper signatures, and HTML/transposition/copy branches to use the new helpers and avoid unnecessary copies when transposed.
Tests
tests/test_frame.py
Expanded tests: assert bracket and .iloc slicing return CytoDataFrame (including stepped and empty slices); added test confirming non-DataFrame results are returned unchanged; added tests that slicing preserves _custom_attrs identity and HTML rendering for regular and empty slices.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through slices, keeping widgets tight,
Flags and constructors stitching states just right.
I build the frame gently, avoid a copy's fright,
Tests cheer the slices, empty or in flight —
A floppy-eared hooray under soft moonlight 🌙

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix slicing capabilities' is concise and directly addresses the main change—fixing CytoDataFrame slicing issues addressed in issue #204.
Linked Issues check ✅ Passed The PR addresses issue #204 by fixing slicing behavior to return properly formatted CytoDataFrame results, with comprehensive changes to handle bracket slicing, .iloc slicing, and transposition state preservation.
Out of Scope Changes check ✅ Passed All changes directly support the slicing fix objective: core slicing logic in frame.py, tests validating slicing behavior, and minor dependency/config updates that are appropriately scoped.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@d33bs
Copy link
Copy Markdown
Member Author

d33bs commented Mar 21, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@d33bs
Copy link
Copy Markdown
Member Author

d33bs commented Mar 27, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes CytoDataFrame slice access (cdf.iloc[x:y:z]) so derived frames preserve CytoDataFrame behavior and notebook HTML/widget display state (Issue #204).

Changes:

  • Add _constructor / _build_result_cdf pathway to preserve CytoDataFrame metadata across pandas operations (including iloc slicing).
  • Adjust transpose/HTML-generation logic to avoid relying on CytoDataFrame’s overridden transpose during rendering.
  • Add tests asserting iloc slices return CytoDataFrame and keep HTML formatting; bump pre-commit tool versions and relocate Bandit config in pyproject.toml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/cytodataframe/frame.py Introduces _constructor + _build_result_cdf to preserve CytoDataFrame state through slicing/operations; tweaks transpose + HTML generation logic.
tests/test_frame.py Adds regression tests for iloc slicing returning CytoDataFrame and preserving HTML renderer/widget state.
pyproject.toml Moves Bandit config block earlier in the file (plus comment retained).
.pre-commit-config.yaml Updates pyproject-fmt and ruff-pre-commit hook revisions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/cytodataframe/frame.py (1)

385-419: Fix star-arg unpacking after keyword arguments (B026).

The static analysis tool correctly flags line 406-407: unpacking *args after keyword arguments is strongly discouraged and can lead to confusing behavior. Since _build_result_cdf is primarily called from __getitem__ and _return_cytodataframe without extra positional args, consider removing the *args forwarding or restructuring.

♻️ Proposed fix to address B026 warning
 def _build_result_cdf(
     self: CytoDataFrame_type,
     data: Union["CytoDataFrame", pd.DataFrame, pd.Series, Any],
-    *args: Tuple[Any, ...],
     **kwargs: Dict[str, Any],
 ) -> CytoDataFrame_type:
     """Construct a result frame while preserving CytoDataFrame metadata."""
     if data is None and "data" in kwargs:
         data = kwargs.pop("data")

     cdf = CytoDataFrame(
         data=data,
         data_context_dir=self._custom_attrs["data_context_dir"],
         data_image_paths=self._custom_attrs["data_image_paths"],
         data_bounding_box=self._custom_attrs["data_bounding_box"],
         compartment_center_xy=self._custom_attrs["compartment_center_xy"],
         data_mask_context_dir=self._custom_attrs["data_mask_context_dir"],
         data_outline_context_dir=self._custom_attrs["data_outline_context_dir"],
         segmentation_file_regex=self._custom_attrs["segmentation_file_regex"],
         image_adjustment=self._custom_attrs["image_adjustment"],
         display_options=self._custom_attrs["display_options"],
-        *args,
         **kwargs,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 385 - 419, The _build_result_cdf
function currently forwards *args after many keyword args to the CytoDataFrame
constructor which triggers B026 (star-arg unpacking after keywords); remove the
*args forwarding and any unused positional propagation (or move positional args
before the explicit keywords) so the call becomes an explicit keyword-only
construction of CytoDataFrame; update callers like __getitem__ and
_return_cytodataframe if they relied on passing extra positional args (validate
they don't pass extras) and ensure _build_result_cdf signature can omit *args
(or accept only **kwargs) to avoid the unsafe unpacking while preserving
metadata assignments to cdf._custom_attrs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cytodataframe/frame.py`:
- Around line 450-463: The code computes result (variable result) then only
assigns cdf when result is a pandas DataFrame, but always returns cdf, which can
trigger UnboundLocalError if result isn't a DataFrame; change the control flow
in the method that calls _build_result_cdf so that after computing result it
checks isinstance(result, pd.DataFrame) and: if True, call
self._build_result_cdf(result), set cdf._custom_attrs["is_transposed"] when
method_name == "transpose" and not self._custom_attrs["is_transposed"], and
return that cdf; otherwise directly return result (or explicitly return result
when not a DataFrame). Ensure references to result, cdf, method_name,
_build_result_cdf and _custom_attrs["is_transposed"] are updated accordingly.

---

Nitpick comments:
In `@src/cytodataframe/frame.py`:
- Around line 385-419: The _build_result_cdf function currently forwards *args
after many keyword args to the CytoDataFrame constructor which triggers B026
(star-arg unpacking after keywords); remove the *args forwarding and any unused
positional propagation (or move positional args before the explicit keywords) so
the call becomes an explicit keyword-only construction of CytoDataFrame; update
callers like __getitem__ and _return_cytodataframe if they relied on passing
extra positional args (validate they don't pass extras) and ensure
_build_result_cdf signature can omit *args (or accept only **kwargs) to avoid
the unsafe unpacking while preserving metadata assignments to cdf._custom_attrs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3fb3a13d-7af5-40c7-addd-bc765b2cd7f7

📥 Commits

Reviewing files that changed from the base of the PR and between cf60c30 and 29ef1a9.

📒 Files selected for processing (4)
  • .pre-commit-config.yaml
  • pyproject.toml
  • src/cytodataframe/frame.py
  • tests/test_frame.py

@d33bs d33bs marked this pull request as ready for review March 28, 2026 14:02
@d33bs d33bs requested a review from jenna-tomkinson as a code owner March 28, 2026 14:02
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_frame.py (1)

799-801: Add bracket-slice coverage too.

These tests only hit .iloc, but the production change also rewires CytoDataFrame.__getitem__() for cdf[x:y] / cdf[x:y:z]. A regression in the direct slice path would still pass this suite, so I'd add at least one type + _repr_html_(debug=True) assertion for cdf[0:2], cdf[1:1], and cdf[0:5:2].

Also applies to: 814-832

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_frame.py` around lines 799 - 801, The tests only exercise iloc
slicing but the change also affects CytoDataFrame.__getitem__ for direct bracket
slices; add parallel assertions for cdf[0:2], cdf[1:1], and cdf[0:5:2] that
check isinstance(..., CytoDataFrame) and call _repr_html_(debug=True) to ensure
the direct-slice path is covered; update the tests near the existing iloc checks
(and likewise for the block around 814-832) to include these bracket-slice
assertions so regressions in CytoDataFrame.__getitem__ are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cytodataframe/frame.py`:
- Around line 459-463: The transpose handling currently only sets
cdf._custom_attrs["is_transposed"] = True when the parent isn't transposed,
which fails to flip the flag on repeated transposes (e.g., cdf.T.T); change the
logic in the block after _build_result_cdf(result) (the code checking
method_name == "transpose") to toggle the transposed flag on the resulting cdf
(e.g., set cdf._custom_attrs["is_transposed"] to the inverse of the current flag
instead of only setting True) so _generate_jupyter_dataframe_html sees the
correct transposed state.

---

Nitpick comments:
In `@tests/test_frame.py`:
- Around line 799-801: The tests only exercise iloc slicing but the change also
affects CytoDataFrame.__getitem__ for direct bracket slices; add parallel
assertions for cdf[0:2], cdf[1:1], and cdf[0:5:2] that check isinstance(...,
CytoDataFrame) and call _repr_html_(debug=True) to ensure the direct-slice path
is covered; update the tests near the existing iloc checks (and likewise for the
block around 814-832) to include these bracket-slice assertions so regressions
in CytoDataFrame.__getitem__ are caught.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 252f24c8-1e5b-4d56-a13a-9e9129d3ca76

📥 Commits

Reviewing files that changed from the base of the PR and between 29ef1a9 and 78a496f.

📒 Files selected for processing (3)
  • pyproject.toml
  • src/cytodataframe/frame.py
  • tests/test_frame.py
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/cytodataframe/frame.py (2)

4376-4381: Consider simplifying the conditional logic.

The condition if self._custom_attrs["is_transposed"] or bounding_box_externally_joined determines whether to reuse data or call self.copy(). However, at this point in the code:

  • If is_transposed is true, data was already assigned at line 4360
  • If bounding_box_externally_joined is true, data was assigned via join at lines 4369-4373

The condition correctly avoids redundant self.copy() calls when data is already initialized, but the pattern is repeated three times (lines 4376-4381, 4398-4404, 4428-4435). Consider extracting a helper flag like data_already_initialized set after each data = ... assignment to make the logic clearer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 4376 - 4381, The conditional that
decides whether to reuse `data` or call `self.copy()` is repeated and unclear;
introduce a boolean flag (e.g., `data_already_initialized`) set to True
immediately after each `data = ...` assignment (the assignments tied to
`self._custom_attrs["is_transposed"]`, the join that sets `data` when
`bounding_box_externally_joined`, and any other place `data` is assigned) and
then replace the three repeated ternary checks with a single check `if
data_already_initialized: use data else: data = self.copy()` so that the logic
around `data` reuse in the blocks referencing `data` is explicit and only calls
`self.copy()` when needed.

392-393: Consider removing the defensive data check.

The data is None and "data" in kwargs check appears unreachable since data is a required positional parameter with no default value. If data is None, it was explicitly passed as None. Popping from kwargs would only matter if someone called _build_result_cdf(None, data=some_df), which is an unusual pattern.

If this is intentional for a specific use case (e.g., pandas internal calls), consider adding a brief comment explaining when this scenario occurs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cytodataframe/frame.py` around lines 392 - 393, The conditional check `if
data is None and "data" in kwargs: data = kwargs.pop("data")` in
_build_result_cdf is unnecessary because `data` is a required positional
parameter and this branch is effectively unreachable; remove that `if` block (or
if you intentionally expect callers to pass `None` positionally and a `data`
kwarg, keep it but add a concise comment above the line explaining that specific
pandas/internal calling pattern) so _build_result_cdf no longer pops `data` from
kwargs unexpectedly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/cytodataframe/frame.py`:
- Around line 4376-4381: The conditional that decides whether to reuse `data` or
call `self.copy()` is repeated and unclear; introduce a boolean flag (e.g.,
`data_already_initialized`) set to True immediately after each `data = ...`
assignment (the assignments tied to `self._custom_attrs["is_transposed"]`, the
join that sets `data` when `bounding_box_externally_joined`, and any other place
`data` is assigned) and then replace the three repeated ternary checks with a
single check `if data_already_initialized: use data else: data = self.copy()` so
that the logic around `data` reuse in the blocks referencing `data` is explicit
and only calls `self.copy()` when needed.
- Around line 392-393: The conditional check `if data is None and "data" in
kwargs: data = kwargs.pop("data")` in _build_result_cdf is unnecessary because
`data` is a required positional parameter and this branch is effectively
unreachable; remove that `if` block (or if you intentionally expect callers to
pass `None` positionally and a `data` kwarg, keep it but add a concise comment
above the line explaining that specific pandas/internal calling pattern) so
_build_result_cdf no longer pops `data` from kwargs unexpectedly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90cc0305-2d5e-4f72-a629-74495ed002a6

📥 Commits

Reviewing files that changed from the base of the PR and between 78a496f and 877b5ff.

📒 Files selected for processing (2)
  • src/cytodataframe/frame.py
  • tests/test_frame.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_frame.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Double check cdf.iloc[x:y:z] capabilities

2 participants