
Preserve Categorical dtype for color_vector with non-unique colors #542

Merged
timtreis merged 2 commits into main from fix/render-labels-categorical-performance
Mar 5, 2026

@timtreis (Member) commented Mar 5, 2026

Summary

  • pd.Categorical.map() silently demotes to object dtype when mapped values aren't unique (e.g. two categories share the same color via .uns colors, or >103 categories all mapping to grey). This one-line fix wraps the result back in pd.Categorical.
  • Labels: _map_color_seg now always takes the fast code-based label2rgb path (2-3x faster per render pass; the gain applies to each of the fill and outline passes)
  • Shapes/Points: datashader color_key is now correctly built from assigned colors instead of being silently None
  • Also adds explicit na_action="ignore" to silence the FutureWarning from pandas >=2.1

Closes #469
Closes #540

Details

The root cause is in _set_color_source_vec, where color_source_vector.map(color_mapping) can return an Index with object dtype instead of a Categorical when the color mapping contains duplicate values. This happens because the categories of a pandas Categorical must be unique, so a many-to-one mapping cannot be expressed as a new set of categories.

Downstream effects of the object dtype:

  • _map_color_seg falls into Case D (slow path): label2rgb processes N colors per instance instead of K unique category colors
  • Shapes/points datashader: color_key check type(color_vector) is Categorical fails → assigned colors are ignored
  • Labels rasterize path: hits assert color_source_vector is None (latent crash when rasterization drops labels)

The fix is verified to produce pixel-identical output at all tested scales (20–5000 cells).
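A toy illustration of how a code-based path and a per-instance path can produce identical pixels while doing different amounts of color conversion. This is a simplified NumPy stand-in, not `skimage.color.label2rgb` or the actual `_map_color_seg`; the helper names, the hex color format, and the black background are assumptions for the example.

```python
import numpy as np
import pandas as pd

BACKGROUND = (0.0, 0.0, 0.0)  # assumed RGB for label 0 and for NaN colors


def hex_to_rgb(color: str) -> tuple:
    """Parse '#rrggbb' into floats in [0, 1]."""
    return tuple(int(color[i:i + 2], 16) / 255 for i in (1, 3, 5))


def paint_by_codes(labels: np.ndarray, colors: pd.Categorical) -> np.ndarray:
    """Code-based path: convert the K unique colors once, then gather by code."""
    # One extra row for the background; code -1 (NaN) indexes that last row.
    lut = np.array([hex_to_rgb(c) for c in colors.categories] + [BACKGROUND])
    # Row 0 is reserved for label 0 (background); instance i uses row i.
    instance_rgb = np.vstack([[BACKGROUND], lut[colors.codes]])
    return instance_rgb[labels]


def paint_per_instance(labels: np.ndarray, colors) -> np.ndarray:
    """Object-dtype path: one color conversion per labeled instance."""
    instance_rgb = [BACKGROUND] + [
        BACKGROUND if pd.isna(c) else hex_to_rgb(c) for c in colors
    ]
    return np.array(instance_rgb)[labels]


# Label image: 0 is background, instances are labeled 1..5.
labels = np.array([[0, 1, 1], [2, 3, 4], [5, 5, 0]])
# Colors for instances 1..5: three instances share a color and one is NaN.
obj_colors = pd.Index(["#1f77b4", "#1f77b4", "#ff7f0e", np.nan, "#1f77b4"], dtype=object)
cat_colors = pd.Categorical(obj_colors)

fast = paint_by_codes(labels, cat_colors)
slow = paint_per_instance(labels, obj_colors)
print(fast.shape, np.array_equal(fast, slow))
```

In this toy the code-based version converts only the two unique colors, while the per-instance version converts one color per labeled instance; with thousands of cells and a handful of categories, that difference is what the fast path avoids.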

Test plan

  • Pixel-identical output verified for Categorical vs object-dtype paths (small/medium/large)
  • NaN values correctly preserved (code=-1 → background)
  • End-to-end render_labels with non-unique .uns colors completes without error
  • Existing test suite passes

🤖 Generated with Claude Code

pd.Categorical.map() silently demotes to object dtype when mapped
values aren't unique (e.g. two categories share the same color).
This caused _map_color_seg to take a slow path where label2rgb
processed one color per instance instead of just the unique
categories, and caused datashader to ignore assigned colors for
shapes/points.

Wrap the .map() result back in pd.Categorical to ensure downstream
consumers always receive a Categorical for categorical data.

Also adds explicit na_action="ignore" to silence the FutureWarning
from pandas >=2.1.

Closes #469
Closes #540

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.96%. Comparing base (c538e7f) to head (75d3dfa).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #542   +/-   ##
=======================================
  Coverage   83.96%   83.96%           
=======================================
  Files           9        9           
  Lines        2595     2595           
=======================================
  Hits         2179     2179           
  Misses        416      416           
Files with missing lines              Coverage Δ
src/spatialdata_plot/pl/render.py     88.80% <ø> (ø)
src/spatialdata_plot/pl/utils.py      79.30% <100.00%> (ø)

Unify three inconsistent idioms for checking categorical dtype:
- pd.api.types.is_categorical_dtype() (deprecated in pandas 2.1)
- type(x) is pd.core.arrays.categorical.Categorical (checks array type)
- isinstance(x.dtype, pd.CategoricalDtype) (canonical)

All four call sites now use the same isinstance pattern, matching
the rest of the codebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
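A quick comparison of the two non-deprecated checks across pandas containers, with made-up values. The deprecated `pd.api.types.is_categorical_dtype` is deliberately not called here. The dtype-based check recognizes categorical data regardless of container, while the exact-type check only matches a bare `pd.Categorical`.

```python
import pandas as pd

values = ["a", "b", "a"]
containers = {
    "Categorical": pd.Categorical(values),
    "Series": pd.Series(values, dtype="category"),
    "CategoricalIndex": pd.CategoricalIndex(values),
    "object Index": pd.Index(values, dtype=object),
}

results = {}
for name, obj in containers.items():
    # Canonical: inspect the dtype, independent of the container type.
    canonical = isinstance(obj.dtype, pd.CategoricalDtype)
    # Exact-type check: only true for a bare Categorical array.
    exact_type = type(obj) is pd.Categorical
    results[name] = (canonical, exact_type)
    print(name, canonical, exact_type)
```

The exact-type check misses categorical data held in a Series or CategoricalIndex, and an object-dtype Index (such as the one a non-unique .map() returns) fails both checks unless it is re-wrapped in pd.Categorical.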
@timtreis timtreis merged commit c0ab403 into main Mar 5, 2026
4 checks passed
@timtreis timtreis deleted the fix/render-labels-categorical-performance branch March 5, 2026 19:37

2 participants