Preserve Categorical dtype for color_vector with non-unique colors#542
Merged
Preserve Categorical dtype for color_vector with non-unique colors#542
Conversation
pd.Categorical.map() silently demotes to object dtype when mapped values aren't unique (e.g. two categories share the same color). This caused _map_color_seg to take a slow path where label2rgb processed one color per instance instead of just the unique categories, and caused datashader to ignore assigned colors for shapes/points. Wrap the .map() result back in pd.Categorical to ensure downstream consumers always receive a Categorical for categorical data. Also adds explicit na_action="ignore" to silence the FutureWarning from pandas >=2.1. Closes #469 Closes #540 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #542 +/- ##
=======================================
Coverage 83.96% 83.96%
=======================================
Files 9 9
Lines 2595 2595
=======================================
Hits 2179 2179
Misses 416 416
🚀 New features to boost your workflow:
|
Replace three inconsistent idioms for checking categorical dtype: - pd.api.types.is_categorical_dtype() (deprecated in pandas 2.1) - type(x) is pd.core.arrays.categorical.Categorical (checks array type) - isinstance(x.dtype, pd.CategoricalDtype) (canonical) All four call sites now use the same isinstance pattern, matching the rest of the codebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
pd.Categorical.map()silently demotes toobjectdtype when mapped values aren't unique (e.g. two categories share the same color via.unscolors, or >103 categories all mapping to grey). This one-line fix wraps the result back inpd.Categorical._map_color_segnow always takes the fast code-basedlabel2rgbpath (2-3x faster per render pass, compounding with fill+outline)color_keyis now correctly built from assigned colors instead of being silentlyNonena_action="ignore"to silence theFutureWarningfrom pandas >=2.1Closes #469
Closes #540
Details
The root cause is in
_set_color_source_vecwherecolor_source_vector.map(color_mapping)can return anIndexwithobjectdtype instead of aCategoricalwhen the color mapping contains duplicate values. This happens because pandas can't represent non-unique values as categorical categories.Downstream effects of the
objectdtype:_map_color_segfalls into Case D (slow path):label2rgbprocesses N colors per instance instead of K unique category colorscolor_keychecktype(color_vector) is Categoricalfails → assigned colors are ignoredassert color_source_vector is None(latent crash when rasterization drops labels)The fix is verified to produce pixel-identical output at all tested scales (20–5000 cells).
Test plan
render_labelswith non-unique.unscolors completes without error🤖 Generated with Claude Code