
Remove BooleanCoder from zarr writes #11318

Open
elyall wants to merge 6 commits into pydata:main from
elyall:remove-BooleanCoder-from-zarr-writes


@elyall commented Apr 30, 2026

Description

The zarr backend currently converts boolean arrays to int8 before writing, using the BooleanCoder from the CF encoding pipeline. This behavior was inherited for NetCDF compatibility (NetCDF can't store booleans natively), but zarr v2 and v3 both support the bool dtype directly. The conversion has two consequences:

  • On-disk arrays stored as int8 with a dtype: "bool" attribute, which non-xarray zarr readers don't understand
  • An unnecessary dtype mismatch (int8 instead of bool) when reading zarr stores outside xarray

This PR skips BooleanCoder during zarr writes by:

  1. Adding an optional coders parameter to conventions.encode_cf_variable() so backends can customize the encoding chain
  2. Having encode_zarr_variable() pass a filtered coder list that excludes BooleanCoder
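
The two steps above can be sketched with a toy model. This is not xarray's implementation: `ToyVariable`, `ToyBooleanCoder`, `ToyPassthroughCoder`, `DEFAULT_CODERS`, `encode_variable`, and `encode_for_zarr` are hypothetical stand-ins that only illustrate the shape of an optional `coders` parameter and a filtered coder list.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    """Stand-in for a variable: values, a dtype name, and attributes."""
    data: list
    dtype: str
    attrs: dict = field(default_factory=dict)


class ToyBooleanCoder:
    """Mimics the idea of BooleanCoder: store bool as int8 plus a dtype attribute."""

    def encode(self, var):
        if var.dtype != "bool":
            return var
        return ToyVariable(
            data=[int(v) for v in var.data],
            dtype="int8",
            attrs={**var.attrs, "dtype": "bool"},
        )


class ToyPassthroughCoder:
    """Placeholder for the other coders in the chain; leaves the variable alone."""

    def encode(self, var):
        return var


# Hypothetical default chain; the real chain contains more coders.
DEFAULT_CODERS = (ToyPassthroughCoder(), ToyBooleanCoder())


def encode_variable(var, coders=None):
    """Apply each coder in order; use the default chain when coders is None."""
    for coder in DEFAULT_CODERS if coders is None else coders:
        var = coder.encode(var)
    return var


def encode_for_zarr(var):
    """Zarr-style wrapper: drop the boolean coder, keep everything else."""
    filtered = [c for c in DEFAULT_CODERS if not isinstance(c, ToyBooleanCoder)]
    return encode_variable(var, coders=filtered)


mask = ToyVariable(data=[True, False], dtype="bool")
default_encoded = encode_variable(mask)   # goes through the boolean coder
zarr_encoded = encode_for_zarr(mask)      # boolean coder filtered out
```

In this sketch, leaving `coders` unset preserves the existing behavior for every other caller, so only the zarr path opts out of the boolean conversion.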

The decode path is unchanged -- BooleanCoder.decode is attribute-driven (attrs.get('dtype') == 'bool'), so existing zarr stores written with the old int8 encoding still decode to bool correctly.
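
A minimal sketch of why attribute-driven decoding keeps old stores readable. The function `decode_boolean` is a hypothetical simplification and not xarray's code; in particular, it simply drops the `dtype` attribute after decoding, which is an assumption made to keep the example short.

```python
def decode_boolean(data, dtype, attrs):
    """Toy attribute-driven decode: act only when attrs records a bool origin."""
    if attrs.get("dtype") == "bool":
        # Legacy layout: int8 values on disk plus a dtype="bool" attribute.
        remaining = {k: v for k, v in attrs.items() if k != "dtype"}
        return [bool(v) for v in data], "bool", remaining
    # New layout: native bool on disk, no marker attribute, nothing to do.
    return data, dtype, attrs


# Store written with the old int8 encoding.
legacy = decode_boolean([1, 0, 1], "int8", {"dtype": "bool", "units": "1"})

# Store written with native bool after this change.
native = decode_boolean([True, False, True], "bool", {"units": "1"})
```

Both calls yield the same boolean values, which is the backward-compatibility property the description relies on.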

Checklist

AI Disclosure

  • This PR contains AI-generated content.
    • I have tested any AI-generated content in my PR.
    • I take responsibility for any AI-generated content in my PR.
  • Tools: Claude Opus 4.6
  • Prompt: "Please plan out the cleanest approach to skipping BooleanCoder for boolean arrays when writing to or reading from zarr files. Automatically skip BooleanCoder for all zarr writes (v2 and v3). Please provide a "Description" for the pull request and document changes in whats-new.rst."

@elyall force-pushed the remove-BooleanCoder-from-zarr-writes branch from 64db4da to 1b0046a on April 30, 2026 18:06


Development

Successfully merging this pull request may close these issues.

encoding of boolean dtype in zarr
