fix: handle io array shape dimensionality by ilan-gold · Pull Request #157 · zarrs/zarrs-python

ilan-gold · 2026-03-11T16:26:43Z

Background

While looking into #152, I discovered an extra copy occurring on partial reads and fixed it with zarrs/zarrs#362.

Before zarrs/zarrs#362, partial_decode_into simply had the output_view call copy_from_slice, i.e., there were no shape/dimensionality checks, at the cost of an extra copy. As long as the number of bytes was correct, the copy succeeded. With the fix in zarrs, checks now occur because the zarrs APIs verify dimensionality when constructing the subsets used to read directly into the output_view.
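To make the difference concrete, here is a simplified model of the two checks. The function names `old_check` and `new_check` are hypothetical and only capture the shape logic described above: element counts stand in for byte counts (assuming a single dtype), and the new check compares only dimensionality. Neither is the actual zarrs API.

```python
import math


def old_check(view_shape, subset_shape):
    """Hypothetical model of the pre-#362 path: a flat copy_from_slice only
    needs the total element count (i.e., bytes, for one dtype) to agree."""
    return math.prod(view_shape) == math.prod(subset_shape)


def new_check(view_shape, subset_shape):
    """Hypothetical model of the post-#362 path: subsets are applied directly
    to the output view, so the number of dimensions must agree."""
    return len(view_shape) == len(subset_shape)


# z[0, ...] on a (128, 128, 128) array, read into a (128, 128) output
out_shape = (128, 128)
subset_shape = (1, 128, 128)

print(old_check(out_shape, subset_shape))  # True: same number of elements
print(new_check(out_shape, subset_shape))  # False: 2 dims vs 3 dims
```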

Why is this a problem for zarrs?

With zarrs 0.26.3, outputs with extra singleton dimensions would now fail on main, even though the number of bytes is correct. The view's dimensionality must match that of the chunk subsets, so this now has to be handled explicitly, hence this PR.

Unfortunately, drop_axes is only meant to indicate dropped axes when indexing by the chunk subset would not drop an axis that the input/output expects to be dropped, due to the way zarr implements a given indexing operation. In other words, it makes the chunk subset match the output/input subset, not the reverse, and it does not indicate that both have a dropped axis relative to the overall array shape. So you have something like

https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L616-L621

where no dropped axes are marked on the basic indexer class, i.e., drop_axes == () always. In reality, though, axes can be implicitly dropped by numpy indexing on both the input/output array and each chunk subset, where applicable.

For example, you could be reading into out with shape (128, 128) from an array with shape (128, 128, 128), having requested z[0, ...], i.e., (1, 0..128, 0..128). In this case, drop_axes would actually be () because both the input/output array and the subset have an axis dropped relative to the array shape.
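A quick numpy illustration of that mismatch. Here a range-style selection (`0:1`) stands in for a zarrs chunk subset, which is an assumption made for illustration: integer indexing drops the axis, while the range keeps it as a singleton, so the two disagree in dimensionality even though they hold the same number of elements.

```python
import numpy as np

chunk = np.zeros((128, 128, 128), dtype=np.uint8)

# numpy drops an integer-indexed axis, matching the (128, 128) output
dropped = chunk[0, ...]
# a range-based selection (like a subset made of ranges) keeps a singleton axis
kept = chunk[0:1, ...]

print(dropped.shape)              # (128, 128)
print(kept.shape)                 # (1, 128, 128)
print(dropped.size == kept.size)  # True: same number of elements
```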

The only place where drop_axes is non-() is advanced orthogonal indexing, i.e., with a numpy.ndarray:

https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L941-L949

But if someone does oindex-ing with, say, a numpy.ndarray and an int, drop_axes gets picked up:

https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L941-L949

And this makes sense given numpy behavior:

```python
import numpy as np

# numpy drops the int-indexed axis when an int is mixed with an index array
assert np.arange(16).reshape((4, 4))[0, np.array([0, 2])].shape == (2,)
```

but zarr implements this code path through outer indexing + ix_, which would result in something like

```python
import numpy as np

# np.ix_ turns the int into a length-1 array, so the axis is kept
assert np.arange(16).reshape((4, 4))[np.ix_(np.array([0]), np.array([0, 2]))].shape == (1, 2)
```

so the dropped axes have to be tracked explicitly (instead of relying on numpy's own axis-dropping behavior).
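In numpy terms, explicit tracking amounts to squeezing the recorded axes after the outer-indexing step. This is a minimal sketch of the idea, with `drop_axes` hard-coded for this one selection rather than computed the way zarr's indexers do it.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

# Outer indexing: the int 0 is treated as the length-1 index array [0]
outer = a[np.ix_(np.array([0]), np.array([0, 2]))]  # shape (1, 2)

# Axis 0 came from an int, so plain numpy indexing would have dropped it
drop_axes = (0,)
result = np.squeeze(outer, axis=drop_axes)

print(result.shape)                                    # (2,)
print(np.array_equal(result, a[0, np.array([0, 2])]))  # True
```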

Solution

I have tried to encapsulate the logic for cases where a new singleton dimension has been added relative to the underlying zarr array, beyond just outer indexing (i.e., v-indexing or mixed indexing). Ideally, everything that reaches the codec pipeline would have the same dimensionality as the underlying zarr.Array, followed by just a final reshape or similar.
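A sketch of what that reshaping could look like, assuming the per-chunk selection is a tuple of ints and slices (as zarr's indexers produce) and that the output buffer is C-contiguous, so `reshape` returns a view rather than a copy. `io_array_shape` is a hypothetical helper, not necessarily the code merged here: it walks the selection, inserts a size-1 axis wherever an int was used, and otherwise consumes the next output dimension.

```python
import numpy as np


def io_array_shape(chunk_selection, out_shape):
    """Hypothetical helper: give the output the same number of dimensions
    as the chunk selection by re-inserting axes dropped by integer indexing."""
    out_dims = iter(out_shape)
    shape = []
    for sel in chunk_selection:
        if isinstance(sel, (int, np.integer)):
            # numpy would drop this axis; the subset keeps it as size 1
            shape.append(1)
        else:
            # this axis survives indexing, so it maps to the next output axis
            shape.append(next(out_dims))
    return tuple(shape)


# z[0, ...] on a (128, 128, 128) array, read into a (128, 128) output
out = np.empty((128, 128), dtype=np.float32)
selection = (0, slice(0, 128), slice(0, 128))

view = out.reshape(io_array_shape(selection, out.shape))
print(view.shape)                   # (1, 128, 128)
print(np.shares_memory(view, out))  # True: writes into view land in out
```

Walking the output dimensions with an iterator instead of a manual counter is in the spirit of the later "use iterators instead of ctr" refactor, though the merged implementation may differ in detail.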

cc @d-v-b re: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/Clarification.20on.20zarr.2EArray.2E__setitem__.20behaviour/with/578643578. This analysis may be of interest.

ilan-gold · 2026-03-11T16:28:02Z

python/zarrs/utils.py

+            shape_chunk_selection_slices
+        ):
+            shape_ctr = 0
+            io_array_shape = []


In theory, this value should be constant; it might be worth calculating it only once.

flying-sheep

Nice! It's annoying that we have to deal with this, but the commenting here is very thorough.

python/zarrs/utils.py

fix: handle io array shape dimensionality

21317c7

ilan-gold commented Mar 11, 2026


ilan-gold added 2 commits March 11, 2026 18:09

refactor: simplify logic

3715aed

chore: format

0608549

ilan-gold force-pushed the ig/fix_shape branch from 876e4fe to 0608549 on March 11, 2026 17:23

ilan-gold added 2 commits March 11, 2026 18:24

fix: handle else

dedf1c3

chore: clearer comments + bigger test

33e2e44

ilan-gold requested review from LDeakin and flying-sheep March 12, 2026 09:41

ilan-gold marked this pull request as ready for review March 12, 2026 09:41

flying-sheep reviewed Mar 12, 2026


python/zarrs/utils.py (resolved)

python/zarrs/utils.py (outdated, resolved)

python/zarrs/utils.py (outdated, resolved)

refactor: use iterators instead of ctr for size-1 axis

6c4c348

flying-sheep approved these changes Mar 13, 2026


ilan-gold merged commit 56c2da9 into main Mar 13, 2026
17 checks passed

ilan-gold deleted the ig/fix_shape branch March 13, 2026 11:08

ilan-gold merged 6 commits into main from ig/fix_shape

2 participants
