
fix: handle io array shape dimensionality#157

Merged
ilan-gold merged 6 commits into main from ig/fix_shape
Mar 13, 2026

Conversation


@ilan-gold ilan-gold commented Mar 11, 2026

Background

Looking into #152 I discovered an extra copy that was occurring on partial reads and fixed that with zarrs/zarrs#362.

Previously, before zarrs/zarrs#362, partial_decode_into had the output_view simply copy_from_slice, i.e., no shape/dimensionality checks, but it required an extra copy: as long as the number of bytes was correct, the copy succeeded. Now, with the fix in zarrs, checks occur because the zarrs APIs verify dimensionality when constructing the subsets to read directly into the output_view.

Why is this a problem for zarrs?

Outputs with extra singleton dimensions would now fail against main with zarrs 0.26.3, even though the number of bytes is correct: the view's dimensionality must match that of the chunk subsets. So that needs to be handled explicitly now, hence this PR.
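To illustrate the bytes-vs-dimensionality distinction (a plain numpy sketch, not the zarrs API itself): two buffers can hold exactly the same number of bytes while a shape-checking API still rejects one of them.

```python
import numpy as np

# Stand-in for the stored array and the chunk subset being read.
arr = np.zeros((128, 128, 128), dtype=np.uint8)
chunk_subset = arr[0]  # selection (1, 0..128, 0..128) -> shape (128, 128)

out_ok = np.empty((128, 128), dtype=np.uint8)      # matches subset dims
out_bad = np.empty((1, 128, 128), dtype=np.uint8)  # extra singleton dim

# Byte counts match either way...
assert out_ok.nbytes == out_bad.nbytes == chunk_subset.nbytes
# ...but an API that checks dimensionality sees different shapes.
assert out_ok.shape == chunk_subset.shape
assert out_bad.shape != chunk_subset.shape
```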

Unfortunately, drop_axes is only meant to mark axes that indexing by the chunk subset would not drop, but that the input/output array expects to have dropped, due to the way zarr implements a given indexing operation; i.e., it makes the chunk subset match the output/input subset, not the reverse. It does not indicate that both have a dropped axis relative to the overall array shape. So you have something like

https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L616-L621

where there are no dropped axes marked on the basic class, i.e., drop_axes == () always. But in reality, axes can be implicitly dropped by numpy indexing on both the input/output array and each subsetted chunk, when applicable.

For example, you could be reading into out, which could have shape (128, 128), while your array has shape (128, 128, 128) and you just requested z[0, ...], i.e., (1, 0..128, 0..128). In this case, drop_axes would actually be () because both the input/output array and the subset have dropped axes compared to the array shape.
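That case can be sketched with numpy alone (z here is a numpy stand-in for the zarr array; the indexing semantics are what matter):

```python
import numpy as np

# Stand-in for a zarr array of shape (128, 128, 128).
z = np.zeros((128, 128, 128), dtype=np.uint8)

out = z[0, ...]  # the integer index drops the first axis
assert out.shape == (128, 128)

# The chunk subset built for this request spans (1, 0..128, 0..128):
subset_shape = (1, 128, 128)

# Both the output and the subset "lost" the axis the same way, so the
# basic indexer reports drop_axes == () even though ndim differs.
assert out.ndim == len(subset_shape) - 1
```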

The only place there are non-() drop_axes is advanced orthogonal indexing, i.e., indexing with a numpy.ndarray:

https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L941-L949

But if someone does oindex-ing with, say, a numpy.ndarray and an int, drop_axes gets picked up:

https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L941-L949

And this makes sense given numpy behavior:

import numpy as np

np.arange(16).reshape((4, 4))[0, np.array([0, 2])].shape == (2,)

but zarr implements this code path through outer indexing + ix_, which would result in something like

import numpy as np

np.arange(16).reshape((4, 4))[np.ix_(np.array([0]), np.array([0, 2]))].shape == (1, 2)

and so the dropped axes have to be explicitly tracked (instead of relying on numpy's own axis-dropping behavior).
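Putting the two snippets above together, drop_axes is exactly the information needed to squeeze the ix_-shaped result back down to the shape plain numpy indexing would have produced (a numpy-only sketch; the zarr internals differ in detail):

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

# Direct numpy mixed indexing drops the int-indexed axis automatically:
direct = a[0, np.array([0, 2])]
assert direct.shape == (2,)

# The orthogonal-indexing path goes through np.ix_, which keeps every
# axis: the int becomes a length-1 array and the axis survives.
outer = a[np.ix_(np.array([0]), np.array([0, 2]))]
assert outer.shape == (1, 2)

# drop_axes records which axes to squeeze to recover the numpy result:
drop_axes = (0,)
assert np.squeeze(outer, axis=drop_axes).shape == direct.shape
```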

Solution

I have tried to encapsulate the logic where a new singleton dimension has been added relative to the underlying zarr array, beyond just outer indexing (i.e., v-indexing or mixed indexing). In an ideal world, everything that hits the codec pipeline would have the same dimensionality as the underlying zarr.Array and then there is just a final reshape or similar.
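The "same dimensionality, then a final reshape" idea could look roughly like the following. This is a minimal sketch, not the PR's implementation; match_subset_ndim is a hypothetical helper name, and the key property is that the reshape is a view (no extra copy for contiguous buffers):

```python
import numpy as np

def match_subset_ndim(io_array: np.ndarray, subset_shape: tuple) -> np.ndarray:
    """Hypothetical helper: give the in/out buffer the same dimensionality
    as the chunk subsets by reshaping, provided element counts agree.
    For a contiguous buffer this returns a view, not a copy."""
    if io_array.size != np.prod(subset_shape, dtype=int):
        raise ValueError("element count mismatch")
    return io_array.reshape(subset_shape)

out = np.empty((128, 128), dtype=np.uint8)
view = match_subset_ndim(out, (1, 128, 128))
assert view.shape == (1, 128, 128)
assert view.base is out  # reshape returned a view, not a copy
```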

cc @d-v-b re: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/Clarification.20on.20zarr.2EArray.2E__setitem__.20behaviour/with/578643578 This analysis may be of interest

Review thread on the diff (context lines):

shape_chunk_selection_slices
):
shape_ctr = 0
io_array_shape = []
ilan-gold (Collaborator, Author) commented:

In theory, this value should be constant - it might be worth only calculating it once

@ilan-gold ilan-gold marked this pull request as ready for review March 12, 2026 09:41

@flying-sheep flying-sheep left a comment


Nice! Annoying that we have to deal with it and very thorough commenting here.

@ilan-gold ilan-gold merged commit 56c2da9 into main Mar 13, 2026
17 checks passed
@ilan-gold ilan-gold deleted the ig/fix_shape branch March 13, 2026 11:08