Conversation
ilan-gold
commented
Mar 11, 2026
python/zarrs/utils.py
Outdated
| shape_chunk_selection_slices | ||
| ): | ||
| shape_ctr = 0 | ||
| io_array_shape = [] |
Collaborator
Author
There was a problem hiding this comment.
In theory, this value should be constant - it might be worth only calculating it once
Collaborator
flying-sheep
left a comment
There was a problem hiding this comment.
Nice! Annoying that we have to deal with it and very thorough commenting here.
flying-sheep
approved these changes
Mar 13, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
Looking into #152 I discovered an extra copy that was occurring on partial reads and fixed that with zarrs/zarrs#362.
Previously, before zarrs/zarrs#362,
partial_decode_intohad theoutput_viewsimplycopy_from_slicei.e., no shape/dimensionality checks but it required an extra copy. As long as the number of bytes were correct, the copy occurred. So now with the fix inzarrs, checks occur becausezarrsAPIs check this dimensionality when constructing the subsets to read directly into theoutput_view.Why is this a problem for zarrs?
Outputs with extra singleton dimensions now would fail against
mainwith0.26.3zarrs, even though the number of bytes is correct. The view's dimensionality must match that of the chunk subsets. So that needs to be handled explicitly now, hence this PR.Unfortunately,
drop_axesis only meant to indicate dropped axes when indexing by the chunk subset would not drop an axis that the input/output would expect to be dropped due to the way zarr implements a given indexing operation i.e., it makes the chunk subset match the output/input subset and not the reverse (or not to indicate that they both have a dropped axis compared to the overall array shape). So you have something likehttps://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L616-L621
where there are no dropped axes marked on the basic class i.e.,
drop_axes == ()always. But in reality, the axes can be implicitly dropped thanks to numpy indexing on both the input/output array and each subsetted chunk when applicable.For example, you could be reading into
outwhich could have shape(128, 128)but your array has shape(128, 128, 128)and you just requestedz[0, ...]i.e.,(1, 0..128, 0..128). In this case,drop_axeswould actually be()because both the input/output array and the subset have dropped axes compared to the array shape.The only place there are non-
()drop_axesis advanced orthogonal indexing i.e.,numpy.ndarray:https://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L941-L949
But if someone does oindex-ing with, say, a
numpy.ndarrayand anint,drop_axesgets picked uphttps://github.com/zarr-developers/zarr-python/blob/fa61ed8e9d1f2c92cf1b7b763c35affd5175a429/src/zarr/core/indexing.py#L941-L949
And this makes sense given
numpybehavior:but zarr implements this code path through outer indexing +
ix_, which would result in something likeand so the dropped axes has to be explicitly tracked (instead of relying on numpy's own axis-dropping behavior).
Solution
I have tried to encapsulate the logic where a new singleton dimension has been added relative to the underlying zarr array, beyond just outer indexing (i.e., v-indexing or mixed indexing). In an ideal world, everything that hits the codec pipeline would have the same dimensionality as the underlying
zarr.Arrayand then there is just a finalreshapeor similar.cc @d-v-b re: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/Clarification.20on.20zarr.2EArray.2E__setitem__.20behaviour/with/578643578 This analysis may be of interest