Merged
Changes from 9 commits
Commits
31 commits
a07764e
Add `codec_pipeline.fill_missing_chunks` config
tomwhite Oct 9, 2025
7438a03
Set default for `fill_missing_chunks` in config.py. Add test replicating
williamsnell Mar 5, 2026
38e5acf
Add fill_missing_chunks to examples of config options.
williamsnell Mar 5, 2026
1d51b37
Add to /changes
williamsnell Mar 5, 2026
ad3e2ed
Parameterize tests to make sure we hit both branches of `if
williamsnell Mar 5, 2026
2c9b31b
Fix lint errors: remove parentheses, type kwargs.
williamsnell Mar 5, 2026
2846ed9
Move config from codec_pipeline -> array. Update docs, tests.
williamsnell Mar 5, 2026
de7afd8
Delegate missing-shard detection away from _get_chunk_spec. Codify
williamsnell Mar 6, 2026
233ddce
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 10, 2026
c8b0d11
Merge branch 'main' into fill-missing-chunks
d-v-b Mar 11, 2026
d460521
Define ChunkNotFoundError; expose chunk key and chunk index in ChunkN…
d-v-b Mar 11, 2026
9c9a096
update docs
d-v-b Mar 11, 2026
40da713
Merge branch 'fill-missing-chunks' of https://github.com/williamsnell…
d-v-b Mar 11, 2026
16517d1
Merge branch 'main' into fill-missing-chunks
d-v-b Mar 11, 2026
70b7c7b
fix links
d-v-b Mar 11, 2026
338a494
Merge branch 'fill-missing-chunks' of https://github.com/williamsnell…
d-v-b Mar 11, 2026
8f2cdc4
cleanup
d-v-b Mar 11, 2026
df73f2a
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 13, 2026
332ddc2
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 16, 2026
130a91b
Merge branch 'main' into fill-missing-chunks
d-v-b Mar 16, 2026
89df859
Merge branch 'main' into fill-missing-chunks
d-v-b Mar 17, 2026
404e4ac
Merge branch 'main' into fill-missing-chunks
d-v-b Mar 23, 2026
093c7f4
Pass chunk indexes up
maxrjones Mar 17, 2026
4d8cfe7
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 26, 2026
45a68e6
fill_missing_chunks -> read_missing_chunks
williamsnell Mar 26, 2026
c578231
Resolve behavioural differences between main and maxrjones@37a40e3.
williamsnell Mar 26, 2026
f01b049
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 26, 2026
4d7d0d2
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 27, 2026
b9e4206
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 27, 2026
620a1ff
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 30, 2026
d9bfe98
Merge branch 'main' into fill-missing-chunks
williamsnell Mar 31, 2026
1 change: 1 addition & 0 deletions changes/3748.feature.md
@@ -0,0 +1 @@
+Added `array.fill_missing_chunks` configuration option. When set to `False`, reading missing chunks raises a `MissingChunkError` instead of filling them with the array's fill value.
7 changes: 6 additions & 1 deletion docs/user-guide/arrays.md
@@ -158,13 +158,18 @@ print(f"Shape after second append: {z.shape}")

 Zarr arrays are parametrized with a configuration that determines certain aspects of array behavior.

-We currently support two configuration options for arrays: `write_empty_chunks` and `order`.
+We currently support three configuration options for arrays: `write_empty_chunks`, `fill_missing_chunks`, and `order`.

 | field | type | default | description |
 | - | - | - | - |
 | `write_empty_chunks` | `bool` | `False` | Controls whether empty chunks are written to storage. See [Empty chunks](performance.md#empty-chunks).
+| `fill_missing_chunks` | `bool` | `True` | Controls whether missing chunks are filled with the array's fill value on read. If `False`, reading missing chunks raises a `MissingChunkError`.
 | `order` | `Literal["C", "F"]` | `"C"` | The memory layout of arrays returned when reading data from the store.

+!!! note
+    `write_empty_chunks=False` skips writing chunks that are entirely the array's fill value.
+    If `fill_missing_chunks=False`, attempting to read these missing chunks will raise an error.
+
 You can specify the configuration when you create an array with the `config` keyword argument.
 `config` can be passed as either a `dict` or an `ArrayConfig` object.
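
The interaction described in the note can be illustrated without zarr itself. The following is a minimal sketch, assuming a toy one-dimensional array backed by a plain `dict`. `ToyArray`, `write_chunk`, and `read_chunk` are hypothetical names invented for this illustration, and the local `MissingChunkError` only mirrors the name used in this diff; none of this is zarr's implementation.

```python
class MissingChunkError(IndexError):
    """Local stand-in mirroring the error name used in this diff."""


class ToyArray:
    """Toy chunked array over a dict (hypothetical; not zarr's code)."""

    def __init__(self, chunk_len, fill_value, *,
                 write_empty_chunks=False, fill_missing_chunks=True):
        self.store = {}  # chunk index -> list of values
        self.chunk_len = chunk_len
        self.fill_value = fill_value
        self.write_empty_chunks = write_empty_chunks
        self.fill_missing_chunks = fill_missing_chunks

    def write_chunk(self, index, values):
        values = list(values)
        if len(values) != self.chunk_len:
            raise ValueError("chunk has the wrong length")
        is_empty = all(v == self.fill_value for v in values)
        if is_empty and not self.write_empty_chunks:
            # An all-fill-value chunk is dropped rather than stored.
            self.store.pop(index, None)
        else:
            self.store[index] = values

    def read_chunk(self, index):
        chunk = self.store.get(index)
        if chunk is not None:
            return list(chunk)
        if self.fill_missing_chunks:
            # Missing chunk: synthesize it from the fill value.
            return [self.fill_value] * self.chunk_len
        raise MissingChunkError(f"chunk {index} is not in the store")
```

With both options set to `False`, overwriting a chunk with the fill value removes it from the toy store, so the next read raises instead of returning fill values. Setting `write_empty_chunks=True` keeps the chunk and the read succeeds.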

1 change: 1 addition & 0 deletions docs/user-guide/config.md
@@ -30,6 +30,7 @@ Configuration options include the following:
 - Default Zarr format `default_zarr_version`
 - Default array order in memory `array.order`
 - Whether empty chunks are written to storage `array.write_empty_chunks`
+- Whether missing chunks are filled with the array's fill value on read `array.fill_missing_chunks` (default `True`). Set to `False` to raise a `MissingChunkError` instead.
 - Async and threading options, e.g. `async.concurrency` and `threading.max_workers`
 - Selections of implementations of codecs, codec pipelines and buffers
 - Enabling GPU support with `zarr.config.enable_gpu()`. See GPU support for more.
9 changes: 7 additions & 2 deletions src/zarr/codecs/sharding.py
@@ -711,17 +711,22 @@ def _get_index_chunk_spec(self, chunks_per_shard: tuple[int, ...]) -> ArraySpec:
             dtype=UInt64(endianness="little"),
             fill_value=MAX_UINT_64,
             config=ArrayConfig(
-                order="C", write_empty_chunks=False
+                order="C", write_empty_chunks=False, fill_missing_chunks=True
             ), # Note: this is hard-coded for simplicity -- it is not surfaced into user code,
             prototype=default_buffer_prototype(),
         )

     def _get_chunk_spec(self, shard_spec: ArraySpec) -> ArraySpec:
+        # Because the shard index and inner chunks should be stored
+        # together, we detect missing data via the shard index.
+        # The inner chunks defined here are thus allowed to return
+        # None, even if fill_missing_chunks=False at the array level.
+        config = replace(shard_spec.config, fill_missing_chunks=True)
         return ArraySpec(
Review comment (contributor): It's unfortunate that the chunk spec is implemented as something called `ArraySpec`. It would be nice to hand over the `chunk_coords` here if `ArraySpec` could take a coordinate argument, but of course that doesn't make sense semantically.

             shape=self.chunk_shape,
             dtype=shard_spec.dtype,
             fill_value=shard_spec.fill_value,
-            config=shard_spec.config,
+            config=config,
             prototype=shard_spec.prototype,
         )
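
The override in `_get_chunk_spec` can be sketched in isolation. The following is a simplified illustration, assuming a store modelled as a `dict` of shards, each of which is a `dict` of inner chunks. `ToyConfig`, `read_inner_chunk`, and `read` are hypothetical names, and the local `MissingChunkError` only mirrors the name in this diff. The only real API used is `dataclasses.replace`, which returns a modified copy of a frozen dataclass.

```python
from dataclasses import dataclass, replace


class MissingChunkError(IndexError):
    """Local stand-in mirroring the error name used in this diff."""


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical, reduced version of an array config."""

    fill_missing_chunks: bool


def read_inner_chunk(shard, inner_key, fill_value, config):
    # Inner chunks always get a copy of the config with filling enabled,
    # so a gap inside an existing shard is never an error.
    inner_config = replace(config, fill_missing_chunks=True)
    value = shard.get(inner_key)
    if value is None and inner_config.fill_missing_chunks:
        return fill_value
    return value


def read(store, shard_key, inner_key, fill_value, config):
    shard = store.get(shard_key)
    if shard is None:
        # The whole shard is absent: here the array-level setting
        # decides between filling and raising.
        if config.fill_missing_chunks:
            return fill_value
        raise MissingChunkError(f"shard {shard_key!r} is not in the store")
    return read_inner_chunk(shard, inner_key, fill_value, config)
```

Because `replace` copies rather than mutates, the array-level config keeps `fill_missing_chunks=False` for the shard lookup while inner-chunk reads see `True`. This matches the behavior the sharded test in this PR asserts: gaps inside an existing shard are filled, and an entirely missing shard raises.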

21 changes: 18 additions & 3 deletions src/zarr/core/array_spec.py
@@ -28,6 +28,7 @@ class ArrayConfigParams(TypedDict):

     order: NotRequired[MemoryOrder]
     write_empty_chunks: NotRequired[bool]
+    fill_missing_chunks: NotRequired[bool]


 @dataclass(frozen=True)
@@ -41,17 +42,25 @@ class ArrayConfig:
         The memory layout of the arrays returned when reading data from the store.
     write_empty_chunks : bool
         If True, empty chunks will be written to the store.
+    fill_missing_chunks : bool
+        If True, missing chunks will be filled with the array's fill value on read.
+        If False, reading missing chunks will raise a ``MissingChunkError``.
     """

     order: MemoryOrder
     write_empty_chunks: bool
+    fill_missing_chunks: bool

-    def __init__(self, order: MemoryOrder, write_empty_chunks: bool) -> None:
+    def __init__(
+        self, order: MemoryOrder, write_empty_chunks: bool, fill_missing_chunks: bool
+    ) -> None:
         order_parsed = parse_order(order)
         write_empty_chunks_parsed = parse_bool(write_empty_chunks)
+        fill_missing_chunks_parsed = parse_bool(fill_missing_chunks)

         object.__setattr__(self, "order", order_parsed)
         object.__setattr__(self, "write_empty_chunks", write_empty_chunks_parsed)
+        object.__setattr__(self, "fill_missing_chunks", fill_missing_chunks_parsed)

     @classmethod
     def from_dict(cls, data: ArrayConfigParams) -> Self:
@@ -62,7 +71,9 @@ def from_dict(cls, data: ArrayConfigParams) -> Self:
         """
         kwargs_out: ArrayConfigParams = {}
         for f in fields(ArrayConfig):
-            field_name = cast("Literal['order', 'write_empty_chunks']", f.name)
+            field_name = cast(
+                "Literal['order', 'write_empty_chunks', 'fill_missing_chunks']", f.name
+            )
             if field_name not in data:
                 kwargs_out[field_name] = zarr_config.get(f"array.{field_name}")
             else:
@@ -73,7 +84,11 @@ def to_dict(self) -> ArrayConfigParams:
         """
         Serialize an instance of this class to a dict.
         """
-        return {"order": self.order, "write_empty_chunks": self.write_empty_chunks}
+        return {
+            "order": self.order,
+            "write_empty_chunks": self.write_empty_chunks,
+            "fill_missing_chunks": self.fill_missing_chunks,
+        }


 ArrayConfigLike = ArrayConfig | ArrayConfigParams
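
The default-resolution pattern in `from_dict` can be shown with the standard library alone. This is a minimal sketch, assuming a plain `dict` named `GLOBAL_DEFAULTS` in place of the library's global config object; `ToyArrayConfig` and `GLOBAL_DEFAULTS` are hypothetical names, and the real code reads defaults through `zarr_config.get(...)` instead.

```python
from dataclasses import dataclass, fields

# Hypothetical stand-in for the global config; the values match the
# defaults shown in this diff.
GLOBAL_DEFAULTS = {
    "array.order": "C",
    "array.write_empty_chunks": False,
    "array.fill_missing_chunks": True,
}


@dataclass(frozen=True)
class ToyArrayConfig:
    order: str
    write_empty_chunks: bool
    fill_missing_chunks: bool

    @classmethod
    def from_dict(cls, data):
        kwargs = {}
        for f in fields(cls):
            if f.name in data:
                # A value supplied by the caller wins.
                kwargs[f.name] = data[f.name]
            else:
                # Otherwise fall back to the global default.
                kwargs[f.name] = GLOBAL_DEFAULTS[f"array.{f.name}"]
        return cls(**kwargs)

    def to_dict(self):
        return {f.name: getattr(self, f.name) for f in fields(self)}


cfg = ToyArrayConfig.from_dict({"fill_missing_chunks": False})
print(cfg.to_dict())
```

Iterating over `fields(...)` means a new option only has to be declared once on the dataclass and once in the defaults for `from_dict` to pick it up, which is why the diff above touches the field list, the constructor, and `to_dict` but leaves the loop body unchanged.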
10 changes: 7 additions & 3 deletions src/zarr/core/codec_pipeline.py
@@ -17,7 +17,7 @@
 from zarr.core.common import concurrent_map
 from zarr.core.config import config
 from zarr.core.indexing import SelectorTuple, is_scalar
-from zarr.errors import ZarrUserWarning
+from zarr.errors import MissingChunkError, ZarrUserWarning
 from zarr.registry import register_pipeline

 if TYPE_CHECKING:
@@ -264,8 +264,10 @@ async def read_batch(
             ):
                 if chunk_array is not None:
                     out[out_selection] = chunk_array
-                else:
+                elif chunk_spec.config.fill_missing_chunks:
                     out[out_selection] = fill_value_or_default(chunk_spec)
+                else:
+                    raise MissingChunkError
         else:
             chunk_bytes_batch = await concurrent_map(
                 [(byte_getter, array_spec.prototype) for byte_getter, array_spec, *_ in batch_info],
@@ -288,8 +290,10 @@ async def read_batch(
                     if drop_axes != ():
                         tmp = tmp.squeeze(axis=drop_axes)
                     out[out_selection] = tmp
-                else:
+                elif chunk_spec.config.fill_missing_chunks:
                     out[out_selection] = fill_value_or_default(chunk_spec)
+                else:
+                    raise MissingChunkError

     def _merge_chunk_array(
         self,
1 change: 1 addition & 0 deletions src/zarr/core/config.py
@@ -96,6 +96,7 @@ def enable_gpu(self) -> ConfigSet:
             "array": {
                 "order": "C",
                 "write_empty_chunks": False,
+                "fill_missing_chunks": True,
                 "target_shard_size_bytes": None,
             },
             "async": {"concurrency": 10, "timeout": None},
3 changes: 3 additions & 0 deletions src/zarr/errors.py
@@ -144,3 +144,6 @@ class BoundsCheckError(IndexError): ...


 class ArrayIndexError(IndexError): ...
+
+
+class MissingChunkError(IndexError): ...
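
Deriving the new error from `IndexError` has a practical consequence for callers, sketched below under stated assumptions. `strict_reader` and `read_or_none` are hypothetical helpers, and the class is redefined locally so the example runs on its own. Later commits in the list above mention a `ChunkNotFoundError`; this sketch sticks to the name in the diff shown here.

```python
class MissingChunkError(IndexError):
    """Local stand-in mirroring the error name used in this diff."""


def strict_reader(index):
    # Hypothetical reader over a one-chunk store that never fills gaps.
    store = {0: [1, 2]}
    if index not in store:
        raise MissingChunkError(f"chunk {index} is missing")
    return store[index]


def read_or_none(reader, index):
    # Catch the specific subclass so that other IndexErrors
    # (for example, out-of-bounds selections) still propagate.
    try:
        return reader(index)
    except MissingChunkError:
        return None


print(read_or_none(strict_reader, 0))
print(read_or_none(strict_reader, 1))
```

Any existing handler written as `except IndexError` will also catch the new error, so code that needs to tell a missing chunk apart from a bad index should catch `MissingChunkError` first.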
97 changes: 96 additions & 1 deletion tests/test_config.py
@@ -23,7 +23,7 @@
 from zarr.core.codec_pipeline import BatchedCodecPipeline
 from zarr.core.config import BadConfigError, config
 from zarr.core.indexing import SelectorTuple
-from zarr.errors import ZarrUserWarning
+from zarr.errors import MissingChunkError, ZarrUserWarning
 from zarr.registry import (
     fully_qualified_name,
     get_buffer_class,
@@ -53,6 +53,7 @@ def test_config_defaults_set() -> None:
                 "array": {
                     "order": "C",
                     "write_empty_chunks": False,
+                    "fill_missing_chunks": True,
                     "target_shard_size_bytes": None,
                 },
                 "async": {"concurrency": 10, "timeout": None},
@@ -319,6 +320,100 @@ class NewCodec2(BytesCodec):
         get_codec_class("new_codec")


+@pytest.mark.parametrize("store", ["local", "memory"], indirect=["store"])
+@pytest.mark.parametrize(
+    "kwargs",
+    [
+        {"shards": (4, 4)},
+        {"compressors": None},
+    ],
+    ids=["partial_decode", "full_decode"],
+)
+def test_config_fill_missing_chunks(store: Store, kwargs: dict[str, Any]) -> None:
+    arr = zarr.create_array(
+        store=store,
+        shape=(4, 4),
+        chunks=(2, 2),
+        dtype="int32",
+        fill_value=42,
+        **kwargs,
+    )
+
+    # default behavior: missing chunks are filled with the fill value
+    result = zarr.open_array(store)[:]
+    assert np.array_equal(result, np.full((4, 4), 42, dtype="int32"))
+
+    # with fill_missing_chunks=False, reading missing chunks raises an error
+    with config.set({"array.fill_missing_chunks": False}):
+        with pytest.raises(MissingChunkError):
+            zarr.open_array(store)[:]
+
+    # after writing data, all chunks exist and no error is raised
+    arr[:] = np.arange(16, dtype="int32").reshape(4, 4)
+    with config.set({"array.fill_missing_chunks": False}):
+        result = zarr.open_array(store)[:]
+    assert np.array_equal(result, np.arange(16, dtype="int32").reshape(4, 4))
+
+
+@pytest.mark.parametrize("store", ["local", "memory"], indirect=["store"])
+def test_config_fill_missing_chunks_sharded_inner(store: Store) -> None:
+    """Missing inner chunks within a shard are always filled with the array's
+    fill value, even when fill_missing_chunks=False."""
+    arr = zarr.create_array(
+        store=store,
+        shape=(8, 4),
+        chunks=(2, 2),
+        shards=(4, 4),
+        dtype="int32",
+        fill_value=42,
+    )
+
+    # write only one inner chunk in the first shard, leaving the second shard empty
+    arr[0:2, 0:2] = np.ones((2, 2), dtype="int32")
+
+    with config.set({"array.fill_missing_chunks": False}):
+        a = zarr.open_array(store)
+
+        # first shard exists: missing inner chunks are filled, no error
+        result = a[:4]
+        expected = np.full((4, 4), 42, dtype="int32")
+        expected[0:2, 0:2] = 1
+        assert np.array_equal(result, expected)
+
+        # second shard is entirely missing: raises an error
+        with pytest.raises(MissingChunkError):
+            a[4:]
+
+
+@pytest.mark.parametrize("store", ["local", "memory"], indirect=["store"])
+def test_config_fill_missing_chunks_write_empty_chunks(store: Store) -> None:
+    """write_empty_chunks=False drops chunks equal to fill_value, which then
+    appear missing to fill_missing_chunks=False."""
+    arr = zarr.create_array(
+        store=store,
+        shape=(4,),
+        chunks=(2,),
+        dtype="int32",
+        fill_value=0,
+        config={"write_empty_chunks": False, "fill_missing_chunks": False},
+    )
+
+    # write non-fill-value data: chunks are stored
+    arr[:] = [1, 2, 3, 4]
+    assert np.array_equal(arr[:], [1, 2, 3, 4])
+
+    # overwrite with fill_value: chunks are dropped by write_empty_chunks=False
+    arr[:] = 0
+    with pytest.raises(MissingChunkError):
+        arr[:]
+
+    # with write_empty_chunks=True, chunks are kept and no error is raised
+    with config.set({"array.write_empty_chunks": True}):
+        arr = zarr.open_array(store)
+        arr[:] = 0
+        assert np.array_equal(arr[:], [0, 0, 0, 0])
+
+
 @pytest.mark.parametrize(
     "key",
     [