Update FAQ with chunking and virtualization details by TomNicholas · Pull Request #1007 · zarr-developers/VirtualiZarr

TomNicholas · 2026-06-02T02:40:53Z

Added explanation about format chunk requirements and virtualization restrictions for multi-file datasets.

What I did

Acceptance criteria:

Closes #xxxx
Tests added
Tests passing
No test coverage regression
Full type hint coverage
Changes are documented in docs/releases.md
New functions/methods are listed in an appropriate *.md file under docs/api
New functionality has documentation


maxrjones

I made a couple of recommendations for improved readability. Thanks for opening this!

maxrjones · 2026-06-15T19:02:33Z

@@ -25,6 +25,8 @@ Depends on some details of your data.
 VirtualiZarr works by mapping your data to the zarr data model from whatever data model is used by the format it was saved in.
 This means that if your data contains anything that cannot be represented within the zarr data model, it cannot be virtualized.


Suggested change

This means that if your data contains anything that cannot be represented within the zarr data model, it cannot be virtualized.

This means that if your data contains anything that cannot be represented within the zarr data model, it cannot be virtualized. The following restrictions influence whether you can virtualize a data file.

I think a preamble to the list would help.

maxrjones · 2026-06-15T19:04:19Z

 VirtualiZarr works by mapping your data to the zarr data model from whatever data model is used by the format it was saved in.
 This means that if your data contains anything that cannot be represented within the zarr data model, it cannot be virtualized.

+- **Format chunks span contiguous byte ranges** - It's only possible to efficiently access individual chunks of data inside blobs in object storage if each chunk can be fetched via a single HTTP range request, which requires each chunk to occupy a contiguous localized series of bytes within the file layout. Well-designed formats such as netCDF and GRIB have this property, but other formats such as CSV do not. Note also this means that any additional processing which scrambles the byte locations will prevent virtualization - a single netCDF file is virtualizable, but a zipped or gzipped netCDF file is not!


Suggested change

- **Format chunks span contiguous byte ranges** - It's only possible to efficiently access individual chunks of data inside blobs in object storage if each chunk can be fetched via a single HTTP range request, which requires each chunk to occupy a contiguous localized series of bytes within the file layout. Well-designed formats such as netCDF and GRIB have this property, but other formats such as CSV do not. Note also this means that any additional processing which scrambles the byte locations will prevent virtualization - a single netCDF file is virtualizable, but a zipped or gzipped netCDF file is not!

- **File must contain chunks of data, where each chunk spans a contiguous segment of the file** - For virtualization to work, each chunk must occupy a contiguous, localized series of bytes within the file layout, so that each chunk can be fetched via a single HTTP range request. Well-designed formats such as netCDF and GRIB have this property, but other formats such as CSV do not. Note that this also means any additional processing which scrambles the byte locations will prevent virtualization - a single netCDF file is virtualizable, but a zipped or gzipped netCDF file is not!
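To make the contiguity requirement concrete, here is a minimal sketch using only the Python standard library. It does not use the VirtualiZarr API: the toy file layout, the `read_range` helper (a stand-in for an HTTP range request), and the `manifest` dictionary are all hypothetical names invented for illustration. It records the byte offset and length of each chunk in an uncompressed "file", shows that one range read recovers a chunk, and then shows that the same offsets no longer point at the chunk once the whole file has been gzipped.

```python
import gzip
import struct


def read_range(blob: bytes, offset: int, length: int) -> bytes:
    """Hypothetical stand-in for an HTTP range request on an object."""
    return blob[offset:offset + length]


# A toy "file": a 16-byte header followed by two chunks of four float64 values.
header = b"HDR0" * 4
chunk_a = struct.pack("<4d", 0.0, 1.0, 2.0, 3.0)
chunk_b = struct.pack("<4d", 4.0, 5.0, 6.0, 7.0)
plain_file = header + chunk_a + chunk_b

# A toy manifest (assumed structure): chunk key -> (byte offset, byte length).
manifest = {
    "0": (len(header), len(chunk_a)),
    "1": (len(header) + len(chunk_a), len(chunk_b)),
}

# Each chunk is contiguous, so a single range read recovers it.
values = struct.unpack("<4d", read_range(plain_file, *manifest["1"]))
assert values == (4.0, 5.0, 6.0, 7.0)

# Gzipping the whole file moves and re-encodes the bytes, so the recorded
# offsets no longer locate the chunk inside the compressed blob.
gzipped_file = gzip.compress(plain_file)
assert read_range(gzipped_file, *manifest["1"]) != chunk_b

# The data is still there, but only after decompressing the entire file.
assert gzip.decompress(gzipped_file) == plain_file
```

The last assertion is the point of the restriction: the gzipped file still holds the data, but reaching any one chunk means reading and decompressing the stream rather than issuing one range request.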

Update FAQ with chunking and virtualization details

11dcd7e

TomNicholas added the documentation Improvements or additions to documentation label Jun 2, 2026

TomNicholas temporarily deployed to test-release June 2, 2026 02:41 — with GitHub Actions Inactive

TomNicholas enabled auto-merge (squash) June 2, 2026 02:43

maxrjones approved these changes Jun 15, 2026

TomNicholas wants to merge 1 commit into main from docs-faq-contiguous-byte-ranges2
