feat: add zarrs DecodeMode support (#155)
base: main
Changes from all commits
```diff
@@ -0,0 +1 @@
+issue_152.zarr
```
```python
import platform
import subprocess
import time

import numpy as np
import zarr


def clear_cache():
    # Drop OS page caches so each timing below measures cold reads.
    if platform.system() == "Darwin":
        # shell=True is needed so "&&" is interpreted by the shell
        # rather than passed as an argument to sync.
        subprocess.call("sync && sudo purge", shell=True)
    elif platform.system() == "Linux":
        subprocess.call(["sudo", "sh", "-c", "sync; echo 3 > /proc/sys/vm/drop_caches"])
    else:
        raise Exception("Unsupported platform")


zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})

# zarr.config.set({"codec_pipeline.decode_mode": "auto"})
# full read took: 3.3279900550842285
# partial shard read (4095) took: 1.211921215057373
# partial shard read (4096) took: 2.3402509689331055

zarr.config.set({"codec_pipeline.decode_mode": "partial"})
# full read took: 2.2892508506774902
# partial shard read (4095) took: 1.1934266090393066
# partial shard read (4096) took: 1.1788337230682373

z = zarr.create_array(
    "examples/issue_152.zarr",
    shape=(8192, 4, 128, 128),
    shards=(4096, 4, 128, 128),
    chunks=(1, 1, 128, 128),
    dtype=np.float64,
    overwrite=True,
)
z[...] = np.random.randn(8192, 4, 128, 128)
```
Collaborator
Suggested change
Ok so if you do this (i.e., let compression do its thing on non-random data), the issue becomes apparent. I also changed the sub-shard read size to highlight that this isn't some chunk-boundary condition of sorts. On my linux machine, these numbers are much more like what I'm seeing. I believe (as I just learned from you) that this also explains the mac-linux performance difference. Again, apologies, but this appears to be the core of the issue. This matches the behavior I see on my "real" data.

Collaborator
Note this problem appears to also be independent of the chunking.
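The suggested change itself is not captured above, but the point about non-random data can be sketched independently. This toy comparison (zlib standing in for whatever codec the array actually uses; the shapes and sparsity pattern are arbitrary) shows why a zero-heavy fill exercises the codec in a way `np.random.randn` does not:

```python
import zlib

import numpy as np

rng = np.random.default_rng(0)

# Random data: essentially incompressible, so decode cost is
# dominated by raw I/O and the codec has little to do.
random_data = rng.standard_normal((256, 256))

# Zero-heavy data, like the "real" workload described above: the
# codec does real work, so *how much* gets decompressed matters.
sparse_data = np.zeros((256, 256))
sparse_data[::16, ::16] = rng.standard_normal((16, 16))

def ratio(a):
    # compressed size / raw size; small means highly compressible
    return len(zlib.compress(a.tobytes())) / a.nbytes

print(f"random compressed ratio: {ratio(random_data):.2f}")
print(f"sparse compressed ratio: {ratio(sparse_data):.3f}")
```

With compressible data the stored shards shrink dramatically, so benchmark time shifts from disk reads to codec work, which is where the two decode modes differ.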
```python
clear_cache()
t = time.time()
z[...]
print("full read took: ", time.time() - t)

clear_cache()
t = time.time()
z[:4095, ...]
print("partial shard read (4095) took: ", time.time() - t)

clear_cache()
t = time.time()
z[:4096, ...]
print("partial shard read (4096) took: ", time.time() - t)
```
perf exactly matches my expectations with this mode @ilan-gold
I think there is something strange going on here - these numbers are reversed for me, i.e., forcing `partial` makes `(4096)` slower for me (which makes sense since it is a full outer-chunk read) but leaves the `4095` one unchanged. What I do see to be constant on my own machine as well is that `4095` is the same in both cases, but I would expect this feature to help with that request specifically, i.e., somehow make it faster.
As a follow-up, if I use `auto` here (i.e., from what I can tell, what is currently on `main` in `zarrs-python`) or go back to the original example in #152 with the `main` branch, but make the partial read `2048`, it's still slow - it takes just as long to read `2048` as it does the full `4096` for me. Apologies for the crosstalk, but I mentioned in my comment in the zarrs PR that I believe the hot path is in the partial decoder, and this behavior would confirm that.
Reading half the data takes just as long as reading the full data - even with sequential I/O, I can't believe those numbers.
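The "half takes as long as full" observation is consistent with the partial-decoder hot path mentioned above. A toy model of the two decode strategies (plain zlib over a list of blobs, not actual zarrs internals; chunk size and count are arbitrary) makes the difference concrete:

```python
import time
import zlib

# Toy "shard": independently compressed chunks. zlib stands in for
# the array's real codec; 1 MiB of zeros is highly compressible.
CHUNK = bytes(1 << 20)
N_CHUNKS = 128
shard = [zlib.compress(CHUNK) for _ in range(N_CHUNKS)]

def decode_partial(indices):
    # Only decompress the chunks the request actually touches.
    return [zlib.decompress(shard[i]) for i in indices]

def decode_via_full_shard(indices):
    # Hypothesized slow path: decompress every chunk, then select.
    all_chunks = [zlib.decompress(c) for c in shard]
    return [all_chunks[i] for i in indices]

half = range(N_CHUNKS // 2)

t = time.time()
decode_via_full_shard(half)
full_t = time.time() - t

t = time.time()
decode_partial(half)
part_t = time.time() - t

print(f"via full shard: {full_t:.4f}s, partial: {part_t:.4f}s")
```

Under the full-shard path, a half-shard request costs the same as a full read because the entire shard is decompressed either way; the partial path scales with the requested region.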
I wonder if this is a consequence of using a compressed filesystem. Mine isn't compressed, and I assume yours on the Mac is? Otherwise I can't really explain this without actually profiling on a machine like yours.
You are definitely right. With this reproducer on linux, I see that now. But I think my bad reproducer masked my original problem when I moved from linux to mac to create the repro (since the numbers looked like the ones I was seeing on the linux machine).
The data is fundamentally not random, and contains a lot of zeros (so compression kicks in, presumably). Aghh, this is getting to be a mess.
I guess my question here would be why this specific fix works, i.e., why is partial mode making the full-shard reads faster? I suspect this is somehow related to my problem.