
Commit bfd8664: Add JOSS paper draft

1 parent 0f4dbb7 commit bfd8664

File tree: 2 files changed (+200, -0 lines)

paper.bib

Lines changed: 87 additions & 0 deletions
@article{numpy,
  title   = {Array programming with NumPy},
  author  = {Harris, Charles R. and Millman, K. Jarrod and van der Walt, St{\'{e}}fan J. and others},
  journal = {Nature},
  year    = {2020},
  doi     = {10.1038/s41586-020-2649-2}
}

@misc{pytorch,
  title        = {PyTorch},
  author       = {{PyTorch Contributors}},
  howpublished = {\url{https://pytorch.org/}}
}

@misc{jax,
  title        = {JAX},
  author       = {{JAX Developers}},
  howpublished = {\url{https://github.com/google/jax}}
}

@misc{tensorflow,
  title        = {TensorFlow},
  author       = {{TensorFlow Developers}},
  howpublished = {\url{https://www.tensorflow.org/}}
}

@misc{cupy,
  title        = {CuPy},
  author       = {{CuPy Developers}},
  howpublished = {\url{https://cupy.dev/}}
}

@misc{zarr,
  title        = {Zarr},
  author       = {{Zarr Developers}},
  howpublished = {\url{https://zarr.dev/}}
}

@misc{ome_zarr,
  title        = {OME-Zarr},
  author       = {{Open Microscopy Environment}},
  howpublished = {\url{https://github.com/ome/ome-zarr-py}}
}

@misc{zeromq,
  title        = {ZeroMQ},
  author       = {{ZeroMQ Contributors}},
  howpublished = {\url{https://zeromq.org/}}
}

@misc{napari,
  title        = {napari},
  author       = {{napari Developers}},
  howpublished = {\url{https://napari.org/}}
}

@misc{fiji,
  title        = {Fiji},
  author       = {{Fiji Developers}},
  howpublished = {\url{https://fiji.sc/}}
}

@misc{omero,
  title        = {OMERO},
  author       = {{Open Microscopy Environment}},
  howpublished = {\url{https://www.openmicroscopy.org/omero/}}
}

@misc{fsspec,
  title        = {fsspec: Filesystem Spec},
  author       = {{fsspec Developers}},
  howpublished = {\url{https://filesystem-spec.readthedocs.io/}}
}

@article{xarray,
  title   = {xarray: {N-D} labeled arrays and datasets in {Python}},
  author  = {Hoyer, Stephan and Hamman, Joe},
  journal = {Journal of Open Research Software},
  year    = {2017},
  doi     = {10.5334/jors.148}
}

@misc{metaclassregistry,
  title        = {metaclass-registry: Automatic class registration via metaclass},
  author       = {Simas, Tristan},
  howpublished = {\url{https://github.com/trissim/metaclass-registry}}
}

paper.md

Lines changed: 113 additions & 0 deletions
---
title: "PolyStore: Unified Storage Abstraction with Streaming Backends for Scientific Python"
tags:
  - Python
  - storage
  - scientific computing
  - microscopy
  - streaming
authors:
  - name: Tristan Simas
    orcid: 0000-0002-6526-3149
    affiliation: 1
affiliations:
  - name: McGill University
    index: 1
date: 15 January 2026
bibliography: paper.bib
---
# Summary

PolyStore provides a unified API for heterogeneous storage backends—disk, memory, Zarr, and live streaming to Napari or Fiji—through a single interface. The key insight: **streaming viewers are just backends**:

```python
from polystore import FileManager, BackendRegistry

fm = FileManager(BackendRegistry())

# Same API for persistent storage, cache, and live visualization
fm.save(image, "result.npy", backend="disk")
fm.save(image, "result.npy", backend="memory")
fm.save(image, "result.npy", backend="napari_stream")  # Appears in Napari
```
The `FileManager` routes operations to explicitly selected backends with no implicit fallback. Backends auto-register via metaclass, support lazy imports for optional dependencies, and provide atomic file operations for concurrent metadata updates.
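The explicit-routing model can be sketched in a few lines. This is an illustrative toy, not PolyStore's actual implementation: the class names mirror the paper's API, but `register`, `get`, and `InMemoryBackend` are assumptions made for the example.

```python
class BackendRegistry:
    """Maps backend names to handler objects; unknown names are hard errors."""

    def __init__(self):
        self._backends = {}

    def register(self, name, backend):
        self._backends[name] = backend

    def get(self, name):
        if name not in self._backends:
            # No implicit fallback: an unregistered backend raises immediately.
            raise KeyError(f"no backend registered under {name!r}")
        return self._backends[name]


class InMemoryBackend:
    """Toy backend that stores objects in a dict keyed by path."""

    def __init__(self):
        self.store = {}

    def save(self, data, path):
        self.store[path] = data


class FileManager:
    """Thin router: every call must name its backend explicitly."""

    def __init__(self, registry):
        self.registry = registry

    def save(self, data, path, *, backend):
        self.registry.get(backend).save(data, path)
```

Because `backend` is keyword-only and unresolved names raise, a caller can never silently fall through to a different storage target.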
# Statement of Need

Scientific pipelines move data between arrays, files, chunked formats, and visualization tools. Each destination has different I/O conventions:

```python
# Without PolyStore: per-backend code everywhere
np.save("result.npy", data)                  # Disk
memory_store["result.npy"] = data            # Memory
zarr.save("result.zarr", data)               # Zarr
socket.send(msgpack.packb({"data": data}))   # Streaming
```

With PolyStore, one call handles all backends. The explicit `backend=` parameter ensures deterministic behavior—no silent fallbacks, no hidden resolution logic.
# State of the Field

| Feature | PolyStore | fsspec | zarr | xarray |
|---------|:---------:|:------:|:----:|:------:|
| Unified storage API | ✓ | ✓ | ✗ | ✗ |
| Streaming backends | ✓ | ✗ | ✗ | ✗ |
| Multi-framework I/O | ✓ | ✗ | ✗ | ✗ |
| Atomic concurrent writes | ✓ | ✗ | ✗ | ✗ |
| Explicit backend selection | ✓ | ✗ | ✗ | ✗ |
| Zero implicit fallback | ✓ | ✗ | ✗ | ✗ |

**fsspec** [@fsspec] provides unified filesystem access but lacks streaming and array framework handling. **zarr** [@zarr] handles chunked arrays but is a single format, not a storage abstraction. **xarray** [@xarray] provides multi-dimensional arrays with NetCDF/Zarr backends but no streaming or explicit backend routing.
# Software Design

**Backend Hierarchy**: `DataSource` (read-only), `DataSink` (write-only), `StorageBackend` (read/write). Backends auto-register via `metaclass-registry` [@metaclassregistry] and are lazily instantiated.
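A minimal sketch of this hierarchy, with a hand-rolled registry metaclass standing in for the `metaclass-registry` package (the `name` attribute convention and `BACKENDS` dict are assumptions for illustration, not PolyStore's actual registration API):

```python
from abc import ABC, abstractmethod

BACKENDS = {}  # name -> backend class, filled at class-definition time


class RegisteredBackendMeta(type(ABC)):
    """Registers every subclass that declares its own `name` attribute."""

    def __init__(cls, clsname, bases, ns):
        super().__init__(clsname, bases, ns)
        name = ns.get("name")  # only classes that set `name` register
        if name is not None:
            BACKENDS[name] = cls


class DataSource(ABC, metaclass=RegisteredBackendMeta):
    """Read-only backend interface."""
    name = None

    @abstractmethod
    def load(self, path): ...


class DataSink(ABC, metaclass=RegisteredBackendMeta):
    """Write-only backend interface."""
    name = None

    @abstractmethod
    def save(self, data, path): ...


class StorageBackend(DataSource, DataSink):
    """Read/write backend: both a source and a sink."""


class MemoryBackend(StorageBackend):
    """Concrete backend; defining the class is enough to register it."""
    name = "memory"

    def __init__(self):
        self.store = {}

    def load(self, path):
        return self.store[path]

    def save(self, data, path):
        self.store[path] = data
```

Defining `MemoryBackend` is all it takes: the metaclass records it under `"memory"` with no explicit registration call, while the abstract bases (which set `name = None`) stay out of the registry.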
**FileManager**: Thin router enforcing explicit backend selection. No magic resolution—if you don't specify a backend, you get an error.

**Streaming Backends**: ZeroMQ transport with shared memory for zero-copy image transfer. ROI data model provides backend-neutral shapes/points with converters for Napari and Fiji.
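The backend-neutral ROI idea can be sketched with a plain dataclass plus a per-viewer converter. The field names and the (data, kwargs) shape targeted at a napari Points layer are assumptions for illustration; PolyStore's real ROI model and converters may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PointROI:
    """A viewer-agnostic point annotation in (row, column) pixel coordinates."""
    y: float
    x: float
    label: str = ""


def points_to_napari(rois: List[PointROI]) -> Tuple[list, dict]:
    """Convert neutral point ROIs to (data, layer_kwargs) for a napari Points layer."""
    data = [[r.y, r.x] for r in rois]           # napari expects (row, col) rows
    kwargs = {"properties": {"label": [r.label for r in rois]}}
    return data, kwargs
```

A second converter (e.g. to Fiji's ROI format) would consume the same `PointROI` objects, which is what keeps the data model backend-neutral.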
**Atomic Operations**: Cross-platform file locking (`fcntl` on Unix, `portalocker` on Windows) with `atomic_update_json()` for concurrent metadata writes from multiple pipeline workers.

```python
# Multiple workers safely update shared metadata
from polystore import AtomicMetadataWriter

writer = AtomicMetadataWriter()
writer.merge_subdirectory_metadata(metadata_path, {
    "TimePoint_1": {"available_backends": {"zarr": True}}
})
```
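The core of an atomic JSON update is write-to-temp-then-rename, which readers can never observe half-written. This sketch shows that pattern only; PolyStore's `atomic_update_json()` additionally takes the cross-platform lock described above (elided here), and the function below is a simplified stand-in, not the library's code.

```python
import json
import os
import tempfile


def atomic_update_json(path, update):
    """Apply `update(dict) -> dict` to the JSON file at `path` atomically."""
    try:
        with open(path) as f:
            current = json.load(f)
    except FileNotFoundError:
        current = {}  # first writer starts from an empty document
    merged = update(current)
    # Write to a temp file in the same directory so the rename stays on one volume.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(merged, f)
        os.replace(tmp, path)  # atomic replacement on POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on failure
        raise
```

Without the file lock, two workers could still interleave read-modify-write cycles and lose an update; the lock serializes the whole function, while `os.replace` guarantees readers see either the old or the new file, never a partial one.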
# Research Application

PolyStore was developed for OpenHCS (Open High-Content Screening), where microscopy pipelines:

- Load images from disk or a virtual workspace
- Process in memory (avoiding I/O between steps)
- Write results to Zarr (chunked, compressed)
- Stream intermediate results to Napari for live preview

All through one interface:

```python
# Load → process → save → stream: same API
images = fm.load_batch(paths, backend="disk")
processed = pipeline(images)
fm.save_batch(processed, paths, backend="zarr")
fm.save_batch(processed, paths, backend="napari_stream")
```

The explicit backend model eliminated an entire class of bugs where code assumed disk storage but ran against memory or streaming backends.
# AI Usage Disclosure

Generative AI (Claude) assisted with code generation and documentation. All content was reviewed and tested by the author.

# Acknowledgements

This work was supported in part by the Fournier lab at the Montreal Neurological Institute, McGill University.

# References
