The `FileManager` routes to any `DataSink`. Pipeline code doesn't know whether data goes to disk, RAM, or a live viewer.

**Streaming Internals Hidden**: Streaming backends handle substantial complexity internally—GPU tensor conversion, shared memory allocation, ZMQ socket management, ROI serialization—all behind the same `save_batch()` interface. The orchestrator remains backend-agnostic.
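As a sketch of what this backend-agnosticism looks like, the toy `DataSink` hierarchy below keeps the orchestrator ignorant of transport details. The class and method names follow the paper (`DataSink`, `save_batch()`), but the bodies are illustrative stand-ins, not PolyStore's actual code:

```python
from abc import ABC, abstractmethod

class DataSink(ABC):
    @abstractmethod
    def save_batch(self, keys, arrays):
        """Persist a batch of arrays; each backend hides its own transport."""

class MemorySink(DataSink):
    def __init__(self):
        self.store = {}
    def save_batch(self, keys, arrays):
        self.store.update(zip(keys, arrays))

class LoggingStreamSink(DataSink):
    """Stand-in for a streaming backend (ZMQ, shared memory, GPU tensors)."""
    def __init__(self):
        self.sent = []
    def save_batch(self, keys, arrays):
        # A real streaming backend would serialize and push over a socket here.
        self.sent.extend(keys)

def run_step(sink: DataSink, results: dict):
    # Orchestrator code: backend-agnostic, only ever sees save_batch().
    sink.save_batch(list(results), list(results.values()))

mem, stream = MemorySink(), LoggingStreamSink()
for sink in (mem, stream):
    run_step(sink, {"img_000": [1, 2], "img_001": [3, 4]})
```

The orchestrator's `run_step` is identical for both sinks; only the sink's internals differ.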
**Atomic Operations**: Cross-platform file locking (`fcntl` on Unix, `portalocker` on Windows) with `atomic_update_json()` for concurrent metadata writes from multiple pipeline workers. This is critical for OpenHCS where multiple worker processes write metadata simultaneously—without atomic operations, race conditions corrupt JSON files.
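A minimal Unix-only sketch of an `atomic_update_json`-style helper illustrates the pattern: serialize writers with an exclusive lock, then publish via write-to-temp plus rename so readers never observe a half-written file. The sidecar lock-file layout and the function signature here are assumptions, not PolyStore's actual implementation (which, per the text, also supports Windows via `portalocker`):

```python
import fcntl, json, os, tempfile

def atomic_update_json(path, update_fn):
    # Exclusive lock on a sidecar lock file serializes concurrent workers'
    # read-modify-write cycles (hypothetical layout, illustrative only).
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        data = {}
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)
        update_fn(data)
        # Write to a temp file, then atomically rename into place:
        # readers never see a partially written JSON document.
        tmp = path + ".tmp"
        with open(tmp, "w") as out:
            json.dump(data, out)
        os.replace(tmp, path)

path = os.path.join(tempfile.mkdtemp(), "meta.json")
atomic_update_json(path, lambda d: d.setdefault("wells", []).append("A01"))
atomic_update_json(path, lambda d: d["wells"].append("A02"))
with open(path) as f:
    meta = json.load(f)
```

Without the lock, two workers could read the same starting state and one update would be lost; without the rename, a reader could load truncated JSON.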
**Lazy Backend Instantiation**: Backends auto-register via `metaclass-registry`[@metaclassregistry] and are lazily instantiated, keeping optional dependencies unloaded until used. For example, the Napari streaming backend only imports `napari` when first used, avoiding dependency bloat for users who don't need visualization.
**Batch Operations**: The `save_batch()` and `load_batch()` interfaces accept lists of paths and data, enabling backends to optimize I/O. The Zarr backend can write multiple arrays in a single transaction; the Napari backend can batch ROI updates into a single viewer refresh. Batching amortizes per-call overhead that per-file operations would pay on every item.
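The batching idea can be sketched as a naive per-item default that backends override with a genuinely batched implementation. The class names below are illustrative stand-ins, not PolyStore's API:

```python
class BaseSink:
    def save(self, key, data):
        raise NotImplementedError
    def save_batch(self, keys, items):
        for k, v in zip(keys, items):   # fallback: one operation per item
            self.save(k, v)

class BatchedViewerSink(BaseSink):
    """Stand-in for e.g. a Napari backend batching ROI updates."""
    def __init__(self):
        self.frames = {}
        self.refreshes = 0
    def save(self, key, data):
        self.frames[key] = data
        self.refreshes += 1             # naive path: one refresh per item
    def save_batch(self, keys, items):
        self.frames.update(zip(keys, items))
        self.refreshes += 1             # batched path: a single refresh

sink = BatchedViewerSink()
sink.save_batch(["roi_1", "roi_2", "roi_3"], [[1], [2], [3]])
```

Three ROIs land in the viewer with one refresh instead of three, which is the same shape of saving a Zarr backend gets by grouping writes into one transaction.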
# Research Application
PolyStore was developed for OpenHCS (Open High-Content Screening) where microscopy pipelines process thousands of images per experiment. A typical workflow:
1. **Load**: Read raw images from disk (TIFF, OME-TIFF) or virtual workspace (lazy-loaded)
2. **Process**: Apply filters, segmentation, feature extraction in memory
3. **Save**: Write results to Zarr (chunked, compressed for efficient storage)
4. **Stream**: Send intermediate results to Napari for live preview and quality control

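The four steps above can be sketched against a toy `FileManager` with dict-backed sinks. The routing API shown (`load(backend, key)` / `save(backend, key, data)`) is illustrative, not PolyStore's actual signature:

```python
class FileManager:
    """Toy router: one dict per named backend (illustrative only)."""
    def __init__(self, backends):
        self.backends = backends
    def load(self, backend, key):
        return self.backends[backend][key]
    def save(self, backend, key, data):
        self.backends[backend][key] = data

fm = FileManager({"disk": {"raw/img0": [5, 1, 3]},
                  "zarr": {},
                  "stream": {}})

img = fm.load("disk", "raw/img0")             # 1. Load raw image
processed = sorted(img)                       # 2. Process in memory (toy filter)
fm.save("zarr", "out/img0", processed)        # 3. Save results to chunked store
fm.save("stream", "preview/img0", processed)  # 4. Stream preview to viewer
```

Note that steps 3 and 4 are the same call with a different backend name, which is exactly the property the workflow relies on.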
**Concrete Example**: A user processes 10,000 images. Without PolyStore, the pipeline code would contain:
- `np.load()` for disk reads
- `zarr.open_array()` for Zarr writes
- `napari.Viewer.add_image()` for visualization
- Custom socket code for streaming to remote Fiji instances

With PolyStore, all I/O goes through `FileManager`, and the user can switch backends by changing a config parameter—no code changes needed.
**Bug Prevention**: The explicit backend model eliminated an entire class of bugs where code assumed disk storage but ran against memory or streaming backends. For example, a function that called `os.path.exists()` would fail silently against a memory backend. With PolyStore, the backend is explicit, and such mismatches are caught immediately.
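The mismatch described above can be demonstrated in a few lines: a path-based check silently answers "no" for data that only exists in a memory backend, while an explicit backend API routes the check to the component that actually owns the data. The `MemoryBackend` class and its `exists()` method are illustrative, not PolyStore's API:

```python
import os

class MemoryBackend:
    """Toy in-memory backend holding keyed results (illustrative only)."""
    def __init__(self):
        self.objects = {"results/plate1": [1, 2, 3]}
    def exists(self, key):
        return key in self.objects

backend = MemoryBackend()
key = "results/plate1"

# Buggy pattern: assumes disk storage. The filesystem has no such path,
# so this silently reports the data as missing.
disk_says = os.path.exists(key)

# Explicit-backend pattern: the existence check goes through the backend
# that owns the data, so the answer is correct by construction.
backend_says = backend.exists(key)
```

With the buggy pattern, downstream code would skip or recompute results that already exist; with the explicit backend, the mismatch cannot arise because there is no ambient assumption about where data lives.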