Commit 69ed4f4

docs: clarify DataSink abstraction vs fsspec in JOSS paper
- Explain why fsspec cannot support streaming (filesystem vs data sink abstraction)
- Show DataSink/StorageBackend hierarchy with code example
- Highlight streaming complexity hidden behind save_batch() interface
- Add fsspec, xarray, metaclass-registry references to bibliography
1 parent bfd8664 commit 69ed4f4

1 file changed

Lines changed: 17 additions & 13 deletions

paper.md

@@ -59,27 +59,31 @@ With PolyStore, one call handles all backends. The explicit `backend=` parameter
5959
| Explicit backend selection |||||
6060
| Zero implicit fallback |||||
6161

62-
**fsspec** [@fsspec] provides unified filesystem access but lacks streaming and array framework handling. **zarr** [@zarr] handles chunked arrays but is a single format, not a storage abstraction. **xarray** [@xarray] provides multi-dimensional arrays with NetCDF/Zarr backends but no streaming or explicit backend routing.
62+
**fsspec** [@fsspec] provides unified filesystem access but cannot support streaming because its abstraction is *filesystems*: everything must behave like a file. PolyStore's abstraction is *data sinks*, which include destinations that consume data without persisting it. This distinction is fundamental: a Napari viewer is not a filesystem, but it is a valid data sink.
63+
64+
**zarr** [@zarr] handles chunked arrays but is a single format, not a storage abstraction. **xarray** [@xarray] provides multi-dimensional arrays with NetCDF/Zarr backends but no streaming or explicit backend routing.
6365

6466
# Software Design
6567

66-
**Backend Hierarchy**: `DataSource` (read-only), `DataSink` (write-only), `StorageBackend` (read/write). Backends auto-register via `metaclass-registry` [@metaclassregistry] and are lazily instantiated.
68+
**Backend Hierarchy**: The key architectural decision is the base abstraction. `DataSink` (write-only) is the root interface—not `StorageBackend`. This allows streaming backends that consume data without supporting reads:
6769

68-
**FileManager**: Thin router enforcing explicit backend selection. No magic resolution—if you don't specify a backend, you get an error.
70+
```python
71+
class StreamingBackend(DataSink): # Write-only sink
72+
    def save_batch(self, data, paths, **kwargs): ...
73+
    # No load_batch() method: streaming is one-way
6974

70-
**Streaming Backends**: ZeroMQ transport with shared memory for zero-copy image transfer. ROI data model provides backend-neutral shapes/points with converters for Napari and Fiji.
75+
class StorageBackend(DataSink): # Read/write storage
76+
    def save_batch(self, data, paths, **kwargs): ...
77+
    def load_batch(self, paths, **kwargs): ...
78+
```
7179

72-
**Atomic Operations**: Cross-platform file locking (`fcntl` on Unix, `portalocker` on Windows) with `atomic_update_json()` for concurrent metadata writes from multiple pipeline workers.
80+
The `FileManager` routes to any `DataSink`. Pipeline code doesn't know whether data goes to disk, memory, or a live Napari viewer—and doesn't need to.
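To make the routing concrete, here is a minimal, self-contained sketch of the hierarchy and an explicit-backend router. Only `DataSink`, `StorageBackend`, `save_batch()`, `load_batch()`, and the `backend=` parameter come from the text; `MemoryBackend`, `ViewerSink`, and `FileManagerSketch` are hypothetical stand-ins, not PolyStore's actual classes.

```python
from abc import ABC, abstractmethod


class DataSink(ABC):
    """Root interface: anything that can accept a batch of data."""

    @abstractmethod
    def save_batch(self, data, paths, **kwargs): ...


class StorageBackend(DataSink):
    """Read/write storage: adds loading on top of the sink interface."""

    @abstractmethod
    def load_batch(self, paths, **kwargs): ...


class MemoryBackend(StorageBackend):  # hypothetical in-memory store
    def __init__(self):
        self._store = {}

    def save_batch(self, data, paths, **kwargs):
        self._store.update(zip(paths, data))

    def load_batch(self, paths, **kwargs):
        return [self._store[p] for p in paths]


class ViewerSink(DataSink):  # hypothetical stand-in for a live viewer
    def __init__(self):
        self.displayed = []

    def save_batch(self, data, paths, **kwargs):
        self.displayed.extend(paths)  # consumes data, persists nothing


class FileManagerSketch:
    """Assumed router: the caller must always name the backend."""

    def __init__(self, backends):
        self._backends = dict(backends)

    def _resolve(self, backend):
        if backend not in self._backends:
            raise KeyError(f"unknown backend {backend!r}")  # no fallback
        return self._backends[backend]

    def save_batch(self, data, paths, *, backend):
        self._resolve(backend).save_batch(data, paths)

    def load_batch(self, paths, *, backend):
        target = self._resolve(backend)
        if not isinstance(target, StorageBackend):
            raise TypeError(f"backend {backend!r} is write-only")
        return target.load_batch(paths)
```

In this sketch `backend` is keyword-only and has no default, so omitting it raises a `TypeError` at the call site instead of silently picking a backend.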
7381

74-
```python
75-
# Multiple workers safely update shared metadata
76-
from polystore import AtomicMetadataWriter
82+
**Streaming Internals Hidden**: Streaming backends handle substantial complexity internally—GPU tensor conversion, shared memory allocation, ZMQ socket management, ROI serialization—all behind the same `save_batch()` interface. The orchestrator remains backend-agnostic.
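As a rough illustration of how such internals can sit behind `save_batch()`, the sketch below copies each payload into a shared-memory segment and queues a small descriptor message. It uses only the standard library; the class name, the `outbox` list (a stand-in for a ZMQ socket), and the message fields are assumptions for illustration, not PolyStore's wire format.

```python
from multiprocessing import shared_memory


class SharedMemorySinkSketch:
    """Hypothetical write-only sink that hands payloads over via shared memory."""

    def __init__(self):
        self.outbox = []      # stand-in for messages sent over a socket
        self._segments = {}   # keep segments alive until the receiver is done

    def save_batch(self, data, paths, **kwargs):
        if len(data) != len(paths):
            raise ValueError("data and paths must have the same length")
        for payload, path in zip(data, paths):
            # Allocate a segment and copy the bytes in once.
            shm = shared_memory.SharedMemory(create=True, size=max(len(payload), 1))
            shm.buf[:len(payload)] = payload
            self._segments[shm.name] = shm
            # Only this small descriptor would travel over the transport.
            self.outbox.append(
                {"path": path, "shm_name": shm.name, "nbytes": len(payload)}
            )

    def release(self, shm_name):
        """Free a segment once the receiver has finished with it."""
        shm = self._segments.pop(shm_name)
        shm.close()
        shm.unlink()


def receive(message):
    """Receiver side: attach to the named segment and copy the payload out."""
    shm = shared_memory.SharedMemory(name=message["shm_name"])
    try:
        return bytes(shm.buf[:message["nbytes"]])
    finally:
        shm.close()
```

The caller sees only `save_batch(data, paths)`; allocation, descriptor construction, and cleanup stay inside the sink.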
7783

78-
writer = AtomicMetadataWriter()
79-
writer.merge_subdirectory_metadata(metadata_path, {
80-
"TimePoint_1": {"available_backends": {"zarr": True}}
81-
})
82-
```
84+
**Atomic Operations**: Cross-platform file locking (`fcntl` on Unix, `portalocker` on Windows) with `atomic_update_json()` for concurrent metadata writes from multiple pipeline workers.
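The locking pattern can be sketched with the standard library alone. This is not the actual `atomic_update_json()` implementation: the function name `update_json_atomically`, its signature, and the `.lock` sidecar file are assumptions. It takes an exclusive `fcntl` lock where available (Unix) and skips locking elsewhere, whereas the text above names `portalocker` for Windows.

```python
import json
import os
import tempfile
from contextlib import contextmanager

try:
    import fcntl  # Unix only
except ImportError:  # e.g. Windows; this sketch then runs without a lock
    fcntl = None


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on a sidecar lock file."""
    with open(lock_path, "a") as lock_file:
        if fcntl is not None:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until acquired
        try:
            yield
        finally:
            if fcntl is not None:
                fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_json_atomically(path, update_fn):
    """Read-modify-write a JSON file under a lock, replacing it atomically."""
    with exclusive_lock(path + ".lock"):
        try:
            with open(path) as f:
                current = json.load(f)
        except FileNotFoundError:
            current = {}
        updated = update_fn(current)
        # Write to a temp file in the same directory, then swap it in.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(updated, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)  # atomic rename on the same filesystem
        except BaseException:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
        return updated
```

The lock serializes concurrent read-modify-write cycles, while `os.replace` ensures readers never observe a half-written file.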
85+
86+
Backends auto-register via `metaclass-registry` [@metaclassregistry] and are lazily instantiated, keeping optional dependencies unloaded until used.
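The registration pattern can be illustrated with a plain Python metaclass. This sketch does not use the `metaclass-registry` package or claim to mirror its API; `BackendRegistryMeta`, `backend_name`, and `get_backend()` are hypothetical names chosen for illustration.

```python
class BackendRegistryMeta(type):
    """Records every subclass that declares a backend_name."""

    registry = {}

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        key = namespace.get("backend_name")
        if key:  # the base class declares None and is skipped
            BackendRegistryMeta.registry[key] = cls


class RegisteredBackend(metaclass=BackendRegistryMeta):
    backend_name = None


class DiskBackendSketch(RegisteredBackend):
    backend_name = "disk"
    instances_created = 0

    def __init__(self):
        # A real backend might import a heavy optional dependency here,
        # so the cost is paid only when the backend is first requested.
        type(self).instances_created += 1


_instances = {}


def get_backend(name):
    """Instantiate a registered backend on first use, then reuse it."""
    if name not in _instances:
        _instances[name] = BackendRegistryMeta.registry[name]()
    return _instances[name]
```

Defining the class is enough to register it; no instance exists until `get_backend("disk")` is first called, and later calls return the same object.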
8387

8488
# Research Application
8589
