From 12f6879fd679925ad61e8e05b998fb1d6063e246 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 11:06:36 -0600
Subject: [PATCH 1/9] docs: fix external dtype notation in codec comparison
 table

Changed <hash> to <hash@> in the External dtype row to match
the correct store-only notation used throughout the documentation.
---
 src/reference/specs/type-system.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/reference/specs/type-system.md b/src/reference/specs/type-system.md
index 9fe16489..80c6bd05 100644
--- a/src/reference/specs/type-system.md
+++ b/src/reference/specs/type-system.md
@@ -636,7 +636,7 @@ def garbage_collect(store_name):
 |---------|----------|------------|-------------|--------------|---------------|
 | Storage modes | Both | Both | External only | External only | External only |
 | Internal dtype | `bytes` | `bytes` | N/A | N/A | N/A |
-| External dtype | `<hash>` | `<hash>` | `json` | `json` | `json` |
+| External dtype | `<hash@>` | `<hash@>` | `json` | `json` | `json` |
 | Addressing | Hash | Hash | Primary key | Hash | Relative path |
 | Deduplication | Yes (external) | Yes (external) | No | Yes | No |
 | Structure | Single blob | Single file | Files, folders | Single blob | Any |

From 3aa243dfc38fe33491b6ceeb7cb870aef5bc1158 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 11:49:19 -0600
Subject: [PATCH 2/9] docs: Add plugin codecs guide with dj-zarr-codecs example

Add comprehensive how-to guide for using plugin codecs - codec packages
that extend DataJoint via entry point discovery. Uses dj-zarr-codecs
as the primary example.

Key sections:
- Installation and automatic registration via entry points
- Complete Zarr codec usage example with storage structure
- Finding DataJoint-maintained and community codecs
- Comparison with built-in codecs (<npy@>, <blob@>)
- Best practices for dependency management
- Troubleshooting common issues

Terminology: Uses 'plugin codecs' instead of 'external/third-party' to
accurately describe the architectural pattern (separate packages with
entry point discovery) without implying ownership.
---
 mkdocs.yaml                     |   1 +
 src/how-to/index.md             |   1 +
 src/how-to/use-plugin-codecs.md | 330 ++++++++++++++++++++++++++++++++
 3 files changed, 332 insertions(+)
 create mode 100644 src/how-to/use-plugin-codecs.md

diff --git a/mkdocs.yaml b/mkdocs.yaml
index 0a0812a9..0a8d29ab 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -74,6 +74,7 @@ nav:
       - Object Storage:
           - Use Object Storage: how-to/use-object-storage.md
           - Use NPY Codec: how-to/use-npy-codec.md
+          - Use Plugin Codecs: how-to/use-plugin-codecs.md
           - Create Custom Codecs: how-to/create-custom-codec.md
           - Manage Large Data: how-to/manage-large-data.md
           - Clean Up Storage: how-to/garbage-collection.md
diff --git a/src/how-to/index.md b/src/how-to/index.md
index 95c99f28..d54246ab 100644
--- a/src/how-to/index.md
+++ b/src/how-to/index.md
@@ -42,6 +42,7 @@ they assume you understand the basics and focus on getting things done.
 - [Object Storage Overview](object-storage-overview.md) — Navigation guide for all storage docs
 - [Choose a Storage Type](choose-storage-type.md) — Decision guide for codecs
 - [Use Object Storage](use-object-storage.md) — When and how
+- [Use Plugin Codecs](use-plugin-codecs.md) — Install codec packages via entry points
 - [Create Custom Codecs](create-custom-codec.md) — Domain-specific types
 - [Manage Large Data](manage-large-data.md) — Blobs, streaming, efficiency
 - [Clean Up External Storage](garbage-collection.md) — Garbage collection
diff --git a/src/how-to/use-plugin-codecs.md b/src/how-to/use-plugin-codecs.md
new file mode 100644
index 00000000..6820915c
--- /dev/null
+++ b/src/how-to/use-plugin-codecs.md
@@ -0,0 +1,330 @@
+# Use Plugin Codecs
+
+Install and use plugin codec packages to extend DataJoint's type system.
+
+## Overview
+
+Plugin codecs are distributed as separate Python packages that extend DataJoint's type system. They add support for domain-specific data types without modifying DataJoint itself. Once installed, they register automatically via Python's entry point system and can be used in table definitions like built-in codecs.
+
+**Benefits:**
+- Automatic registration via entry points - no code changes needed
+- Domain-specific types maintained independently
+- Clean separation of core framework from specialized formats
+- Easy to share across projects and teams
+
+## Quick Start
+
+### 1. Install the Codec Package
+
+```bash
+pip install dj-zarr-codecs
+```
+
+### 2. Use in Table Definitions
+
+```python
+import datajoint as dj
+
+schema = dj.Schema('my_schema')
+
+@schema
+class Recording(dj.Manual):
+    definition = """
+    recording_id : int
+    ---
+    waveform : <zarr@>  # Automatically available after install
+    """
+```
+
+That's it! No codec import or manual registration is needed. The codec is automatically discovered via Python's entry point system.
+
+## Example: Zarr Array Storage
+
+The `dj-zarr-codecs` package adds support for storing NumPy arrays in Zarr format with schema-addressed paths.
+
+### Installation
+
+```bash
+pip install dj-zarr-codecs
+```
+
+### Configuration
+
+Configure object storage for external data:
+
+```python
+import datajoint as dj
+
+dj.config['stores'] = {
+    'mystore': {
+        'protocol': 's3',
+        'endpoint': 's3.amazonaws.com',
+        'bucket': 'my-bucket',
+        'location': 'data',
+    }
+}
+```
+
+### Basic Usage
+
+```python
+import numpy as np
+
+schema = dj.Schema('neuroscience')
+
+@schema
+class Recording(dj.Manual):
+    definition = """
+    recording_id : int
+    ---
+    waveform : <zarr@mystore>  # Store as Zarr array
+    """
+
+# Insert NumPy array
+Recording.insert1({
+    'recording_id': 1,
+    'waveform': np.random.randn(1000, 32),
+})
+
+# Fetch returns zarr.Array (read-only)
+zarr_array = (Recording & {'recording_id': 1}).fetch1('waveform')
+
+# Use with NumPy
+mean_waveform = np.mean(zarr_array, axis=0)
+
+# Access Zarr features
+print(zarr_array.shape)   # (1000, 32)
+print(zarr_array.chunks)  # Zarr chunking info
+print(zarr_array.dtype)   # float64
+```
+
+### Storage Structure
+
+Zarr arrays are stored with schema-addressed paths that mirror your database structure:
+
+```
+s3://my-bucket/data/
+└── neuroscience/           # Schema name
+    └── recording/          # Table name
+        └── recording_id=1/ # Primary key
+            └── waveform.zarr/  # Field name + .zarr extension
+                ├── .zarray
+                └── 0.0
+```
+
+This organization makes external storage browsable and self-documenting.
+
+### When to Use `<zarr@>`
+
+**Use `<zarr@>` when:**
+- Arrays are large (> 10 MB)
+- You need chunked access patterns
+- Compression is beneficial
+- Cross-language compatibility matters (any Zarr library can read)
+- You want browsable, organized storage paths
+
+**Use `<npy@>` instead when:**
+- You need lazy loading with metadata inspection before download
+- Memory mapping is important
+- Storage format simplicity is preferred
+
+**Use `<blob@>` instead when:**
+- Arrays are small (< 10 MB)
+- Deduplication of repeated values is important
+- Storing mixed Python objects (not just arrays)
+
+## Finding Plugin Codecs
+
+### DataJoint-Maintained Codecs
+
+- **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — Zarr array storage
+- **[anscombe-transform](https://github.com/datajoint/anscombe-transform)** — Anscombe variance stabilization for imaging
+
+### Community Codecs
+
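+Search PyPI for packages matching `datajoint` (the `pip search` command no longer works because PyPI disabled its search API), for example by opening the search page:
+
+```bash
+python -m webbrowser "https://pypi.org/search/?q=datajoint+codec"
+```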
+
+Or browse GitHub: https://github.com/topics/datajoint
+
+### Domain-Specific Examples
+
+**Neuroscience:**
+- Spike train formats (NEO, NWB)
+- Neural network models
+- Connectivity matrices
+
+**Imaging:**
+- OME-TIFF, OME-ZARR
+- DICOM medical images
+- Point cloud data
+
+**Genomics:**
+- BAM/SAM alignments
+- VCF variant calls
+- Phylogenetic trees
+
+## Verifying Installation
+
+Check that a codec is registered:
+
+```python
+import datajoint as dj
+
+# List all available codecs
+print(dj.list_codecs())
+# ['blob', 'attach', 'hash', 'object', 'npy', 'filepath', 'zarr', ...]
+
+# Check specific codec
+assert 'zarr' in dj.list_codecs()
+```
+
+## How Auto-Registration Works
+
+Plugin codecs use Python's entry point system for automatic discovery. When you install a codec package, it registers itself via `pyproject.toml`:
+
+```toml
+[project.entry-points."datajoint.codecs"]
+zarr = "dj_zarr_codecs:ZarrCodec"
+```
+
+DataJoint discovers these entry points at import time, so the codec is available in any Python session started after `pip install`.
+
+**No manual registration needed** — unlike DataJoint 0.x which required `dj.register_codec()`.
+
+## Troubleshooting
+
+### "Unknown codec: \<zarr\>"
+
+The codec package is not installed or not found. Verify installation:
+
+```bash
+pip list | grep dj-zarr-codecs
+```
+
+If installed but not working:
+
+```python
+# List the codec entry points that Python can see in this environment
+import importlib.metadata
+print(importlib.metadata.entry_points().select(group='datajoint.codecs'))
+```
+
+### Codec Not Found After Installation
+
+Restart your Python session or kernel. Entry points are discovered at import time:
+
+```python
+# Restart kernel, then:
+import datajoint as dj
+print('zarr' in dj.list_codecs())  # Should be True
+```
+
+### Version Conflicts
+
+Check compatibility with your DataJoint version:
+
+```bash
+pip show dj-zarr-codecs
+# Requires: datajoint, ...  (pip show lists names only; check PyPI for bounds such as datajoint>=2.0.0a22)
+```
+
+Upgrade DataJoint if needed:
+
+```bash
+pip install --upgrade datajoint
+```
+
+## Creating Your Own Codecs
+
+If you need a codec that doesn't exist yet, see:
+
+- [Create Custom Codecs](create-custom-codec.md) — Step-by-step guide
+- [Codec API Specification](../reference/specs/codec-api.md) — Technical reference
+- [Custom Codecs Explanation](../explanation/custom-codecs.md) — Design concepts
+
+Consider publishing your codec as a package so others can benefit!
+
+## Best Practices
+
+### 1. Install Codecs with Your Project
+
+Add plugin codecs to your project dependencies:
+
+**requirements.txt:**
+```
+datajoint>=2.0.0a22
+dj-zarr-codecs>=0.1.0
+```
+
+**pyproject.toml:**
+```toml
+dependencies = [
+    "datajoint>=2.0.0a22",
+    "dj-zarr-codecs>=0.1.0",
+]
+```
+
+### 2. Document Codec Requirements
+
+In your pipeline documentation, specify required codecs:
+
+```python
+"""
+My Pipeline
+===========
+
+Requirements:
+- datajoint>=2.0.0a22
+- dj-zarr-codecs>=0.1.0  # For waveform storage
+
+Install:
+    pip install datajoint dj-zarr-codecs
+"""
+```
+
+### 3. Pin Versions for Reproducibility
+
+Use exact versions in production:
+
+```
+dj-zarr-codecs==0.1.0  # Exact version
+```
+
+Use minimum versions in libraries:
+
+```
+dj-zarr-codecs>=0.1.0  # Minimum version
+```
+
+### 4. Test Codec Availability
+
+Add checks in your pipeline setup:
+
+```python
+import datajoint as dj
+
+REQUIRED_CODECS = ['zarr']
+
+def check_requirements():
+    available = dj.list_codecs()
+    missing = [c for c in REQUIRED_CODECS if c not in available]
+
+    if missing:
+        raise ImportError(
+            f"Missing required codecs: {missing}\n"
+            "Install with: pip install dj-zarr-codecs"
+        )
+
+check_requirements()
+```
+
+## See Also
+
+- [Use Object Storage](use-object-storage.md) — Object storage configuration
+- [Create Custom Codecs](create-custom-codec.md) — Build your own codecs
+- [Type System](../reference/specs/type-system.md) — Complete type reference
+- [dj-zarr-codecs Repository](https://github.com/datajoint/dj-zarr-codecs) — Example implementation

From 81c9f99c8c21b493d20a206ae8eb89868f8044ca Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:09:30 -0600
Subject: [PATCH 3/9] docs: Add dj-photon-codecs to plugin codecs guide

Update plugin codecs documentation to include dj-photon-codecs:
- Add to DataJoint-maintained codecs list
- Include in imaging domain examples
- Reference in See Also section

dj-photon-codecs provides Anscombe transformation + Zarr compression
for photon-limited imaging data (calcium imaging, low-light microscopy).
---
 src/how-to/use-plugin-codecs.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/how-to/use-plugin-codecs.md b/src/how-to/use-plugin-codecs.md
index 6820915c..aa3d626e 100644
--- a/src/how-to/use-plugin-codecs.md
+++ b/src/how-to/use-plugin-codecs.md
@@ -137,8 +137,9 @@ This organization makes external storage browsable and self-documenting.
 
 ### DataJoint-Maintained Codecs
 
-- **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — Zarr array storage
-- **[anscombe-transform](https://github.com/datajoint/anscombe-transform)** — Anscombe variance stabilization for imaging
+- **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — Zarr array storage for general numpy arrays
+- **[dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs)** — Photon-limited movies with Anscombe transformation and compression
+- **[anscombe-transform](https://github.com/datajoint/anscombe-transform)** — Anscombe variance stabilization (Zarr/Numcodecs integration)
 
 ### Community Codecs
 
@@ -158,6 +159,7 @@ Or browse GitHub: https://github.com/topics/datajoint
 - Connectivity matrices
 
 **Imaging:**
+- Photon-limited movies (calcium imaging, low-light microscopy)
 - OME-TIFF, OME-ZARR
 - DICOM medical images
 - Point cloud data
@@ -327,4 +329,5 @@ check_requirements()
 - [Use Object Storage](use-object-storage.md) — Object storage configuration
 - [Create Custom Codecs](create-custom-codec.md) — Build your own codecs
 - [Type System](../reference/specs/type-system.md) — Complete type reference
-- [dj-zarr-codecs Repository](https://github.com/datajoint/dj-zarr-codecs) — Example implementation
+- [dj-zarr-codecs Repository](https://github.com/datajoint/dj-zarr-codecs) — General Zarr array storage
+- [dj-photon-codecs Repository](https://github.com/datajoint/dj-photon-codecs) — Photon-limited movies with compression

From c0802b78a415f2220a5e6395cb3949946a722e34 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:10:02 -0600
Subject: [PATCH 4/9] docs: Reference plugin codecs in custom codecs
 explanation

Add 'Before Creating Your Own' section to custom-codecs.md that directs
readers to check existing plugin codecs (dj-zarr-codecs, dj-photon-codecs,
anscombe-transform) before implementing their own.

Encourages reuse and ensures users are aware of existing solutions.
---
 src/explanation/custom-codecs.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/explanation/custom-codecs.md b/src/explanation/custom-codecs.md
index 44acf9f8..296b890b 100644
--- a/src/explanation/custom-codecs.md
+++ b/src/explanation/custom-codecs.md
@@ -334,3 +334,13 @@ Custom codecs enable:
 
 The codec system makes DataJoint extensible to any scientific domain without
 modifying the core framework.
+
+## Before Creating Your Own
+
+Check for existing plugin codecs that may already solve your needs:
+
+- **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — General numpy arrays with Zarr storage
+- **[dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs)** — Photon-limited movies with Anscombe transformation
+- **[anscombe-transform](https://github.com/datajoint/anscombe-transform)** — Variance stabilization for imaging
+
+See the [Use Plugin Codecs](../how-to/use-plugin-codecs.md) guide for installation and usage of existing codec packages. Creating a custom codec is straightforward, but reusing existing ones saves time and ensures compatibility.

From cfa05d166c3412f842fe813c578d2a6c0a312aaa Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:16:57 -0600
Subject: [PATCH 5/9] docs: Remove anscombe-transform from DataJoint plugin
 codecs list

anscombe-transform is a Zarr/Numcodecs codec (not a DataJoint codec).
It doesn't have a datajoint.codecs entry point - it's a dependency
used by dj-photon-codecs, not a standalone DataJoint plugin codec.

Removed from:
- DataJoint-maintained codecs list in use-plugin-codecs.md
- Before Creating Your Own section in custom-codecs.md
---
 src/explanation/custom-codecs.md | 3 +--
 src/how-to/use-plugin-codecs.md  | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/explanation/custom-codecs.md b/src/explanation/custom-codecs.md
index 296b890b..5fefbc28 100644
--- a/src/explanation/custom-codecs.md
+++ b/src/explanation/custom-codecs.md
@@ -340,7 +340,6 @@ modifying the core framework.
 Check for existing plugin codecs that may already solve your needs:
 
 - **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — General numpy arrays with Zarr storage
-- **[dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs)** — Photon-limited movies with Anscombe transformation
-- **[anscombe-transform](https://github.com/datajoint/anscombe-transform)** — Variance stabilization for imaging
+- **[dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs)** — Photon-limited movies with Anscombe transformation and compression
 
 See the [Use Plugin Codecs](../how-to/use-plugin-codecs.md) guide for installation and usage of existing codec packages. Creating a custom codec is straightforward, but reusing existing ones saves time and ensures compatibility.
diff --git a/src/how-to/use-plugin-codecs.md b/src/how-to/use-plugin-codecs.md
index aa3d626e..7bcaa77b 100644
--- a/src/how-to/use-plugin-codecs.md
+++ b/src/how-to/use-plugin-codecs.md
@@ -139,7 +139,6 @@ This organization makes external storage browsable and self-documenting.
 
 - **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — Zarr array storage for general numpy arrays
 - **[dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs)** — Photon-limited movies with Anscombe transformation and compression
-- **[anscombe-transform](https://github.com/datajoint/anscombe-transform)** — Anscombe variance stabilization (Zarr/Numcodecs integration)
 
 ### Community Codecs
 

From 0652c85005ad6d3dfb3b65ca2706bd95dcc3e2bc Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:28:49 -0600
Subject: [PATCH 6/9] docs: Add comprehensive versioning and backward
 compatibility guide

Add detailed guidance on versioning plugin codecs for backward compatibility:

- Version strategy: package version vs data format version
- When to bump versions (breaking vs non-breaking changes)
- Implementation patterns for version dispatch
- Migration strategies (lazy, explicit, deprecation warnings)
- Real-world example with dj-photon-codecs evolution
- Testing version compatibility
- Semantic versioning guidelines for codec packages

Critical for maintaining data accessibility as codecs evolve.
---
 src/how-to/use-plugin-codecs.md | 258 ++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
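
Note (not part of the commit): the version dispatch pattern this patch documents can be tried without DataJoint or Zarr. The toy class below is a minimal sketch under stated assumptions: `VersionedCodec`, its JSON payload, and the `scale` field are hypothetical and exist only to make the pattern runnable; the `codec_version` field and the default of "1.0" for unversioned records follow the guide.

```python
import json
import warnings


class VersionedCodec:
    """Toy codec for a list of numbers; format 2.0 stores scaled integers."""

    CURRENT_VERSION = "2.0"

    def encode(self, values, scale=1000):
        # Always write the current format and record its version.
        return {
            "codec_version": self.CURRENT_VERSION,
            "scale": scale,
            "payload": json.dumps([round(v * scale) for v in values]),
        }

    def decode(self, stored):
        # Records written before versioning lack the field; treat them as 1.0.
        version = stored.get("codec_version", "1.0")
        if version == "2.0":
            scale = stored["scale"]
            return [n / scale for n in json.loads(stored["payload"])]
        if version == "1.0":
            warnings.warn("Reading v1.0 data; consider migrating.",
                          DeprecationWarning, stacklevel=2)
            return json.loads(stored["payload"])
        raise ValueError(f"Unsupported codec version: {version}")


codec = VersionedCodec()
legacy = {"payload": "[0.5, 1.25]"}   # no codec_version field, read as v1.0
current = codec.encode([0.5, 1.25])   # written in the current format
```

Both `codec.decode(legacy)` and `codec.decode(current)` return plain lists, so callers do not need to know which format a row was written in. An unknown version raises instead of guessing, which mirrors the error branch in the guide's examples.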

diff --git a/src/how-to/use-plugin-codecs.md b/src/how-to/use-plugin-codecs.md
index 7bcaa77b..4c5bb291 100644
--- a/src/how-to/use-plugin-codecs.md
+++ b/src/how-to/use-plugin-codecs.md
@@ -239,6 +239,264 @@ Upgrade DataJoint if needed:
 pip install --upgrade datajoint
 ```
 
+## Versioning and Backward Compatibility
+
+Plugin codecs evolve over time. Following versioning best practices ensures your data remains accessible across codec updates.
+
+### Version Strategy
+
+**Two version numbers matter:**
+
+1. **Package version** (semantic versioning: `0.1.0`, `1.0.0`, `2.0.0`)
+   - For codec package releases
+   - Follows standard semantic versioning
+
+2. **Data format version** (stored with each encoded value)
+   - Tracks storage format changes
+   - Enables decode() to handle multiple formats
+
+### Implementing Versioning
+
+**Include version in encoded metadata:**
+
+```python
+def encode(self, value, *, key=None, store_name=None):
+    # ... encoding logic ...
+
+    return {
+        "path": path,
+        "store": store_name,
+        "codec_version": "1.0",  # Data format version
+        "shape": list(value.shape),
+        "dtype": str(value.dtype),
+    }
+```
+
+**Handle multiple versions in decode:**
+
+```python
+def decode(self, stored, *, key=None):
+    version = stored.get("codec_version", "1.0")  # Default for old data
+
+    if version == "2.0":
+        return self._decode_v2(stored)
+    elif version == "1.0":
+        return self._decode_v1(stored)
+    else:
+        raise DataJointError(f"Unsupported codec version: {version}")
+```
+
+### When to Bump Versions
+
+**Bump data format version when:**
+- ✅ Changing storage structure or encoding algorithm
+- ✅ Modifying metadata schema
+- ✅ Changing compression parameters that affect decode
+
+**Don't bump for:**
+- ❌ Bug fixes that don't affect stored data format
+- ❌ Performance improvements to encode/decode logic
+- ❌ Adding new optional features (store version in attributes instead)
+
+### Backward Compatibility Patterns
+
+**Pattern 1: Version dispatch in decode()**
+
+```python
+class MyCodec(SchemaCodec):
+    name = "mycodec"
+    CURRENT_VERSION = "2.0"
+
+    def encode(self, value, *, key=None, store_name=None):
+        # Always encode with current version
+        metadata = {
+            "codec_version": self.CURRENT_VERSION,
+            # ... other metadata ...
+        }
+        return metadata
+
+    def decode(self, stored, *, key=None):
+        version = stored.get("codec_version", "1.0")
+
+        if version == "2.0":
+            # Current version - optimized path
+            return self._decode_current(stored)
+        elif version == "1.0":
+            # Legacy version - compatibility path
+            return self._decode_legacy_v1(stored)
+        else:
+            raise DataJointError(
+                f"Cannot decode {self.name} version {version}. "
+                f"Upgrade codec package or migrate data."
+            )
+```
+
+**Pattern 2: Zarr attributes for feature versions**
+
+For codecs using Zarr (like dj-zarr-codecs, dj-photon-codecs):
+
+```python
+def encode(self, value, *, key=None, store_name=None):
+    # ... write to Zarr ...
+
+    z = zarr.open(store_map, mode="r+")
+    z.attrs["codec_version"] = "2.0"
+    z.attrs["codec_name"] = self.name
+    z.attrs["feature_flags"] = ["compression", "chunking"]
+
+    return {
+        "path": path,
+        "store": store_name,
+        "codec_version": "2.0",  # Also in DB for quick access
+    }
+
+def decode(self, stored, *, key=None):
+    z = zarr.open(store_map, mode="r")
+    version = z.attrs.get("codec_version", "1.0")
+
+    # Handle version-specific decoding
+    if version == "2.0":
+        return z  # Return Zarr array directly
+    else:
+        return self._migrate_v1_to_v2(z)
+```
+
+### Migration Strategies
+
+**Strategy 1: Lazy migration (recommended)**
+
+Old data is migrated when accessed:
+
+```python
+def decode(self, stored, *, key=None):
+    version = stored.get("codec_version", "1.0")
+
+    if version == "1.0":
+        # Decode old format
+        data = self._decode_v1(stored)
+
+        # Optionally: re-encode to new format in background
+        # (requires database write access)
+        return data
+
+    return self._decode_current(stored)
+```
+
+**Strategy 2: Explicit migration script**
+
+For breaking changes, provide migration tools:
+
+```python
+# migration_tool.py
+def migrate_table_to_v2(table, field_name):
+    """Migrate all rows to codec version 2.0."""
+    for key in table.fetch("KEY"):
+        # Decode the stored value (the installed codec still reads the old format)
+        data = (table & key).fetch1(field_name)
+
+        # Write it back in place; update1 re-encodes with the current codec
+        table.update1({**key, field_name: data})
+```
+
+**Strategy 3: Deprecation warnings**
+
+```python
+def decode(self, stored, *, key=None):
+    version = stored.get("codec_version", "1.0")
+
+    if version == "1.0":
+        import warnings
+        warnings.warn(
+            f"Reading {self.name} v1.0 data. Support will be removed in v3.0. "
+            f"Please migrate: pip install {self.name}-migrate && migrate-data",
+            DeprecationWarning, stacklevel=2)
+        return self._decode_v1(stored)
+    return self._decode_current(stored)
+```
+
+### Real-World Example: dj-photon-codecs Evolution
+
+**Version 1.0** (current):
+- Stores Anscombe-transformed data
+- Fixed compression (Blosc zstd level 5)
+- Fixed chunking (100 frames)
+
+**Hypothetical Version 2.0** (backward compatible):
+```python
+def encode(self, value, *, key=None, store_name=None):
+    # New: configurable compression
+    compression_level = getattr(self, 'compression_level', 5)
+
+    zarr.save_array(
+        store_map,
+        transformed,
+        compressor=zarr.Blosc(cname="zstd", clevel=compression_level),
+    )
+
+    z = zarr.open(store_map, mode="r+")
+    z.attrs["codec_version"] = "2.0"
+    z.attrs["compression_level"] = compression_level
+
+    return {
+        "path": path,
+        "codec_version": "2.0",  # <-- NEW
+        # ... rest same ...
+    }
+
+def decode(self, stored, *, key=None):
+    z = zarr.open(store_map, mode="r")
+    version = z.attrs.get("codec_version", "1.0")
+
+    # Both versions return zarr.Array - fully compatible!
+    if version in ("1.0", "2.0"):
+        return z
+    else:
+        raise DataJointError(f"Unsupported version: {version}")
+```
+
+### Testing Version Compatibility
+
+Include tests for version compatibility:
+
+```python
+def test_decode_v1_data():
+    """Ensure new codec can read old data."""
+    # Load fixture with v1.0 data
+    old_data = load_v1_fixture()
+
+    # Decode with current codec
+    codec = PhotonCodec()
+    result = codec.decode(old_data)
+
+    assert result.shape == (1000, 512, 512)
+    assert result.dtype == np.float64
+```
+
+### Package Version Guidelines
+
+Follow semantic versioning for codec packages:
+
+- **Patch (0.1.0 → 0.1.1)**: Bug fixes, no data format changes
+- **Minor (0.1.0 → 0.2.0)**: New features, backward compatible
+- **Major (0.1.0 → 1.0.0)**: Breaking changes (may require migration)
+
+**Example changelog:**
+
+```
+v2.0.0 (2026-02-01) - BREAKING
+  - Changed default compression from zstd-5 to zstd-3
+  - Data format v2.0 (can still read v1.0)
+  - Migration guide: docs/migration-v2.md
+
+v1.1.0 (2026-01-15)
+  - Added configurable chunk sizes (backward compatible)
+  - Data format still v1.0
+
+v1.0.1 (2026-01-10)
+  - Fixed edge case in Anscombe inverse transform
+  - Data format unchanged (v1.0)
+```
+
 ## Creating Your Own Codecs
 
 If you need a codec that doesn't exist yet, see:

From 656bceda9f4a15f4b4b6f7eea7a75f338ccfb73c Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:37:01 -0600
Subject: [PATCH 7/9] docs: Clarify built-in vs plugin codec versioning

Add section explaining why built-in codecs don't need explicit versioning:
- Built-in codecs versioned with DataJoint releases
- Plugin codecs have independent lifecycles and need codec_version
- DataJoint's semantic versioning handles built-in codec evolution
- Plugin versioning protects against independent evolution

Key distinction: Built-in codecs are part of DataJoint's API surface
(versioned by framework), while plugin codecs are independent packages
(need self-versioning).
---
 src/how-to/use-plugin-codecs.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/how-to/use-plugin-codecs.md b/src/how-to/use-plugin-codecs.md
index 4c5bb291..3bd5ce54 100644
--- a/src/how-to/use-plugin-codecs.md
+++ b/src/how-to/use-plugin-codecs.md
@@ -243,9 +243,28 @@ pip install --upgrade datajoint
 
 Plugin codecs evolve over time. Following versioning best practices ensures your data remains accessible across codec updates.
 
+### Built-in vs Plugin Codec Versioning
+
+**Built-in codecs** (`<blob>`, `<npy@>`, `<object@>`, etc.) are versioned with DataJoint:
+- ✅ Shipped with datajoint-python
+- ✅ Versioned by DataJoint release (2.0.0, 2.1.0, 3.0.0)
+- ✅ Upgraded when you upgrade DataJoint
+- ✅ Stability guaranteed by DataJoint's semantic versioning
+- ❌ **No explicit codec_version field needed** - DataJoint version is the codec version
+
+**Plugin codecs** (dj-zarr-codecs, dj-photon-codecs, etc.) have independent lifecycles:
+- ✅ Installed separately from DataJoint
+- ✅ Independent version numbers (0.1.0 → 1.0.0 → 2.0.0)
+- ✅ Users choose when to upgrade
+- ✅ **Must include explicit codec_version field** for backward compatibility
+
+**Why the difference?**
+
+Plugin codecs evolve independently and need to handle data encoded by different plugin versions. Built-in codecs are part of DataJoint's API surface and evolve with the framework itself. When you upgrade DataJoint 2.0 → 3.0, you expect potential breaking changes. When you upgrade a plugin 1.0 → 2.0 while keeping DataJoint 2.0, backward compatibility is critical.
+
 ### Version Strategy
 
-**Two version numbers matter:**
+**Two version numbers matter for plugin codecs:**
 
 1. **Package version** (semantic versioning: `0.1.0`, `1.0.0`, `2.0.0`)
    - For codec package releases

From 78047a97c4351185e405e01df20a358e0292a47d Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:42:38 -0600
Subject: [PATCH 8/9] docs: Add detailed <blob> serialization format
 documentation

Add comprehensive documentation of DataJoint's custom blob serialization:

Explanation docs (type-system.md):
- Protocol headers (mYm for MATLAB compat, dj0 for Python-extended)
- Optional zlib compression for data > 1KB
- Type-specific encoding with serialization codes
- Version detection via embedded protocol headers
- Supported types list
- Storage modes (<blob> vs <blob@>)

Reference docs (type-system.md):
- Detailed type code mapping for all supported Python types
- Protocol header format (mYm\0, dj0\0)
- Version detection mechanism
- MD5 deduplication for <blob@>

Clarifies that <blob> does NOT use pickle - it uses DataJoint's
custom binary format with intrinsic versioning via protocol headers.
---
 src/explanation/type-system.md     | 19 ++++++++++++++++-
 src/reference/specs/type-system.md | 34 +++++++++++++++++++++++++++---
 2 files changed, 49 insertions(+), 4 deletions(-)
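
Note (not part of the commit): the header-based version detection described in this patch can be illustrated in a few lines. This is a sketch that assumes an uncompressed blob whose first four bytes are one of the two protocol headers named in the docs (`mYm\0` or `dj0\0`). `detect_protocol` is a hypothetical helper, not a DataJoint function, and the compression wrapper is not modeled here.

```python
PROTOCOL_HEADERS = {
    b"mYm\x00": "mYm (MATLAB-compatible)",
    b"dj0\x00": "dj0 (Python-extended)",
}


def detect_protocol(blob: bytes) -> str:
    """Name the protocol from the header at the start of an uncompressed blob."""
    for header, name in PROTOCOL_HEADERS.items():
        if blob.startswith(header):
            return name
    raise ValueError("Unrecognized blob header")


# The bytes after the header are placeholders, not a valid serialized object.
example = b"dj0\x00" + b"placeholder"
protocol = detect_protocol(example)
```

Because the header travels with every stored value, a reader can choose the right decoder per blob without a separate version column, which is the property the docs call intrinsic versioning.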

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index 4e2c4bb2..74b76d76 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -111,7 +111,20 @@ Codecs provide `encode()`/`decode()` semantics for complex Python objects.
 
 ### `<blob>` — Serialized Python Objects
 
-Stores NumPy arrays, dicts, lists, and other Python objects.
+Stores NumPy arrays, dicts, lists, and other Python objects using DataJoint's custom binary serialization format.
+
+**Serialization format:**
+- **Protocol headers**: `mYm` (MATLAB-compatible) or `dj0` (Python-extended)
+- **Optional compression**: zlib compression for data > 1KB
+- **Type-specific encoding**: Each Python type has a specific serialization code
+- **Version detection**: Protocol header embedded in blob enables format detection
+
+**Supported types:**
+- NumPy arrays (numeric, structured, recarrays)
+- Collections (dict, list, tuple, set)
+- Scalars (int, float, bool, complex, str, bytes)
+- Date/time objects (datetime, date, time)
+- UUID, Decimal
 
 ```python
 class Results(dj.Computed):
@@ -124,6 +137,10 @@ class Results(dj.Computed):
     """
 ```
 
+**Storage modes:**
+- `<blob>` — Stored in database as LONGBLOB (up to ~1GB depending on MySQL config)
+- `<blob@>` — Stored externally via `<hash@>` with MD5 deduplication
+
 ### `<attach>` — File Attachments
 
 Stores files with filename preserved.
diff --git a/src/reference/specs/type-system.md b/src/reference/specs/type-system.md
index 80c6bd05..cb3d483a 100644
--- a/src/reference/specs/type-system.md
+++ b/src/reference/specs/type-system.md
@@ -498,11 +498,39 @@ The `json` database type:
 
 **Supports both internal and external storage.**
 
-Serializes Python objects (NumPy arrays, dicts, lists, etc.) using DataJoint's
-blob format. Compatible with MATLAB.
+Serializes Python objects using DataJoint's custom binary serialization format. The format uses protocol headers and type-specific encoding to serialize complex Python objects efficiently.
+
+**Serialization format:**
+
+- **Protocol headers**:
+  - `mYm` — Original MATLAB-compatible format for numeric arrays, structs, cells
+  - `dj0` — Extended format supporting Python-specific types (UUID, Decimal, datetime, etc.)
+- **Compression**: Automatic zlib compression for data > 1KB
+- **Type codes**: Each Python type has a specific serialization code:
+  - `'A'` — NumPy arrays (numeric)
+  - `'F'` — NumPy recarrays (structured arrays with fields)
+  - `'\x01'` — Tuples
+  - `'\x02'` — Lists
+  - `'\x03'` — Sets
+  - `'\x04'` — Dicts
+  - `'\x05'` — Strings (UTF-8)
+  - `'\x06'` — Bytes
+  - `'\x0a'` — Unbounded integers
+  - `'\x0b'` — Booleans
+  - `'\x0c'` — Complex numbers
+  - `'\x0d'` — Floats
+  - `'d'` — Decimal
+  - `'t'` — Datetime/date/time
+  - `'u'` — UUID
+  - `'S'` — MATLAB structs
+  - `'C'` — MATLAB cell arrays
+
+**Version detection**: The protocol header (`mYm\0` or `dj0\0`) is embedded at the start of the blob, enabling automatic format detection and backward compatibility.
+
+**Storage modes:**
 
 - **`<blob>`**: Stored in database (`bytes` → `LONGBLOB`/`BYTEA`)
-- **`<blob@>`**: Stored externally via `<hash@>` with deduplication
+- **`<blob@>`**: Stored externally via `<hash@>` with MD5 deduplication
 - **`<blob@store>`**: Stored in specific named store
 
 ```python

From b341385e0aa5d49532f08ec775ad4812417f7b94 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko <dimitri.yatsenko@gmail.com>
Date: Fri, 16 Jan 2026 12:43:32 -0600
Subject: [PATCH 9/9] docs: Add mYm references and intrinsic versioning
 explanation

Add references to mYm format documentation:
- MATLAB FileExchange: https://www.mathworks.com/matlabcentral/fileexchange/81208-mym
- GitHub repository: https://github.com/datajoint/mym

Add intrinsic versioning explanation to plugin codecs guide:
- How built-in codecs embed version in data format
- Protocol headers in <blob> (mYm\0, dj0\0)
- NumPy format version in <npy@> headers
- Self-describing structure in <object@>
- Why built-in codecs don't need explicit codec_version field

Clarifies the distinction between built-in codecs (intrinsic versioning)
and plugin codecs (explicit codec_version field).
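
For reviewers, a minimal sketch of what intrinsic versioning means for
<blob>: the format can be identified from the leading bytes alone. This
is an illustration only, not DataJoint's implementation. The helper name
detect_blob_protocol is hypothetical, and the sketch assumes the blob
has already been decompressed so that the header is its first 4 bytes.

```python
# Hypothetical helper for illustration; not part of DataJoint's API.
# Assumes `data` is an uncompressed blob whose first bytes are the header.
PROTOCOL_HEADERS = {
    b"mYm\0": "mYm",  # MATLAB-compatible format
    b"dj0\0": "dj0",  # Python-extended format
}


def detect_blob_protocol(data: bytes) -> str:
    """Return the protocol name identified by the blob's leading header."""
    for header, name in PROTOCOL_HEADERS.items():
        if data.startswith(header):
            return name
    raise ValueError("unrecognized blob protocol header")
```

Because the header travels with the data, no separate codec_version
value is needed to choose a decoder for a stored blob.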
---
 src/explanation/type-system.md     |  4 +++-
 src/how-to/use-plugin-codecs.md    | 11 +++++++++++
 src/reference/specs/type-system.md |  2 +-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/src/explanation/type-system.md b/src/explanation/type-system.md
index 74b76d76..3b84dbd5 100644
--- a/src/explanation/type-system.md
+++ b/src/explanation/type-system.md
@@ -114,7 +114,9 @@ Codecs provide `encode()`/`decode()` semantics for complex Python objects.
 Stores NumPy arrays, dicts, lists, and other Python objects using DataJoint's custom binary serialization format.
 
 **Serialization format:**
-- **Protocol headers**: `mYm` (MATLAB-compatible) or `dj0` (Python-extended)
+- **Protocol headers**:
+  - `mYm` — MATLAB-compatible format (see [mYm on MATLAB FileExchange](https://www.mathworks.com/matlabcentral/fileexchange/81208-mym) and [mym on GitHub](https://github.com/datajoint/mym))
+  - `dj0` — Python-extended format supporting additional types
 - **Optional compression**: zlib compression for data > 1KB
 - **Type-specific encoding**: Each Python type has a specific serialization code
 - **Version detection**: Protocol header embedded in blob enables format detection
diff --git a/src/how-to/use-plugin-codecs.md b/src/how-to/use-plugin-codecs.md
index 3bd5ce54..d59fa45d 100644
--- a/src/how-to/use-plugin-codecs.md
+++ b/src/how-to/use-plugin-codecs.md
@@ -262,6 +262,17 @@ Plugin codecs evolve over time. Following versioning best practices ensures your
 
 Plugin codecs evolve independently and need to handle data encoded by different plugin versions. Built-in codecs are part of DataJoint's API surface and evolve with the framework itself. When you upgrade DataJoint 2.0 → 3.0, you expect potential breaking changes. When you upgrade a plugin 1.0 → 2.0 while keeping DataJoint 2.0, backward compatibility is critical.
 
+**How built-in codecs handle versioning:**
+
+Built-in formats have **intrinsic versioning**, meaning the format version is embedded in the data itself:
+
+- `<blob>` — Protocol header (`mYm\0` or `dj0\0`) at start of blob
+- `<npy@>` — NumPy format version in `.npy` file header
+- `<object@>` — Self-describing directory structure
+- `<attach>` — Filename + content (format-agnostic)
+
+When DataJoint needs to change a built-in codec's format, it can detect the old format from the embedded version information and handle migration transparently. This is why built-in codecs don't need an explicit `codec_version` field in database metadata.
+
 ### Version Strategy
 
 **Two version numbers matter for plugin codecs:**
diff --git a/src/reference/specs/type-system.md b/src/reference/specs/type-system.md
index cb3d483a..0efa0e71 100644
--- a/src/reference/specs/type-system.md
+++ b/src/reference/specs/type-system.md
@@ -503,7 +503,7 @@ Serializes Python objects using DataJoint's custom binary serialization format.
 **Serialization format:**
 
 - **Protocol headers**:
-  - `mYm` — Original MATLAB-compatible format for numeric arrays, structs, cells
+  - `mYm` — Original MATLAB-compatible format for numeric arrays, structs, cells (see [mYm on MATLAB FileExchange](https://www.mathworks.com/matlabcentral/fileexchange/81208-mym) and [mym on GitHub](https://github.com/datajoint/mym))
   - `dj0` — Extended format supporting Python-specific types (UUID, Decimal, datetime, etc.)
 - **Compression**: Automatic zlib compression for data > 1KB
 - **Type codes**: Each Python type has a specific serialization code: