Merged
1 change: 1 addition & 0 deletions mkdocs.yaml
@@ -74,6 +74,7 @@ nav:
- Object Storage:
- Use Object Storage: how-to/use-object-storage.md
- Use NPY Codec: how-to/use-npy-codec.md
- Use Plugin Codecs: how-to/use-plugin-codecs.md
- Create Custom Codecs: how-to/create-custom-codec.md
- Manage Large Data: how-to/manage-large-data.md
- Clean Up Storage: how-to/garbage-collection.md
9 changes: 9 additions & 0 deletions src/explanation/custom-codecs.md
@@ -334,3 +334,12 @@ Custom codecs enable:

The codec system makes DataJoint extensible to any scientific domain without
modifying the core framework.

## Before Creating Your Own

Check whether an existing plugin codec already meets your needs:

- **[dj-zarr-codecs](https://github.com/datajoint/dj-zarr-codecs)** — General numpy arrays with Zarr storage
- **[dj-photon-codecs](https://github.com/datajoint/dj-photon-codecs)** — Photon-limited movies with Anscombe transformation and compression

See the [Use Plugin Codecs](../how-to/use-plugin-codecs.md) guide for how to install and use existing codec packages. Creating a custom codec is straightforward, but reusing an existing one saves time and keeps your data compatible with other users of the same codec.
21 changes: 20 additions & 1 deletion src/explanation/type-system.md
@@ -111,7 +111,22 @@ Codecs provide `encode()`/`decode()` semantics for complex Python objects.

### `<blob>` — Serialized Python Objects

Stores NumPy arrays, dicts, lists, and other Python objects.
Stores NumPy arrays, dicts, lists, and other Python objects using DataJoint's custom binary serialization format.

**Serialization format:**
- **Protocol headers**:
    - `mYm` — MATLAB-compatible format (see [mYm on MATLAB FileExchange](https://www.mathworks.com/matlabcentral/fileexchange/81208-mym) and [mym on GitHub](https://github.com/datajoint/mym))
    - `dj0` — Python-extended format supporting additional types
- **Optional compression**: zlib compression for data > 1KB
- **Type-specific encoding**: Each Python type has a specific serialization code
- **Version detection**: Protocol header embedded in blob enables format detection
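
The header-plus-optional-compression idea in the list above can be illustrated with a toy framing scheme. This is a sketch under stated assumptions, not DataJoint's actual wire format: the null terminators, the `ZL\0` compression marker, the exact 1024-byte threshold, and the `frame`/`unframe` helpers are all hypothetical, and the payload is treated as opaque bytes rather than type-encoded objects.

```python
import zlib

PROTOCOLS = (b"mYm\0", b"dj0\0")  # header names from the list above; "\0" terminator is assumed
COMPRESSED_MARK = b"ZL\0"         # hypothetical marker for a compressed blob
THRESHOLD = 1024                  # "data > 1KB", taken here as 1024 bytes


def frame(payload: bytes, protocol: bytes = b"dj0\0") -> bytes:
    """Prefix the payload with a protocol header; compress large blobs."""
    blob = protocol + payload
    if len(blob) > THRESHOLD:
        packed = COMPRESSED_MARK + zlib.compress(blob)
        if len(packed) < len(blob):  # keep compression only when it helps
            return packed
    return blob


def unframe(blob: bytes) -> tuple[bytes, bytes]:
    """Undo optional compression, then detect the format from the header."""
    if blob.startswith(COMPRESSED_MARK):
        blob = zlib.decompress(blob[len(COMPRESSED_MARK):])
    for header in PROTOCOLS:
        if blob.startswith(header):
            return header, blob[len(header):]
    raise ValueError("unknown protocol header")
```

Because the header travels inside the blob, a reader can choose the right decoder without any external metadata, which is what makes version detection possible.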

**Supported types:**
- NumPy arrays (numeric, structured, recarrays)
- Collections (dict, list, tuple, set)
- Scalars (int, float, bool, complex, str, bytes)
- Date/time objects (datetime, date, time)
- UUID, Decimal

```python
class Results(dj.Computed):
@@ -124,6 +139,10 @@ class Results(dj.Computed):
"""
```

**Storage modes:**
- `<blob>` — Stored in database as LONGBLOB (up to ~1GB depending on MySQL config)
- `<blob@>` — Stored externally via `<hash@>` with MD5 deduplication
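
MD5 deduplication means that identical content maps to the same storage key, so it is written only once. The sketch below shows the general content-addressing technique with an in-memory dictionary; the `HashStore` class is hypothetical and does not reflect how DataJoint lays out or names objects in external storage.

```python
import hashlib


class HashStore:
    """Toy content-addressed store: the key is the MD5 digest of the data."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.md5(data).hexdigest()
        # Identical content yields the same key, so it is stored once.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = HashStore()
first = store.put(b"same serialized blob")
second = store.put(b"same serialized blob")   # duplicate content
third = store.put(b"different serialized blob")
print(first == second, len(store))
```

In this sketch the table row would hold only the key, while the bytes live in the store, so inserting the same value in many rows costs one stored object.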

### `<attach>` — File Attachments

Stores files with filename preserved.
1 change: 1 addition & 0 deletions src/how-to/index.md
@@ -42,6 +42,7 @@ they assume you understand the basics and focus on getting things done.
- [Object Storage Overview](object-storage-overview.md) — Navigation guide for all storage docs
- [Choose a Storage Type](choose-storage-type.md) — Decision guide for codecs
- [Use Object Storage](use-object-storage.md) — When and how
- [Use Plugin Codecs](use-plugin-codecs.md) — Install codec packages via entry points
- [Create Custom Codecs](create-custom-codec.md) — Domain-specific types
- [Manage Large Data](manage-large-data.md) — Blobs, streaming, efficiency
- [Clean Up External Storage](garbage-collection.md) — Garbage collection