DataJoint's type system provides a three-layer architecture that balances database efficiency with Python convenience.

    graph TB
        subgraph "Layer 3: Codecs"
            blob["‹blob›"]
            attach["‹attach›"]
            object["‹object@›"]
            hash["‹hash@›"]
            custom["‹custom›"]
        end
        subgraph "Layer 2: Core Types"
            int32
            float64
            varchar
            json
            bytes
        end
        subgraph "Layer 1: Native"
            INT["INT"]
            DOUBLE["DOUBLE"]
            VARCHAR["VARCHAR"]
            JSON_N["JSON"]
            BLOB["LONGBLOB"]
        end
        blob --> bytes
        attach --> bytes
        object --> json
        hash --> json
        bytes --> BLOB
        json --> JSON_N
        int32 --> INT
        float64 --> DOUBLE
        varchar --> VARCHAR
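
The arrows in the diagram can be read as two lookups: a codec resolves to a core type, and a core type resolves to a native column type. The following is a minimal sketch of that reading. The dictionaries are copied from the diagram, and `resolve_native` is a hypothetical helper for illustration, not part of DataJoint.

```python
# Illustrative only: the mappings mirror the diagram's arrows, not DataJoint internals.
CODEC_TO_CORE = {
    "<blob>": "bytes",
    "<attach>": "bytes",
    "<object@>": "json",
    "<hash@>": "json",
}
CORE_TO_NATIVE = {
    "int32": "INT",
    "float64": "DOUBLE",
    "varchar": "VARCHAR",
    "json": "JSON",
    "bytes": "LONGBLOB",
}


def resolve_native(type_name: str) -> str:
    """Follow codec -> core type -> native type, as drawn in the diagram."""
    core = CODEC_TO_CORE.get(type_name, type_name)  # core types pass through unchanged
    return CORE_TO_NATIVE[core]


print(resolve_native("<blob>"))  # LONGBLOB
print(resolve_native("int32"))   # INT
```

The native names are the ones shown in the diagram; a different backend may use different native names for the same core type.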

Layer 1 native types are backend-specific (MySQL, PostgreSQL) and discouraged for direct use.

    # Native types (avoid)
    column : TINYINT UNSIGNED
    column : MEDIUMBLOB

Layer 2 core types are standardized, scientist-friendly types that work identically across backends.

| Type | Description | Range |
|---|---|---|
| `int8` | 8-bit signed | -128 to 127 |
| `int16` | 16-bit signed | -32,768 to 32,767 |
| `int32` | 32-bit signed | about ±2.1 billion |
| `int64` | 64-bit signed | about ±9.2 quintillion |
| `uint8` | 8-bit unsigned | 0 to 255 |
| `uint16` | 16-bit unsigned | 0 to 65,535 |
| `uint32` | 32-bit unsigned | 0 to about 4.3 billion |
| `uint64` | 64-bit unsigned | 0 to about 18.4 quintillion |
| `float32` | 32-bit float | ~7 significant digits |
| `float64` | 64-bit float | ~15 significant digits |
| `decimal(n,f)` | Fixed-point | Exact decimal |
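
The integer ranges in the table follow directly from the bit width in each type name. The sketch below derives them with the usual two's-complement arithmetic; `int_range` is a hypothetical helper written for this illustration.

```python
def int_range(type_name: str) -> tuple[int, int]:
    """Return (min, max) for a core integer type such as 'int16' or 'uint8'."""
    signed = not type_name.startswith("uint")
    # Strip the 'uint' or 'int' prefix to get the bit width.
    bits = int(type_name.removeprefix("uint").removeprefix("int"))
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print(int_range("int8"))    # (-128, 127)
print(int_range("uint16"))  # (0, 65535)
print(int_range("int32"))   # (-2147483648, 2147483647)
```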

| Type | Description |
|---|---|
| `char(n)` | Fixed-length string |
| `varchar(n)` | Variable-length string |
| `enum(...)` | Enumeration of string labels |

| Type | Description |
|---|---|
| `bool` | True/False |
| `date` | Date only |
| `datetime` | Date and time (UTC) |
| `json` | JSON document |
| `uuid` | Universally unique identifier |
| `bytes` | Raw binary |

Layer 3 codecs provide `encode()`/`decode()` semantics for complex Python objects. They are written with the following syntax:

- Angle brackets: `<blob>`, `<attach>`, `<object@store>`
- `@` indicates external storage: `<blob@>` stores externally
- Store name: `<blob@cold>` uses named store "cold"
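
The three syntax rules above are enough to split a codec specifier into its parts. The following is a minimal sketch of such a parser. `CodecSpec` and `parse_codec_spec` are hypothetical names, and the assumption that codec and store names consist of word characters is made only for this example; DataJoint's own parser may accept a different grammar.

```python
import re
from typing import NamedTuple, Optional


class CodecSpec(NamedTuple):
    name: str
    external: bool
    store: Optional[str]  # None with external=True means the default store


# Assumed grammar: <name>, <name@>, or <name@store>
_SPEC = re.compile(r"^<(?P<name>\w+)(?P<at>@(?P<store>\w*))?>$")


def parse_codec_spec(spec: str) -> CodecSpec:
    """Split a specifier like '<blob@cold>' into name, external flag, and store."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"not a codec specifier: {spec!r}")
    store = match.group("store") or None  # an empty store name means "default"
    return CodecSpec(match.group("name"), match.group("at") is not None, store)


print(parse_codec_spec("<blob>"))       # internal storage
print(parse_codec_spec("<blob@>"))      # external, default store
print(parse_codec_spec("<blob@cold>"))  # external, store "cold"
```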

| Codec | Internal | External | Returns |
|---|---|---|---|
| `<blob>` | ✅ | ✅ `<blob@>` | Python object |
| `<attach>` | ✅ | ✅ `<attach@>` | Local file path |
| `<object@>` | ❌ | ✅ | ObjectRef |
| `<hash@>` | ❌ | ✅ | bytes |
| `<filepath@>` | ❌ | ✅ | ObjectRef |

`<blob>` stores NumPy arrays, dicts, lists, and other Python objects using DataJoint's custom binary serialization format.

Serialization format:
- Protocol headers:
  - `mYm`: MATLAB-compatible format (see mYm on MATLAB FileExchange and mym on GitHub)
  - `dj0`: Python-extended format supporting additional types
- Optional compression: zlib compression for data > 1KB
- Type-specific encoding: Each Python type has a specific serialization code
- Version detection: Protocol header embedded in blob enables format detection
Supported types:
- NumPy arrays (numeric, structured, recarrays)
- Collections (dict, list, tuple, set)
- Scalars (int, float, bool, complex, str, bytes)
- Date/time objects (datetime, date, time)
- UUID, Decimal
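
The combination of an embedded protocol header and size-gated zlib compression can be illustrated with a toy packer. This is a sketch of the general idea only: the `R`/`Z` marker bytes, the NUL terminator on the header, and the exact layout are placeholders chosen for this example and are not DataJoint's actual wire format.

```python
import zlib

HEADER = b"dj0\0"   # protocol tag from the text; the NUL terminator is an assumption
RAW = b"R"          # placeholder marker for uncompressed data (hypothetical)
COMPRESSED = b"Z"   # placeholder marker for zlib data (hypothetical)
THRESHOLD = 1024    # compress bodies larger than 1 KB


def pack(payload: bytes) -> bytes:
    """Prefix the payload with a protocol header, compressing large bodies."""
    body = HEADER + payload
    if len(body) > THRESHOLD:
        return COMPRESSED + zlib.compress(body)
    return RAW + body


def unpack(blob: bytes) -> bytes:
    """Reverse pack(): undo compression, then check and strip the header."""
    marker, rest = blob[:1], blob[1:]
    if marker == COMPRESSED:
        body = zlib.decompress(rest)
    elif marker == RAW:
        body = rest
    else:
        raise ValueError(f"unknown marker: {marker!r}")
    if not body.startswith(HEADER):
        raise ValueError("unrecognized protocol header")
    return body[len(HEADER):]


small, large = b"abc", b"x" * 5000
assert unpack(pack(small)) == small
assert unpack(pack(large)) == large
```

Because the header travels with the data, a reader can tell which format produced a blob before attempting to decode it.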

Example table definition:

    class Results(dj.Computed):
        definition = """
        -> Analysis
        ---
        spike_times : <blob>          # In database
        waveforms : <blob@>           # External, default store
        raw_data : <blob@archive>     # External, 'archive' store
        """

Storage modes:
- `<blob>`: Stored in database as LONGBLOB (up to ~1GB depending on MySQL config)
- `<blob@>`: Stored externally via `<hash@>` with MD5 deduplication
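
MD5 deduplication means that identical content maps to the same storage key, so it is written only once. A minimal in-memory sketch of that idea follows. `HashStore` is a hypothetical class for illustration; it does not reflect how DataJoint lays out or names objects in an external store.

```python
import hashlib


class HashStore:
    """Toy content-addressed store: the MD5 digest of the data is its key."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.md5(data).hexdigest()
        # Identical content hashes to the same key, so it is kept only once.
        self._objects.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._objects[digest]

    def __len__(self) -> int:
        return len(self._objects)


store = HashStore()
first = store.put(b"waveform bytes")
second = store.put(b"waveform bytes")  # same content, same key
assert first == second and len(store) == 1
```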

`<attach>` stores files with the filename preserved.

    class Config(dj.Manual):
        definition = """
        config_id : int32
        ---
        settings : <attach>       # Small config file
        data_file : <attach@>     # Large file, external
        """

`<object@>` is for large or complex file structures (Zarr, HDF5). The storage path is derived from the primary key.

    class ProcessedData(dj.Computed):
        definition = """
        -> Recording
        ---
        zarr_data : <object@>     # Stored at {schema}/{table}/{pk}/
        """

`<filepath@>` holds references to externally managed files with portable paths.

    class RawData(dj.Manual):
        definition = """
        session_id : int32
        ---
        recording : <filepath@raw>     # Relative to 'raw' store
        """

| Mode | Database Storage | External Storage | Use Case |
|---|---|---|---|
| Internal | Yes | No | Small data |
| External | Metadata only | Yes | Large data |
| Hash-addressed | Metadata only | Deduplicated | Repeated data |
| Path-addressed | Metadata only | PK-based path | Complex files |
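
The path-addressed row can be made concrete with a small helper that builds a `{schema}/{table}/{pk}/` location from a primary key. This is a sketch under stated assumptions: `object_path` is a hypothetical function, and rendering the key as sorted `name=value` segments is an illustrative choice, since the text does not specify how `{pk}` is formatted.

```python
def object_path(schema: str, table: str, key: dict) -> str:
    """Build '{schema}/{table}/{pk}/' from a primary-key dict.

    The 'name=value' rendering of the key is an assumption for illustration.
    """
    # Sort attribute names so the same key always yields the same path.
    pk = "/".join(f"{name}={key[name]}" for name in sorted(key))
    return f"{schema}/{table}/{pk}/"


print(object_path("lab", "ProcessedData", {"recording_id": 7}))
# lab/ProcessedData/recording_id=7/
```

Because the path depends only on the schema, table, and key, a row's data can be located without consulting a content hash, in contrast to the hash-addressed mode.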

Extend the type system for domain-specific data:

    import datajoint as dj


    class GraphCodec(dj.Codec):
        """Store NetworkX graphs."""

        name = "graph"

        def get_dtype(self, is_external):
            # Encoded values are stored through the <blob> codec
            return "<blob>"

        def encode(self, graph, *, key=None, store_name=None):
            # Reduce the graph to plain lists of nodes and edges
            return {
                'nodes': list(graph.nodes()),
                'edges': list(graph.edges())
            }

        def decode(self, stored, *, key=None):
            # Rebuild the graph from the stored node and edge lists
            import networkx as nx
            G = nx.Graph()
            G.add_nodes_from(stored['nodes'])
            G.add_edges_from(stored['edges'])
            return G

Usage:

    class Network(dj.Computed):
        definition = """
        -> Analysis
        ---
        connectivity : <graph>
        """

| Data | Recommended Type |
|---|---|
| Small scalars | Core types (`int32`, `float64`) |
| Short strings | `varchar(n)` |
| NumPy arrays (small) | `<blob>` |
| NumPy arrays (large) | `<blob@>` |
| Files to attach | `<attach>` or `<attach@>` |
| Zarr/HDF5 | `<object@>` |
| External file refs | `<filepath@store>` |
| Custom objects | Custom codec |
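
The last row of the table depends on a codec whose `encode()` and `decode()` are inverses of each other. The sketch below checks that round-trip property on a plain-Python stand-in with the same method shapes as the `GraphCodec` example. `EdgeListCodec` is hypothetical and deliberately does not subclass `dj.Codec`, so it runs without DataJoint or NetworkX; a graph is represented here as a `(nodes, edges)` pair of sets.

```python
import json


class EdgeListCodec:
    """Stand-in with the same encode/decode shape as the GraphCodec example."""

    name = "graph"

    def encode(self, graph, *, key=None, store_name=None):
        nodes, edges = graph
        # Sorted lists give a stable, serializable representation.
        return {"nodes": sorted(nodes), "edges": sorted(edges)}

    def decode(self, stored, *, key=None):
        # Edges may come back as lists after serialization, so rebuild tuples.
        return set(stored["nodes"]), {tuple(edge) for edge in stored["edges"]}


codec = EdgeListCodec()
graph = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})

# Round trip directly, and through JSON to mimic a serialization step.
assert codec.decode(codec.encode(graph)) == graph
assert codec.decode(json.loads(json.dumps(codec.encode(graph)))) == graph
```

A check like this is cheap to run before registering a real codec, since a lossy `encode()` would otherwise only surface when data is fetched back.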

- Core types for simple data: `int32`, `varchar`, `datetime`
- `<blob>` for Python objects: NumPy arrays, dicts
- `@` suffix for external storage: `<blob@>`, `<object@>`
- Custom codecs for domain-specific types