Merged
16 changes: 2 additions & 14 deletions src/tutorials/advanced/sql-comparison.ipynb
@@ -1256,19 +1256,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Algebraic Closure\n",
"\n",
"In standard SQL, query results are just \"bags of rows\" — they don't have a defined entity type. You cannot know what kind of thing each row represents without external context.\n",
"\n",
"DataJoint achieves **algebraic closure**: every query result is a valid entity set with a well-defined **entity type**. You always know what kind of entity the result represents, identified by a specific primary key. This means:\n",
"\n",
"1. **Every operator returns a valid relation** — not just rows, but a set of entities of a known type\n",
"2. **Operators compose indefinitely** — you can chain any sequence of operations\n",
"3. **Results remain queryable** — a query result can be used as an operand in further operations\n",
"\n",
"The entity type (and its primary key) is determined by precise rules based on the operator and the functional dependencies between operands. See the [Primary Keys specification](../../reference/specs/primary-keys.md) for details."
]
"source": "### Algebraic Closure\n\nIn standard SQL, query results are just \"bags of rows\" — they don't have a defined entity type. You cannot know what kind of thing each row represents without external context.\n\nDataJoint achieves **algebraic closure**: every query result is a valid entity set with a well-defined **entity type**. You always know what kind of entity the result represents, identified by a specific primary key. This means:\n\n1. **Every operator returns a valid relation** — not just rows, but a set of entities of a known type\n2. **Operators compose indefinitely** — you can chain any sequence of operations\n3. **Results remain queryable** — a query result can be used as an operand in further operations\n\nThe entity type (and its primary key) is determined by precise rules based on the operator and the functional dependencies between operands. See the [Primary Keys specification](../../reference/specs/primary-keys) for details."
},
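The closure property described in this cell can be illustrated with a toy model — a minimal sketch, not DataJoint's implementation; the `Relation` class and its `restrict` method are invented here purely to show why closure makes indefinite chaining possible:

```python
from dataclasses import dataclass

# Toy model of algebraic closure (illustration only, not DataJoint code):
# every operator returns another Relation with a well-defined primary key,
# so any result can serve as an operand in further operations.
@dataclass(frozen=True)
class Relation:
    primary_key: tuple   # the entity type is identified by its primary key
    rows: tuple          # each row is a dict of attribute -> value

    def restrict(self, cond):
        # Restriction preserves the entity type: same primary key, fewer rows.
        return Relation(self.primary_key, tuple(r for r in self.rows if cond(r)))

session = Relation(
    primary_key=("subject_id", "session_idx"),
    rows=(
        {"subject_id": "M001", "session_idx": 1, "rig": "A"},
        {"subject_id": "M001", "session_idx": 2, "rig": "B"},
    ),
)

# Operators compose indefinitely because each result is itself a Relation.
result = session.restrict(lambda r: r["rig"] == "A").restrict(lambda r: r["session_idx"] == 1)
```

Because `restrict` returns a `Relation`, the result of any operation is immediately usable as an operand — the essence of the closure property, independent of which operator is applied.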
{
"cell_type": "markdown",
@@ -3101,4 +3089,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
4 changes: 2 additions & 2 deletions src/tutorials/basics/01-first-pipeline.ipynb
@@ -3,7 +3,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "# A Simple Pipeline\n\nThis tutorial introduces DataJoint by building a simple research lab database. You'll learn to:\n\n- Define tables with primary keys and dependencies\n- Insert and query data\n- Use the four core operations: restriction, projection, join, aggregation\n- Understand the schema diagram\n\nWe'll work with **Manual tables** only—tables where you enter data directly. Later tutorials introduce automated computation.\n\nFor complete working examples, see:\n- [University Database](../examples/university/) — Academic records with complex queries\n- [Blob Detection](../examples/blob-detection/) — Image processing with computation"
"source": "# A Simple Pipeline\n\nThis tutorial introduces DataJoint by building a simple research lab database. You'll learn to:\n\n- Define tables with primary keys and dependencies\n- Insert and query data\n- Use the four core operations: restriction, projection, join, aggregation\n- Understand the schema diagram\n\nWe'll work with **Manual tables** only—tables where you enter data directly. Later tutorials introduce automated computation.\n\nFor complete working examples, see:\n- [University Database](../../examples/university) — Academic records with complex queries\n- [Blob Detection](../../examples/blob-detection) — Image processing with computation"
},
{
"cell_type": "markdown",
@@ -2683,7 +2683,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Summary\n\nYou've learned the fundamentals of DataJoint:\n\n| Concept | Description |\n|---------|-------------|\n| **Tables** | Python classes with a `definition` string |\n| **Primary key** | Above `---`, uniquely identifies rows |\n| **Dependencies** | `->` creates foreign keys |\n| **Restriction** | `&` filters rows |\n| **Projection** | `.proj()` selects/computes columns |\n| **Join** | `*` combines tables |\n| **Aggregation** | `.aggr()` summarizes groups |\n\n### Next Steps\n\n- [Schema Design](02-schema-design/) — Primary keys, relationships, table tiers\n- [Queries](04-queries/) — Advanced query patterns\n- [Computation](05-computation/) — Automated processing with Imported/Computed tables\n\n### Complete Examples\n\n- [University Database](../examples/university/) — Complex queries on academic records\n- [Blob Detection](../examples/blob-detection/) — Image processing pipeline with computation"
"source": "## Summary\n\nYou've learned the fundamentals of DataJoint:\n\n| Concept | Description |\n|---------|-------------|\n| **Tables** | Python classes with a `definition` string |\n| **Primary key** | Above `---`, uniquely identifies rows |\n| **Dependencies** | `->` creates foreign keys |\n| **Restriction** | `&` filters rows |\n| **Projection** | `.proj()` selects/computes columns |\n| **Join** | `*` combines tables |\n| **Aggregation** | `.aggr()` summarizes groups |\n\n### Next Steps\n\n- [Schema Design](../02-schema-design) — Primary keys, relationships, table tiers\n- [Queries](../04-queries) — Advanced query patterns\n- [Computation](../05-computation) — Automated processing with Imported/Computed tables\n\n### Complete Examples\n\n- [University Database](../../examples/university) — Complex queries on academic records\n- [Blob Detection](../../examples/blob-detection) — Image processing pipeline with computation"
},
{
"cell_type": "code",
2 changes: 1 addition & 1 deletion src/tutorials/basics/02-schema-design.ipynb
@@ -1530,7 +1530,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Best Practices\n\n### 1. Choose Meaningful Primary Keys\n- Use natural identifiers when possible (`subject_id = 'M001'`)\n- Keep keys minimal but sufficient for uniqueness\n\n### 2. Use Appropriate Table Tiers\n- **Manual**: Data entered by operators or instruments\n- **Lookup**: Configuration, parameters, reference data\n- **Imported**: Data read from files (recordings, images)\n- **Computed**: Derived analyses and summaries\n\n### 3. Normalize Your Data\n- Don't repeat information across rows\n- Create separate tables for distinct entities\n- Use foreign keys to link related data\n\n### 4. Use Core DataJoint Types\n\nDataJoint has a three-layer type architecture (see [Type System Specification](../../reference/specs/type-system/)):\n\n1. **Native database types** (Layer 1): Backend-specific types like `INT`, `FLOAT`, `TINYINT UNSIGNED`. These are **discouraged** but allowed for backward compatibility.\n\n2. **Core DataJoint types** (Layer 2): Standardized, scientist-friendly types that work identically across MySQL and PostgreSQL. **Always prefer these.**\n\n3. **Codec types** (Layer 3): Types with `encode()`/`decode()` semantics like `<blob>`, `<attach>`, `<object@>`.\n\n**Core types used in this tutorial:**\n\n| Type | Description | Example |\n|------|-------------|---------|\n| `uint8`, `uint16`, `int32` | Sized integers | `session_idx : uint16` |\n| `float32`, `float64` | Sized floats | `reaction_time : float32` |\n| `varchar(n)` | Variable-length string | `name : varchar(100)` |\n| `bool` | Boolean | `correct : bool` |\n| `date` | Date only | `date_of_birth : date` |\n| `datetime` | Date and time (UTC) | `created_at : datetime` |\n| `enum(...)` | Enumeration | `sex : enum('M', 'F', 'U')` |\n| `json` | JSON document | `task_params : json` |\n| `uuid` | Universally unique ID | `experimenter_id : uuid` |\n\n**Why native types are allowed but discouraged:**\n\nNative types (like `int`, `float`, `tinyint`) are passed through to the database but generate a **warning at declaration time**. They are discouraged because:\n- They lack explicit size information\n- They are not portable across database backends\n- They are not recorded in field metadata for reconstruction\n\nIf you see a warning like `\"Native type 'int' used; consider 'int32' instead\"`, update your definition to use the corresponding core type.\n\n### 5. Document Your Tables\n- Add comments after `#` in definitions\n- Document units in attribute comments\n\n## Key Concepts Recap\n\n| Concept | Description |\n|---------|-------------|\n| **Primary Key** | Attributes above `---` that uniquely identify rows |\n| **Secondary Attributes** | Attributes below `---` that store additional data |\n| **Foreign Key** (`->`) | Reference to another table, imports its primary key |\n| **One-to-Many** | FK in primary key: parent has many children |\n| **One-to-One** | FK is entire primary key: exactly one child per parent |\n| **Master-Part** | Compositional integrity: master and parts inserted/deleted atomically |\n| **Nullable FK** | `[nullable]` makes the reference optional |\n| **Lookup Table** | Pre-populated reference data |\n\n## Next Steps\n\n- [Data Entry](03-data-entry/) — Inserting, updating, and deleting data\n- [Queries](04-queries/) — Filtering, joining, and projecting\n- [Computation](05-computation/) — Building computational pipelines"
"source": "## Best Practices\n\n### 1. Choose Meaningful Primary Keys\n- Use natural identifiers when possible (`subject_id = 'M001'`)\n- Keep keys minimal but sufficient for uniqueness\n\n### 2. Use Appropriate Table Tiers\n- **Manual**: Data entered by operators or instruments\n- **Lookup**: Configuration, parameters, reference data\n- **Imported**: Data read from files (recordings, images)\n- **Computed**: Derived analyses and summaries\n\n### 3. Normalize Your Data\n- Don't repeat information across rows\n- Create separate tables for distinct entities\n- Use foreign keys to link related data\n\n### 4. Use Core DataJoint Types\n\nDataJoint has a three-layer type architecture (see [Type System Specification](../../reference/specs/type-system/)):\n\n1. **Native database types** (Layer 1): Backend-specific types like `INT`, `FLOAT`, `TINYINT UNSIGNED`. These are **discouraged** but allowed for backward compatibility.\n\n2. **Core DataJoint types** (Layer 2): Standardized, scientist-friendly types that work identically across MySQL and PostgreSQL. **Always prefer these.**\n\n3. **Codec types** (Layer 3): Types with `encode()`/`decode()` semantics like `<blob>`, `<attach>`, `<object@>`.\n\n**Core types used in this tutorial:**\n\n| Type | Description | Example |\n|------|-------------|---------|\n| `uint8`, `uint16`, `int32` | Sized integers | `session_idx : uint16` |\n| `float32`, `float64` | Sized floats | `reaction_time : float32` |\n| `varchar(n)` | Variable-length string | `name : varchar(100)` |\n| `bool` | Boolean | `correct : bool` |\n| `date` | Date only | `date_of_birth : date` |\n| `datetime` | Date and time (UTC) | `created_at : datetime` |\n| `enum(...)` | Enumeration | `sex : enum('M', 'F', 'U')` |\n| `json` | JSON document | `task_params : json` |\n| `uuid` | Universally unique ID | `experimenter_id : uuid` |\n\n**Why native types are allowed but discouraged:**\n\nNative types (like `int`, `float`, `tinyint`) are passed through to the database but generate a **warning at declaration time**. They are discouraged because:\n- They lack explicit size information\n- They are not portable across database backends\n- They are not recorded in field metadata for reconstruction\n\nIf you see a warning like `\"Native type 'int' used; consider 'int32' instead\"`, update your definition to use the corresponding core type.\n\n### 5. Document Your Tables\n- Add comments after `#` in definitions\n- Document units in attribute comments\n\n## Key Concepts Recap\n\n| Concept | Description |\n|---------|-------------|\n| **Primary Key** | Attributes above `---` that uniquely identify rows |\n| **Secondary Attributes** | Attributes below `---` that store additional data |\n| **Foreign Key** (`->`) | Reference to another table, imports its primary key |\n| **One-to-Many** | FK in primary key: parent has many children |\n| **One-to-One** | FK is entire primary key: exactly one child per parent |\n| **Master-Part** | Compositional integrity: master and parts inserted/deleted atomically |\n| **Nullable FK** | `[nullable]` makes the reference optional |\n| **Lookup Table** | Pre-populated reference data |\n\n## Next Steps\n\n- [Data Entry](../03-data-entry) — Inserting, updating, and deleting data\n- [Queries](../04-queries) — Filtering, joining, and projecting\n- [Computation](../05-computation) — Building computational pipelines"
},
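The "above `---`" convention in the recap table can be demonstrated with a small parser sketch — a toy, not the real declaration logic; `split_definition` and the sample attributes are invented for illustration:

```python
def split_definition(definition: str):
    """Split a DataJoint-style definition into primary-key and secondary
    attribute names. Illustrative sketch only -- real DataJoint declaration
    handles much more (foreign keys, defaults, comments, indexes)."""
    primary, secondary, above_divider = [], [], True
    for line in definition.strip().splitlines():
        line = line.strip()
        if line.startswith("---"):
            above_divider = False   # the divider separates key from data
            continue
        if not line or line.startswith("#") or line.startswith("->"):
            continue                # skip blanks, comments, and FK lines here
        name = line.split(":")[0].strip()
        (primary if above_divider else secondary).append(name)
    return primary, secondary

definition = """
subject_id : varchar(8)      # e.g. 'M001'
---
date_of_birth : date
sex : enum('M', 'F', 'U')
"""
pk_attrs, secondary_attrs = split_definition(definition)
```

Everything above the `---` divider lands in the primary key; everything below becomes a secondary attribute.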
{
"cell_type": "code",
2 changes: 1 addition & 1 deletion src/tutorials/basics/03-data-entry.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -1588,7 +1588,7 @@
"cell_type": "markdown",
"id": "cell-42",
"metadata": {},
"source": "## Quick Reference\n\n| Operation | Method | Use Case |\n|-----------|--------|----------|\n| Insert one | `insert1(row)` | Adding single entity |\n| Insert many | `insert(rows)` | Bulk data loading |\n| Update one | `update1(row)` | Surgical corrections only |\n| Delete | `delete()` | Removing entities (cascades) |\n| Delete quick | `delete_quick()` | Internal cleanup (no cascade) |\n| Validate | `validate(rows)` | Pre-insert check |\n\nSee the [Data Manipulation Specification](../../reference/specs/data-manipulation/) for complete details.\n\n## Next Steps\n\n- [Queries](04-queries/) — Filtering, joining, and projecting data\n- [Computation](05-computation/) — Building computational pipelines"
"source": "## Quick Reference\n\n| Operation | Method | Use Case |\n|-----------|--------|----------|\n| Insert one | `insert1(row)` | Adding single entity |\n| Insert many | `insert(rows)` | Bulk data loading |\n| Update one | `update1(row)` | Surgical corrections only |\n| Delete | `delete()` | Removing entities (cascades) |\n| Delete quick | `delete_quick()` | Internal cleanup (no cascade) |\n| Validate | `validate(rows)` | Pre-insert check |\n\nSee the [Data Manipulation Specification](../../reference/specs/data-manipulation) for complete details.\n\n## Next Steps\n\n- [Queries](../04-queries) — Filtering, joining, and projecting data\n- [Computation](../05-computation) — Building computational pipelines"
},
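The "Validate" row of the table can be modeled in plain Python — a simplified sketch of what a pre-insert check might report, not the actual `validate(rows)` API; `validate_rows` and its arguments are hypothetical:

```python
def validate_rows(rows, primary_key, existing_keys):
    """Pre-insert check in the spirit of validate(rows): report problems
    without inserting anything. A toy model -- the real check covers
    types, foreign keys, and more."""
    errors = []
    seen = set(existing_keys)
    for i, row in enumerate(rows):
        missing = [a for a in primary_key if a not in row]
        if missing:
            errors.append((i, f"missing primary-key attributes: {missing}"))
            continue
        key = tuple(row[a] for a in primary_key)
        if key in seen:
            errors.append((i, f"duplicate primary key: {key}"))
        seen.add(key)
    return errors

problems = validate_rows(
    rows=[{"subject_id": "M001", "name": "Alice"}, {"subject_id": "M001"}],
    primary_key=["subject_id"],
    existing_keys=set(),
)
```

Validating before a bulk `insert(rows)` lets you surface all problems at once instead of failing midway through a load.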
{
"cell_type": "code",
12 changes: 3 additions & 9 deletions src/tutorials/basics/04-queries.ipynb
@@ -965,13 +965,7 @@
"cell_type": "markdown",
"id": "cell-11",
"metadata": {},
"source": [
"### Restriction by Query Expression\n",
"\n",
"Restrict by another query expression. DataJoint uses **semantic matching**: attributes with the same name are matched only if they share the same origin through foreign key lineage. This prevents accidental matches on unrelated attributes that happen to share names (like generic `id` columns in unrelated tables).\n",
"\n",
"See [Semantic Matching](../reference/specs/semantic-matching.md) for the full specification."
]
"source": "### Restriction by Query Expression\n\nRestrict by another query expression. DataJoint uses **semantic matching**: attributes with the same name are matched only if they share the same origin through foreign key lineage. This prevents accidental matches on unrelated attributes that happen to share names (like generic `id` columns in unrelated tables).\n\nSee [Semantic Matching](../../reference/specs/semantic-matching) for the full specification."
},
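The semantic-matching rule in this cell can be sketched in a few lines — an illustrative model only; DataJoint tracks lineage internally, whereas here a plain string stands in for an attribute's foreign-key origin:

```python
def join_compatible(attrs_a, attrs_b):
    """Decide which shared attribute names may be matched under semantic
    matching: same name AND same origin through foreign-key lineage.
    Sketch only -- lineage strings here are a stand-in for DataJoint's
    internal tracking."""
    matched, conflicts = [], []
    for name in sorted(set(attrs_a) & set(attrs_b)):
        if attrs_a[name] == attrs_b[name]:
            matched.append(name)     # identical lineage: safe to match
        else:
            conflicts.append(name)   # same name, unrelated origin: not matched
    return matched, conflicts

# 'subject_id' descends from the same table in both operands; 'id' does not.
matched, conflicts = join_compatible(
    {"subject_id": "lab.Subject.subject_id", "id": "lab.Subject.id"},
    {"subject_id": "lab.Subject.subject_id", "id": "store.Widget.id"},
)
```

The generic `id` columns share a name but not a lineage, so semantic matching refuses to match on them — exactly the accidental-match hazard the cell describes.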
{
"cell_type": "code",
@@ -3991,7 +3985,7 @@
"cell_type": "markdown",
"id": "tt5h1lmim2",
"metadata": {},
"source": "### Primary Keys in Join Results\n\nEvery query result has a valid primary key. For joins, the result's primary key depends on **functional dependencies** between the operands:\n\n| Condition | Result Primary Key |\n|-----------|-------------------|\n| `A → B` (A determines B) | PK(A) |\n| `B → A` (B determines A) | PK(B) |\n| Both | PK(A) |\n| Neither | PK(A) ∪ PK(B) |\n\n**\"A determines B\"** means all of B's primary key attributes exist in A (as primary or secondary attributes).\n\nIn our example:\n- `Session` has PK: `(subject_id, session_idx)`\n- `Trial` has PK: `(subject_id, session_idx, trial_idx)`\n\nSince Session's PK is a subset of Trial's attributes, each trial determines its session: `Trial → Session`. The join `Session * Trial` therefore has the same primary key as Trial — one row per trial.\n\nSee the [Query Algebra Specification](../../reference/specs/query-algebra/) for the complete functional dependency rules."
"source": "### Primary Keys in Join Results\n\nEvery query result has a valid primary key. For joins, the result's primary key depends on **functional dependencies** between the operands:\n\n| Condition | Result Primary Key |\n|-----------|-------------------|\n| `A → B` (A determines B) | PK(A) |\n| `B → A` (B determines A) | PK(B) |\n| Both | PK(A) |\n| Neither | PK(A) ∪ PK(B) |\n\n**\"A determines B\"** means all of B's primary key attributes exist in A (as primary or secondary attributes).\n\nIn our example:\n- `Session` has PK: `(subject_id, session_idx)`\n- `Trial` has PK: `(subject_id, session_idx, trial_idx)`\n\nSince Session's PK is a subset of Trial's attributes, each trial determines its session: `Trial → Session`. The join `Session * Trial` therefore has the same primary key as Trial — one row per trial.\n\nSee the [Query Algebra Specification](../../reference/specs/query-algebra) for the complete functional dependency rules."
},
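The functional-dependency rule table in this cell can be turned into a small function — a sketch of the stated rules, not the library's implementation; the secondary attribute names (`session_date`, `outcome`) are hypothetical:

```python
def join_primary_key(pk_a, attrs_a, pk_b, attrs_b):
    """Primary key of A * B per the functional-dependency rule table:
    A -> B gives PK(A); B -> A gives PK(B); both gives PK(A);
    neither gives the union. Sketch only."""
    a_determines_b = set(pk_b) <= set(attrs_a)   # all of B's PK attrs exist in A
    b_determines_a = set(pk_a) <= set(attrs_b)   # all of A's PK attrs exist in B
    if a_determines_b:
        return list(pk_a)                        # covers 'A -> B' and 'both'
    if b_determines_a:
        return list(pk_b)
    return list(dict.fromkeys(pk_a + pk_b))      # neither: order-preserving union

session_pk = ["subject_id", "session_idx"]
session_attrs = ["subject_id", "session_idx", "session_date"]  # hypothetical secondary attr
trial_pk = ["subject_id", "session_idx", "trial_idx"]
trial_attrs = ["subject_id", "session_idx", "trial_idx", "outcome"]

result_pk = join_primary_key(session_pk, session_attrs, trial_pk, trial_attrs)
```

Applied to the tutorial's example, the rules give Trial's primary key: Session's PK attributes all appear in Trial (`Trial → Session`), while `trial_idx` is absent from Session, so the join has one row per trial.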
{
"cell_type": "markdown",
@@ -6379,7 +6373,7 @@
"cell_type": "markdown",
"id": "cell-63",
"metadata": {},
"source": "## Quick Reference\n\n### Operators\n\n| Operation | Syntax | Description |\n|-----------|--------|-------------|\n| Restrict | `A & cond` | Select matching rows |\n| Anti-restrict | `A - cond` | Select non-matching rows |\n| Top | `A & dj.Top(limit, order_by)` | Limit/order results |\n| Project | `A.proj(...)` | Select/compute columns |\n| Join | `A * B` | Combine tables |\n| Extend | `A.extend(B)` | Add B's attributes, keep all A rows |\n| Aggregate | `A.aggr(B, ...)` | Group and summarize |\n| Union | `A + B` | Combine entity sets |\n\n### Fetch Methods\n\n| Method | Returns | Use Case |\n|--------|---------|----------|\n| `to_dicts()` | `list[dict]` | JSON, iteration |\n| `to_pandas()` | `DataFrame` | Data analysis |\n| `to_arrays()` | `np.ndarray` | Numeric computation |\n| `to_arrays('a', 'b')` | `tuple[array, ...]` | Specific columns |\n| `keys()` | `list[dict]` | Primary keys |\n| `fetch1()` | `dict` | Single row |\n\nSee the [Query Algebra Specification](../../reference/specs/query-algebra/) and [Fetch API](../../reference/specs/fetch-api/) for complete details.\n\n## Next Steps\n\n- [Computation](05-computation/) — Building computational pipelines"
"source": "## Quick Reference\n\n### Operators\n\n| Operation | Syntax | Description |\n|-----------|--------|-------------|\n| Restrict | `A & cond` | Select matching rows |\n| Anti-restrict | `A - cond` | Select non-matching rows |\n| Top | `A & dj.Top(limit, order_by)` | Limit/order results |\n| Project | `A.proj(...)` | Select/compute columns |\n| Join | `A * B` | Combine tables |\n| Extend | `A.extend(B)` | Add B's attributes, keep all A rows |\n| Aggregate | `A.aggr(B, ...)` | Group and summarize |\n| Union | `A + B` | Combine entity sets |\n\n### Fetch Methods\n\n| Method | Returns | Use Case |\n|--------|---------|----------|\n| `to_dicts()` | `list[dict]` | JSON, iteration |\n| `to_pandas()` | `DataFrame` | Data analysis |\n| `to_arrays()` | `np.ndarray` | Numeric computation |\n| `to_arrays('a', 'b')` | `tuple[array, ...]` | Specific columns |\n| `keys()` | `list[dict]` | Primary keys |\n| `fetch1()` | `dict` | Single row |\n\nSee the [Query Algebra Specification](../../reference/specs/query-algebra) and [Fetch API](../../reference/specs/fetch-api) for complete details.\n\n## Next Steps\n\n- [Computation](../05-computation) — Building computational pipelines"
},
{
"cell_type": "code",