91 changes: 86 additions & 5 deletions src/explanation/whats-new-22.md
@@ -1,6 +1,6 @@
# What's New in DataJoint 2.2

DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections, and **graph-driven diagram operations** that replace the legacy error-driven cascade with a reliable, inspectable approach for all users.

> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.

@@ -201,9 +201,90 @@ class MyTable(dj.Manual):

Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.

## Graph-Driven Diagram Operations

DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting.

### From Visualization to Operations

In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently using an error-driven approach: attempt `DELETE` on the parent, catch the foreign key integrity error, parse the error message to discover which child table is blocking, then recursively delete from that child first. This had several problems:

- **MySQL 8 with limited privileges** returns error 1217 (`ROW_IS_REFERENCED`) instead of 1451 (`ROW_IS_REFERENCED_2`), which provides no table name — the cascade crashes with no way to proceed.
- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
- **Error parsing is fragile** across MySQL versions and privilege levels, because different configurations produce different error message formats.
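
The problems listed above can be made concrete with a minimal sketch of the message parsing an error-driven cascade depends on. The function name `blocking_table`, the regular expression, and the sample schema and constraint names are all hypothetical; the message strings follow the standard MySQL wording for errors 1451 and 1217. This is an illustration only, not DataJoint's prior parser.

```python
import re

# Matches the child table named in a MySQL 1451 message:
# ... fails (`schema`.`table`, CONSTRAINT ...
_CHILD = re.compile(r"\(`(?P<schema>[^`]+)`\.`(?P<table>[^`]+)`, CONSTRAINT")


def blocking_table(errno, message):
    """Return '`schema`.`table`' for the blocking child, or None if unknown."""
    if errno != 1451:
        return None  # e.g. error 1217 carries no table name
    match = _CHILD.search(message)
    if match is None:
        return None
    return "`{schema}`.`{table}`".format(**match.groupdict())


# Hypothetical sample messages in the standard MySQL format.
msg_1451 = (
    "Cannot delete or update a parent row: a foreign key constraint fails "
    "(`lab`.`trial`, CONSTRAINT `trial_ibfk_1` FOREIGN KEY (`session_id`) "
    "REFERENCES `session` (`session_id`))"
)
msg_1217 = "Cannot delete or update a parent row: a foreign key constraint fails"

print(blocking_table(1451, msg_1451))  # `lab`.`trial`
print(blocking_table(1217, msg_1217))  # None
```

With error 1217 there is nothing to parse, so a cascade built on this approach has no child table to recurse into.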

In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
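
As a rough illustration of reverse topological order, the sketch below uses the standard library's `graphlib` on a hypothetical three-table chain. The table names and the `children` mapping are assumptions for the example; DataJoint's internal graph representation may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each table maps to the tables that reference it.
children = {
    "session": {"trial"},
    "trial": {"processed_data"},
    "processed_data": set(),
}

# TopologicalSorter emits a node only after every node it maps to, so mapping
# each table to its children yields the leaves first.
delete_order = list(TopologicalSorter(children).static_order())
print(delete_order)  # ['processed_data', 'trial', 'session']
```

Because the order is computed before any SQL runs, no `DELETE` has to fail in order to discover a dependency.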

### The Preview-Then-Execute Pattern

The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`:

```python
# Build the dependency graph and inspect the cascade
diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# Inspect: what tables and how many rows would be affected?
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# Execute via Table.delete() after reviewing the blast radius
(Session & {'subject_id': 'M001'}).delete(prompt=False)
```

This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious.

### Two Propagation Modes

The diagram supports two restriction propagation modes designed for fundamentally different tasks.

**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph: ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because a row should be removed if any reason exists to remove it. `cascade()` is one-shot and is always followed by `preview()` or `Table.delete()`.

Member Author

Here include a description of how cascade behaves when it encounters a part table whose master is not yet included in the cascade.

Member Author

Added a paragraph after cascade() describing part table behavior: with "enforce" (default), delete() raises an error if part rows would be deleted without their master; with "cascade", the restriction propagates upward from part to master, then back downstream to all sibling parts — deleting the entire compositional unit.


When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows.
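
The upward-then-downward propagation under `"cascade"` can be sketched with plain Python data. The row layout, the table names `part_a` and `part_b`, and the helper `cascade_from_part` are hypothetical and only mimic the described behavior; they are not DataJoint's implementation.

```python
# Hypothetical rows: each part row is (master_id, part_key).
part_a = [(1, "a1"), (2, "a2"), (3, "a3")]
part_b = [(1, "b1"), (1, "b2"), (2, "b3")]


def cascade_from_part(matched_part_rows, sibling_parts):
    """Propagate a part restriction up to the master, then down to all parts."""
    # Upward: the matched part rows identify the affected master rows.
    masters = {master_id for master_id, _ in matched_part_rows}
    # Downward: every part row of an affected master joins the delete set.
    parts = {
        name: [row for row in rows if row[0] in masters]
        for name, rows in sibling_parts.items()
    }
    return masters, parts


matched = [row for row in part_a if row[1] == "a1"]
masters, parts = cascade_from_part(matched, {"part_a": part_a, "part_b": part_b})
print(masters)  # {1}
print(parts)    # {'part_a': [(1, 'a1')], 'part_b': [(1, 'b1'), (1, 'b2')]}
```

Only one row of `part_a` matched the original restriction, yet both `part_b` rows of master 1 are included, which is the "entire compositional unit" described above.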

**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.

Member Author

Also, describe how part tables are restricted when reached through a path that did not include its master.

Member Author

Covered in the same paragraph — describes the upward propagation from part to master when the part is reached through a path that didn't include its master.


The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.
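
The difference between the two convergence rules in this section can be illustrated with set operations. The row ids and variable names below are hypothetical: they stand for rows of one descendant table reached along two different paths.

```python
# Hypothetical: ids of rows in one descendant table reached along two paths.
reached_via_path_1 = {1, 2, 3, 4}
reached_via_path_2 = {3, 4, 5}

# cascade-style convergence (OR): a row reached by any path is marked.
to_delete = reached_via_path_1 | reached_via_path_2
# restrict-style convergence (AND): a row must be reached by every path.
to_export = reached_via_path_1 & reached_via_path_2

print(sorted(to_delete))  # [1, 2, 3, 4, 5]
print(sorted(to_export))  # [3, 4]
```

The same inputs give different results under the two rules, which is one way to see why a single diagram cannot safely serve both purposes.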

### Pruning Empty Tables

After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:

```python
export = (dj.Diagram(schema)
    .restrict(Subject & {'species': 'mouse'})
    .restrict(Session & 'session_date > "2024-01-01"')
    .prune())

export.preview() # only tables with matching rows
export # visualize the export subgraph
```

Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
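
The effect of pruning can be approximated on a `preview()`-style dictionary of row counts. The helper `prune_counts` and the `stimulus` table are hypothetical; this mirrors the outcome of `prune()` rather than how it is implemented.

```python
def prune_counts(counts):
    """Keep only tables with at least one matching row."""
    return {table: n for table, n in counts.items() if n > 0}


# Hypothetical counts in the same shape as the preview() output shown earlier.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`stimulus`": 0}
print(prune_counts(counts))  # {'`lab`.`session`': 3, '`lab`.`trial`': 45}
```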

### Architecture

`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.

### Advantages over Error-Driven Cascade

The graph-driven approach resolves every known limitation of the prior error-driven cascade:

| Scenario | Error-driven (prior) | Graph-driven (2.2) |
|---|---|---|
| MySQL 8 + limited privileges | Crashes (error 1217, no table name) | Works — no error parsing needed |
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront |
| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
| Reusability | Delete-only | Delete, drop, export, prune |
| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing |

## See Also

- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md/) — Connection setup
- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings
- [Configure Database](../how-to/configure-database.md) — Connection setup
- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
- [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide
42 changes: 42 additions & 0 deletions src/how-to/delete-data.md
@@ -189,8 +189,50 @@ count = (Subject & restriction).delete(prompt=False)
print(f"Deleted {count} subjects")
```

## Inspecting Cascade Before Deleting

!!! version-added "New in 2.2"
    Cascade inspection via `dj.Diagram` was added in DataJoint 2.2.

For a quick preview, `table.delete(dry_run=True)` returns the affected row counts without deleting anything:

```python
# Quick preview of what would be deleted
(Session & {'subject_id': 'M001'}).delete(dry_run=True)
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
```

For more complex scenarios — working across schemas, chaining multiple restrictions, or visualizing the dependency graph — use `dj.Diagram` to build and inspect the cascade explicitly:

```python
import datajoint as dj

# 1. Build the dependency graph and apply cascade restriction
diag = dj.Diagram(schema)
restricted = diag.cascade(Session & {'subject_id': 'M001'})

# 2. Preview: see affected tables and row counts
counts = restricted.preview()
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}

# 3. Visualize the cascade subgraph (in Jupyter)
restricted

# 4. Execute via Table.delete() after reviewing
(Session & {'subject_id': 'M001'}).delete(prompt=False)
```

### When to Use

- **Preview blast radius**: Understand what a cascade delete will affect before committing
- **Multi-schema inspection**: Build a diagram spanning multiple schemas to visualize cascade impact
- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows

For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram API is for when you need more visibility before executing.
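
For the programmatic-control case, one possible pattern is to gate the delete on the total number of affected rows. The helper `within_budget` and the `max_rows` threshold are hypothetical; the `counts` dictionary stands in for the value returned by `preview()`.

```python
def within_budget(counts, max_rows):
    """Return True if the total number of affected rows does not exceed max_rows."""
    return sum(counts.values()) <= max_rows


# Stand-in for the dictionary returned by restricted.preview().
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}
print(within_budget(counts, max_rows=100))  # True (93 rows in total)
print(within_budget(counts, max_rows=50))   # False
```

In an automated workflow, the call to `(Session & restriction).delete(prompt=False)` would run only when the check passes, and the counts could be logged otherwise.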

## See Also

- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
- [Master-Part Tables](master-part.ipynb) — Compositional data patterns
- [Model Relationships](model-relationships.ipynb) — Foreign key patterns
- [Insert Data](insert-data.md) — Adding data to tables
84 changes: 7 additions & 77 deletions src/how-to/read-diagrams.ipynb
@@ -1325,39 +1325,13 @@
"cell_type": "markdown",
"id": "cell-ops-ref",
"metadata": {},
"source": [
"**Operation Reference:**\n",
"\n",
"| Operation | Meaning |\n",
"|-----------|--------|\n",
"| `dj.Diagram(schema)` | Entire schema |\n",
"| `dj.Diagram(Table) - N` | Table + N levels upstream |\n",
"| `dj.Diagram(Table) + N` | Table + N levels downstream |\n",
"| `D1 + D2` | Union of two diagrams |\n",
"| `D1 * D2` | Intersection (common nodes) |\n",
"\n",
"**Finding paths:** Use intersection to find connection paths:\n",
"```python\n",
"(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n",
"```"
]
"source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(New in 2.2)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```"
},
{
"cell_type": "markdown",
"id": "2lmw6tar3w8",
"metadata": {},
"source": [
"## Layout Direction\n",
"\n",
"*New in DataJoint 2.1*\n",
"\n",
"Control the flow direction of diagrams via configuration:\n",
"\n",
"| Direction | Description |\n",
"|-----------|-------------|\n",
"| `\"TB\"` | Top to bottom (default) |\n",
"| `\"LR\"` | Left to right |"
]
"source": "## Layout Direction\n\n!!! version-added \"New in 2.1\"\n Configurable layout direction was added in DataJoint 2.1.\n\nControl the flow direction of diagrams via configuration:\n\n| Direction | Description |\n|-----------|-------------|\n| `\"TB\"` | Top to bottom (default) |\n| `\"LR\"` | Left to right |"
},
{
"cell_type": "code",
@@ -1634,13 +1608,7 @@
"cell_type": "markdown",
"id": "ogpr8cqsife",
"metadata": {},
"source": [
"## Mermaid Output\n",
"\n",
"*New in DataJoint 2.1*\n",
"\n",
"Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
]
"source": "## Mermaid Output\n\n!!! version-added \"New in 2.1\"\n Mermaid output was added in DataJoint 2.1.\n\nGenerate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
},
{
"cell_type": "code",
@@ -1700,13 +1668,7 @@
"cell_type": "markdown",
"id": "pqet0vo8pwp",
"metadata": {},
"source": [
"## Multi-Schema Pipelines\n",
"\n",
"Real-world pipelines often span multiple schemas (modules). \n",
"\n",
"*New in DataJoint 2.1:* Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
]
"source": "## Multi-Schema Pipelines\n\nReal-world pipelines often span multiple schemas (modules).\n\n!!! version-added \"New in 2.1\"\n Automatic schema grouping was added in DataJoint 2.1. Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
},
{
"cell_type": "code",
@@ -2104,13 +2066,7 @@
"cell_type": "markdown",
"id": "ncl6hafwbjt",
"metadata": {},
"source": [
"## Collapsing Schemas\n",
"\n",
"*New in DataJoint 2.1*\n",
"\n",
"For high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
]
"source": "## Collapsing Schemas\n\n!!! version-added \"New in 2.1\"\n The `collapse()` method was added in DataJoint 2.1.\n\nFor high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
},
{
"cell_type": "code",
@@ -3322,33 +3278,7 @@
"cell_type": "markdown",
"id": "cell-summary-md",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"| Visual | Meaning |\n",
"|--------|--------|\n",
"| **Thick solid** | One-to-one extension |\n",
"| **Thin solid** | One-to-many containment |\n",
"| **Dashed** | Reference (independent identity) |\n",
"| **Underlined** | Introduces new dimension |\n",
"| **Orange dots** | Renamed FK via `.proj()` |\n",
"| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n",
"| **Grouped boxes** | Tables grouped by schema/module |\n",
"| **3D box (gray)** | Collapsed schema *(2.1+)* |\n",
"\n",
"| Feature | Method |\n",
"|---------|--------|\n",
"| Layout direction | `dj.config.display.diagram_direction` |\n",
"| Mermaid output | `.make_mermaid()` |\n",
"| Collapse schema | `.collapse()` *(2.1+)* |\n",
"\n",
"## Related\n",
"\n",
"- [Diagram Specification](../reference/specs/diagram.md)\n",
"- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n",
"- [Semantic Matching](../reference/specs/semantic-matching.md)\n",
"- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
]
"source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(New in 2.1)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(New in 2.1)* |\n| Prune empty tables | `.prune()` *(New in 2.2)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
},
{
"cell_type": "code",
@@ -3397,4 +3327,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
3 changes: 3 additions & 0 deletions src/reference/specs/data-manipulation.md
@@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables:
2. Recursively delete matching rows in child tables
3. Delete rows in target table

!!! version-added "New in 2.2"
    `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged: the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods).

### 4.3 Basic Usage

```python