
Commit 32c4a40

Merge pull request #155 from datajoint/docs/v2.2-restricted-diagram
docs: document restricted diagram operations (new in 2.2)
2 parents 7e28d67 + 187408b commit 32c4a40

6 files changed

Lines changed: 290 additions & 89 deletions


src/explanation/whats-new-22.md

Lines changed: 86 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# What's New in DataJoint 2.2
22

3-
DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
3+
DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections, and **graph-driven diagram operations** that replace the legacy error-driven cascade with a reliable, inspectable approach for all users.
44

55
> **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.
66
@@ -201,9 +201,90 @@ class MyTable(dj.Manual):
201201

202202
Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.
203203

204+
## Graph-Driven Diagram Operations
205+
206+
DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting.
207+
208+
### From Visualization to Operations
209+
210+
In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently using an error-driven approach: attempt `DELETE` on the parent, catch the foreign key integrity error, parse the error message to discover which child table is blocking, then recursively delete from that child first. This had several problems:
211+
212+
- **MySQL 8 with limited privileges** returns error 1217 (`ER_ROW_IS_REFERENCED`) instead of 1451 (`ER_ROW_IS_REFERENCED_2`). Error 1217 carries no table name, so the cascade crashes with no way to proceed.
213+
- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
214+
- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats.
215+
216+
In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
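
The leaves-first ordering described above can be illustrated with a small sketch. It does not use DataJoint or reflect its internals; the table names and the `delete_order` helper are hypothetical, and the ordering comes from Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of tables it references (its parents).
parents = {
    "subject": set(),
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}


def delete_order(parents: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so that every child comes before its parents."""
    # static_order() yields parents before children; reversing puts leaves first.
    forward = list(TopologicalSorter(parents).static_order())
    return list(reversed(forward))


order = delete_order(parents)
print(order)  # ['processed_data', 'trial', 'session', 'subject']
```

Because the order is computed from the graph up front, no `DELETE` ever has to fail in order to discover a blocking child table.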
217+
218+
### The Preview-Then-Execute Pattern
219+
220+
The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`:
221+
222+
```python
223+
# Build the dependency graph and inspect the cascade
224+
diag = dj.Diagram(schema)
225+
restricted = diag.cascade(Session & {'subject_id': 'M001'})
226+
227+
# Inspect: what tables and how many rows would be affected?
228+
counts = restricted.preview()
229+
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
230+
231+
# Execute via Table.delete() after reviewing the blast radius
232+
(Session & {'subject_id': 'M001'}).delete(prompt=False)
233+
```
234+
235+
This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious.
236+
237+
### Two Propagation Modes
238+
239+
The diagram supports two restriction propagation modes designed for fundamentally different tasks.
240+
241+
**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph; ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because a single reason to remove a row is sufficient. `cascade()` is one-shot: it can be called only once per diagram, and it is followed by `preview()` to inspect the result or by `Table.delete()` to execute the delete.
242+
243+
When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master, which prevents master rows from being left with an incomplete set of parts. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts. The result is that the entire compositional unit is deleted, not just the originally matched part rows.
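
A toy model may help picture the upward-then-downward propagation under `part_integrity="cascade"`. This is a sketch with plain Python sets, not DataJoint code; the part table names, the row tuples, and the `cascade_from_part` helper are all invented for illustration.

```python
# Toy master-part data (hypothetical names): each part row is (master_id, part_key).
parts = {
    "Session.Electrode": {(1, "e1"), (1, "e2"), (2, "e1")},
    "Session.Note": {(1, "n1"), (2, "n1")},
}


def cascade_from_part(parts, seed_rows):
    """Expand restricted part rows to the full master-part unit."""
    # Upward step: the restricted part rows identify the affected masters.
    masters = {master_id for master_id, _ in seed_rows}
    # Downward step: every row of every sibling part under those masters is included.
    affected = {
        name: {row for row in rows if row[0] in masters}
        for name, rows in parts.items()
    }
    return masters, affected


# Seed: a single electrode row belonging to master 1.
masters, affected = cascade_from_part(parts, seed_rows={(1, "e2")})
print(masters)                                # {1}
print(sorted(affected["Session.Electrode"]))  # [(1, 'e1'), (1, 'e2')]
print(sorted(affected["Session.Note"]))       # [(1, 'n1')]
```

Starting from one matched part row, the sketch ends up selecting master 1 and all of its parts, while master 2 and its parts are untouched.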
244+
245+
**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
246+
247+
The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.
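
The difference between the two convergence rules reduces to union versus intersection. The following sketch models that with plain sets of row ids; the `converge` function and the example ids are hypothetical and are not part of the DataJoint API.

```python
def converge(path_matches: list[set[int]], mode: str) -> set[int]:
    """Combine the descendant rows reached along each restricted ancestor path."""
    if not path_matches:
        return set()
    if mode == "cascade":
        # OR: a row reached by any path is marked for deletion.
        return set().union(*path_matches)
    if mode == "restrict":
        # AND: a row is kept only if every restricted path reaches it.
        return set.intersection(*path_matches)
    raise ValueError(f"unknown mode: {mode!r}")


# Row ids of one descendant table, as reached through two different ancestors.
via_subject = {1, 2, 3}
via_session = {2, 3, 4}

print(sorted(converge([via_subject, via_session], "cascade")))   # [1, 2, 3, 4]
print(sorted(converge([via_subject, via_session], "restrict")))  # [2, 3]
```

The same inputs give a larger set for a delete and a smaller set for an export, which is one way to see why a single diagram should not mix the two semantics.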
248+
249+
### Pruning Empty Tables
250+
251+
After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:
252+
253+
```python
254+
export = (dj.Diagram(schema)
255+
    .restrict(Subject & {'species': 'mouse'})
256+
    .restrict(Session & 'session_date > "2024-01-01"')
257+
    .prune())
258+
259+
export.preview() # only tables with matching rows
260+
export # visualize the export subgraph
261+
```
262+
263+
Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
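
As a rough mental model of pruning, the sketch below drops zero-row tables and any edges that touch them. It operates on a plain dictionary and edge list; the `prune_empty` helper, the table names, and the counts are made up for illustration and do not show how DataJoint implements `prune()`.

```python
def prune_empty(counts: dict[str, int], edges: list[tuple[str, str]]):
    """Keep only tables with at least one row, and the edges between them."""
    kept = {table for table, n in counts.items() if n > 0}
    kept_edges = [(p, c) for p, c in edges if p in kept and c in kept]
    return kept, kept_edges


# Hypothetical row counts after restriction, and parent -> child edges.
counts = {"subject": 2, "session": 5, "trial": 0, "processed_data": 0}
edges = [("subject", "session"), ("session", "trial"), ("trial", "processed_data")]

kept, kept_edges = prune_empty(counts, edges)
print(sorted(kept))  # ['session', 'subject']
print(kept_edges)    # [('subject', 'session')]
```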
264+
265+
### Architecture
266+
267+
`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
268+
269+
### Advantages over Error-Driven Cascade
270+
271+
The graph-driven approach resolves every known limitation of the prior error-driven cascade:
272+
273+
| Scenario | Error-driven (prior) | Graph-driven (2.2) |
274+
|---|---|---|
275+
| MySQL 8 + limited privileges | Crashes (error 1217, no table name) | Works — no error parsing needed |
276+
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
277+
| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront |
278+
| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
279+
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
280+
| Reusability | Delete-only | Delete, drop, export, prune |
281+
| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing |
282+
204283
## See Also
205284

206-
- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
207-
- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
208-
- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
209-
- [Configure Database](../how-to/configure-database.md/) — Connection setup
285+
- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
286+
- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
287+
- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings
288+
- [Configure Database](../how-to/configure-database.md) — Connection setup
289+
- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
290+
- [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide

src/how-to/delete-data.md

Lines changed: 42 additions & 0 deletions
@@ -189,8 +189,50 @@ count = (Subject & restriction).delete(prompt=False)
189189
print(f"Deleted {count} subjects")
190190
```
191191

192+
## Inspecting Cascade Before Deleting
193+
194+
!!! version-added "New in 2.2"
195+
    Cascade inspection via `dj.Diagram` was added in DataJoint 2.2.
196+
197+
For a quick preview, `table.delete(dry_run=True)` returns the affected row counts without deleting anything:
198+
199+
```python
200+
# Quick preview of what would be deleted
201+
(Session & {'subject_id': 'M001'}).delete(dry_run=True)
202+
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
203+
```
204+
205+
For more complex scenarios — working across schemas, chaining multiple restrictions, or visualizing the dependency graph — use `dj.Diagram` to build and inspect the cascade explicitly:
206+
207+
```python
208+
import datajoint as dj
209+
210+
# 1. Build the dependency graph and apply cascade restriction
211+
diag = dj.Diagram(schema)
212+
restricted = diag.cascade(Session & {'subject_id': 'M001'})
213+
214+
# 2. Preview: see affected tables and row counts
215+
counts = restricted.preview()
216+
# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
217+
218+
# 3. Visualize the cascade subgraph (in Jupyter)
219+
restricted
220+
221+
# 4. Execute via Table.delete() after reviewing
222+
(Session & {'subject_id': 'M001'}).delete(prompt=False)
223+
```
224+
225+
### When to Use
226+
227+
- **Preview blast radius**: Understand what a cascade delete will affect before committing
228+
- **Multi-schema inspection**: Build a diagram spanning multiple schemas to visualize cascade impact
229+
- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows
230+
231+
For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram API is for when you need more visibility before executing.
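
For the programmatic-control case, one possible pattern is a guard that checks the previewed row counts against a limit before the delete is allowed to run. The sketch assumes the preview result is a mapping from table name to row count, as in the example comment above; the `within_blast_radius` helper and the thresholds are hypothetical.

```python
def within_blast_radius(counts: dict[str, int], max_rows: int) -> bool:
    """Return True if the total number of affected rows is at most max_rows."""
    return sum(counts.values()) <= max_rows


# Counts shaped like the preview output shown earlier (assumed format).
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

print(within_blast_radius(counts, max_rows=100))  # True  (93 rows in total)
print(within_blast_radius(counts, max_rows=50))   # False
```

In an automated workflow, the delete would run only when the guard returns `True`, and the counts could be logged or escalated for review otherwise.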
232+
192233
## See Also
193234

235+
- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
194236
- [Master-Part Tables](master-part.ipynb) — Compositional data patterns
195237
- [Model Relationships](model-relationships.ipynb) — Foreign key patterns
196238
- [Insert Data](insert-data.md) — Adding data to tables

src/how-to/read-diagrams.ipynb

Lines changed: 7 additions & 77 deletions
@@ -1325,39 +1325,13 @@
13251325
"cell_type": "markdown",
13261326
"id": "cell-ops-ref",
13271327
"metadata": {},
1328-
"source": [
1329-
"**Operation Reference:**\n",
1330-
"\n",
1331-
"| Operation | Meaning |\n",
1332-
"|-----------|--------|\n",
1333-
"| `dj.Diagram(schema)` | Entire schema |\n",
1334-
"| `dj.Diagram(Table) - N` | Table + N levels upstream |\n",
1335-
"| `dj.Diagram(Table) + N` | Table + N levels downstream |\n",
1336-
"| `D1 + D2` | Union of two diagrams |\n",
1337-
"| `D1 * D2` | Intersection (common nodes) |\n",
1338-
"\n",
1339-
"**Finding paths:** Use intersection to find connection paths:\n",
1340-
"```python\n",
1341-
"(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n",
1342-
"```"
1343-
]
1328+
"source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(New in 2.2)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```"
13441329
},
13451330
{
13461331
"cell_type": "markdown",
13471332
"id": "2lmw6tar3w8",
13481333
"metadata": {},
1349-
"source": [
1350-
"## Layout Direction\n",
1351-
"\n",
1352-
"*New in DataJoint 2.1*\n",
1353-
"\n",
1354-
"Control the flow direction of diagrams via configuration:\n",
1355-
"\n",
1356-
"| Direction | Description |\n",
1357-
"|-----------|-------------|\n",
1358-
"| `\"TB\"` | Top to bottom (default) |\n",
1359-
"| `\"LR\"` | Left to right |"
1360-
]
1334+
"source": "## Layout Direction\n\n!!! version-added \"New in 2.1\"\n    Configurable layout direction was added in DataJoint 2.1.\n\nControl the flow direction of diagrams via configuration:\n\n| Direction | Description |\n|-----------|-------------|\n| `\"TB\"` | Top to bottom (default) |\n| `\"LR\"` | Left to right |"
13611335
},
13621336
{
13631337
"cell_type": "code",
@@ -1634,13 +1608,7 @@
16341608
"cell_type": "markdown",
16351609
"id": "ogpr8cqsife",
16361610
"metadata": {},
1637-
"source": [
1638-
"## Mermaid Output\n",
1639-
"\n",
1640-
"*New in DataJoint 2.1*\n",
1641-
"\n",
1642-
"Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
1643-
]
1611+
"source": "## Mermaid Output\n\n!!! version-added \"New in 2.1\"\n    Mermaid output was added in DataJoint 2.1.\n\nGenerate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
16441612
},
16451613
{
16461614
"cell_type": "code",
@@ -1700,13 +1668,7 @@
17001668
"cell_type": "markdown",
17011669
"id": "pqet0vo8pwp",
17021670
"metadata": {},
1703-
"source": [
1704-
"## Multi-Schema Pipelines\n",
1705-
"\n",
1706-
"Real-world pipelines often span multiple schemas (modules). \n",
1707-
"\n",
1708-
"*New in DataJoint 2.1:* Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
1709-
]
1671+
"source": "## Multi-Schema Pipelines\n\nReal-world pipelines often span multiple schemas (modules).\n\n!!! version-added \"New in 2.1\"\n    Automatic schema grouping was added in DataJoint 2.1. Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
17101672
},
17111673
{
17121674
"cell_type": "code",
@@ -2104,13 +2066,7 @@
21042066
"cell_type": "markdown",
21052067
"id": "ncl6hafwbjt",
21062068
"metadata": {},
2107-
"source": [
2108-
"## Collapsing Schemas\n",
2109-
"\n",
2110-
"*New in DataJoint 2.1*\n",
2111-
"\n",
2112-
"For high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
2113-
]
2069+
"source": "## Collapsing Schemas\n\n!!! version-added \"New in 2.1\"\n    The `collapse()` method was added in DataJoint 2.1.\n\nFor high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
21142070
},
21152071
{
21162072
"cell_type": "code",
@@ -3322,33 +3278,7 @@
33223278
"cell_type": "markdown",
33233279
"id": "cell-summary-md",
33243280
"metadata": {},
3325-
"source": [
3326-
"## Summary\n",
3327-
"\n",
3328-
"| Visual | Meaning |\n",
3329-
"|--------|--------|\n",
3330-
"| **Thick solid** | One-to-one extension |\n",
3331-
"| **Thin solid** | One-to-many containment |\n",
3332-
"| **Dashed** | Reference (independent identity) |\n",
3333-
"| **Underlined** | Introduces new dimension |\n",
3334-
"| **Orange dots** | Renamed FK via `.proj()` |\n",
3335-
"| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n",
3336-
"| **Grouped boxes** | Tables grouped by schema/module |\n",
3337-
"| **3D box (gray)** | Collapsed schema *(2.1+)* |\n",
3338-
"\n",
3339-
"| Feature | Method |\n",
3340-
"|---------|--------|\n",
3341-
"| Layout direction | `dj.config.display.diagram_direction` |\n",
3342-
"| Mermaid output | `.make_mermaid()` |\n",
3343-
"| Collapse schema | `.collapse()` *(2.1+)* |\n",
3344-
"\n",
3345-
"## Related\n",
3346-
"\n",
3347-
"- [Diagram Specification](../reference/specs/diagram.md)\n",
3348-
"- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n",
3349-
"- [Semantic Matching](../reference/specs/semantic-matching.md)\n",
3350-
"- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
3351-
]
3281+
"source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(New in 2.1)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(New in 2.1)* |\n| Prune empty tables | `.prune()` *(New in 2.2)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
33523282
},
33533283
{
33543284
"cell_type": "code",
@@ -3397,4 +3327,4 @@
33973327
},
33983328
"nbformat": 4,
33993329
"nbformat_minor": 5
3400-
}
3330+
}

src/reference/specs/data-manipulation.md

Lines changed: 3 additions & 0 deletions
@@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables:
332332
2. Recursively delete matching rows in child tables
333333
3. Delete rows in target table
334334

335+
!!! version-added "New in 2.2"
336+
    `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged: the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods).
337+
335338
### 4.3 Basic Usage
336339

337340
```python
