Merge pull request #155 from datajoint/docs/v2.2-restricted-diagram

dimitri-yatsenko · web-flow · commit 32c4a40a4570 · 2026-03-13T10:25:42.000-05:00
docs: document restricted diagram operations (new in 2.2)
diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md
@@ -1,6 +1,6 @@
 # What's New in DataJoint 2.2
 
-DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
+DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections, and **graph-driven diagram operations** that replace the legacy error-driven cascade with a reliable, inspectable approach for all users.
 
 > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive.
 
@@ -201,9 +201,90 @@ class MyTable(dj.Manual):
 
 Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema.
 
+## Graph-Driven Diagram Operations
+
+DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting.
+
+### From Visualization to Operations
+
+In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently using an error-driven approach: attempt `DELETE` on the parent, catch the foreign key integrity error, parse the error message to discover which child table is blocking, then recursively delete from that child first. This had several problems:
+
+- **MySQL 8 with limited privileges** returns error 1217 (`ROW_IS_REFERENCED`), which carries no table name, instead of 1451 (`ROW_IS_REFERENCED_2`), so the cascade crashes with no way to proceed.
+- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt.
+- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats.
+
+In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing.
+
+### The Preview-Then-Execute Pattern
+
+The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`:
+
+```python
+# Build the dependency graph and inspect the cascade
+diag = dj.Diagram(schema)
+restricted = diag.cascade(Session & {'subject_id': 'M001'})
+
+# Inspect: what tables and how many rows would be affected?
+counts = restricted.preview()
+# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
+
+# Execute via Table.delete() after reviewing the blast radius
+(Session & {'subject_id': 'M001'}).delete(prompt=False)
+```
+
+This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious.
+
+### Two Propagation Modes
+
+The diagram supports two restriction propagation modes designed for fundamentally different tasks.
+
+**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph, removing ancestors and unrelated tables entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because a single reason to remove a row is sufficient. `cascade()` is one-shot and is always followed by `preview()` or `Table.delete()`.
+
+When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master, so that no master row is left with an incomplete set of parts. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts. The result is deletion of the entire compositional unit, not just the originally matched part rows.
+
+**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
+
+The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa.
+
+### Pruning Empty Tables
+
+After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data:
+
+```python
+export = (dj.Diagram(schema)
+    .restrict(Subject & {'species': 'mouse'})
+    .restrict(Session & 'session_date > "2024-01-01"')
+    .prune())
+
+export.preview()   # only tables with matching rows
+export             # visualize the export subgraph
+```
+
+Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated.
+
+### Architecture
+
+`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`.
+
+### Advantages over Error-Driven Cascade
+
+The graph-driven approach resolves every known limitation of the prior error-driven cascade:
+
+| Scenario | Error-driven (prior) | Graph-driven (2.2) |
+|---|---|---|
+| MySQL 8 + limited privileges | Crashes (error 1217, no table name) | Works — no error parsing needed |
+| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
+| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront |
+| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
+| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
+| Reusability | Delete-only | Delete, drop, export, prune |
+| Inspectability | Opaque recursive cascade | `preview()` / `delete(dry_run=True)` before executing |
+
 ## See Also
 
-- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide
-- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial
-- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings
-- [Configure Database](../how-to/configure-database.md/) — Connection setup
+- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide
+- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial
+- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings
+- [Configure Database](../how-to/configure-database.md) — Connection setup
+- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
+- [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide
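The reverse topological walk and the OR/AND convergence rules described in the "Graph-Driven Diagram Operations" section can be illustrated without a database. The following is a minimal sketch, not DataJoint's implementation: the table names, the `parents` mapping, the row-id sets, and the `converge` helper are all hypothetical, and only Python's standard `graphlib` module is used.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of parent tables it references.
parents = {
    "subject": set(),
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}

# static_order() yields parents before children, which is a safe creation order.
create_order = list(TopologicalSorter(parents).static_order())

# Reversing it visits leaves first, so no delete ever hits a foreign key error.
delete_order = list(reversed(create_order))
print(delete_order)


def converge(rows_per_path, mode):
    """Combine the row sets that reach one descendant table along several paths.

    mode="cascade" uses OR (union); mode="restrict" uses AND (intersection).
    Assumes rows_per_path is a non-empty list of sets.
    """
    combine = set.union if mode == "cascade" else set.intersection
    return combine(*rows_per_path)


# Hypothetical row ids of one descendant table reached via two ancestor paths.
via_subject = {1, 2, 3}
via_session = {2, 3, 4}

cascade_rows = converge([via_subject, via_session], "cascade")
restrict_rows = converge([via_subject, via_session], "restrict")
print(cascade_rows, restrict_rows)
```

In this toy model the delete case keeps every row reached by either path, while the subsetting case keeps only rows reached by both, which mirrors the distinction the section draws between `cascade()` and `restrict()`.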
diff --git a/src/how-to/delete-data.md b/src/how-to/delete-data.md
@@ -189,8 +189,50 @@ count = (Subject & restriction).delete(prompt=False)
 print(f"Deleted {count} subjects")
 ```
 
+## Inspecting Cascade Before Deleting
+
+!!! version-added "New in 2.2"
+    Cascade inspection via `dj.Diagram` was added in DataJoint 2.2.
+
+For a quick preview, `table.delete(dry_run=True)` returns the affected row counts without deleting anything:
+
+```python
+# Quick preview of what would be deleted
+(Session & {'subject_id': 'M001'}).delete(dry_run=True)
+# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
+```
+
+For more complex scenarios — working across schemas, chaining multiple restrictions, or visualizing the dependency graph — use `dj.Diagram` to build and inspect the cascade explicitly:
+
+```python
+import datajoint as dj
+
+# 1. Build the dependency graph and apply cascade restriction
+diag = dj.Diagram(schema)
+restricted = diag.cascade(Session & {'subject_id': 'M001'})
+
+# 2. Preview: see affected tables and row counts
+counts = restricted.preview()
+# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45}
+
+# 3. Visualize the cascade subgraph (in Jupyter)
+restricted
+
+# 4. Execute via Table.delete() after reviewing
+(Session & {'subject_id': 'M001'}).delete(prompt=False)
+```
+
+### When to Use
+
+- **Preview blast radius**: Understand what a cascade delete will affect before committing
+- **Multi-schema inspection**: Build a diagram spanning multiple schemas to visualize cascade impact
+- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows
+
+For routine deletes, `(Table & restriction).delete()` remains the most direct approach. Reach for the diagram API when you need more visibility before executing.
+
 ## See Also
 
+- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations
 - [Master-Part Tables](master-part.ipynb) — Compositional data patterns
 - [Model Relationships](model-relationships.ipynb) — Foreign key patterns
 - [Insert Data](insert-data.md) — Adding data to tables
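The "programmatic control" use case in the how-to above can be sketched as a guard that inspects the preview counts before a delete is allowed to proceed. This assumes, based only on the example comments in the guide, that `preview()` returns a dict mapping full table names to row counts; the `cascade_within_limit` helper and the `max_rows` threshold are hypothetical and not part of DataJoint.

```python
def cascade_within_limit(counts, max_rows):
    """Return (ok, total): whether the cascade touches at most max_rows rows.

    counts is assumed to map full table names to affected row counts,
    in the shape shown in the preview() examples.
    """
    total = sum(counts.values())
    return total <= max_rows, total


# Counts copied from the example output in the guide.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

ok, total = cascade_within_limit(counts, max_rows=100)
print(ok, total)

# A stricter threshold rejects the same cascade.
ok_strict, _ = cascade_within_limit(counts, max_rows=50)
print(ok_strict)
```

An automated workflow could call `(Session & restriction).delete(prompt=False)` only when the guard passes, and log or escalate otherwise.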
diff --git a/src/how-to/read-diagrams.ipynb b/src/how-to/read-diagrams.ipynb
@@ -1325,39 +1325,13 @@
    "cell_type": "markdown",
    "id": "cell-ops-ref",
    "metadata": {},
-   "source": [
-    "**Operation Reference:**\n",
-    "\n",
-    "| Operation | Meaning |\n",
-    "|-----------|--------|\n",
-    "| `dj.Diagram(schema)` | Entire schema |\n",
-    "| `dj.Diagram(Table) - N` | Table + N levels upstream |\n",
-    "| `dj.Diagram(Table) + N` | Table + N levels downstream |\n",
-    "| `D1 + D2` | Union of two diagrams |\n",
-    "| `D1 * D2` | Intersection (common nodes) |\n",
-    "\n",
-    "**Finding paths:** Use intersection to find connection paths:\n",
-    "```python\n",
-    "(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n",
-    "```"
-   ]
+   "source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(New in 2.2)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```"
   },
   {
    "cell_type": "markdown",
    "id": "2lmw6tar3w8",
    "metadata": {},
-   "source": [
-    "## Layout Direction\n",
-    "\n",
-    "*New in DataJoint 2.1*\n",
-    "\n",
-    "Control the flow direction of diagrams via configuration:\n",
-    "\n",
-    "| Direction | Description |\n",
-    "|-----------|-------------|\n",
-    "| `\"TB\"` | Top to bottom (default) |\n",
-    "| `\"LR\"` | Left to right |"
-   ]
+   "source": "## Layout Direction\n\n!!! version-added \"New in 2.1\"\n    Configurable layout direction was added in DataJoint 2.1.\n\nControl the flow direction of diagrams via configuration:\n\n| Direction | Description |\n|-----------|-------------|\n| `\"TB\"` | Top to bottom (default) |\n| `\"LR\"` | Left to right |"
   },
   {
    "cell_type": "code",
@@ -1634,13 +1608,7 @@
    "cell_type": "markdown",
    "id": "ogpr8cqsife",
    "metadata": {},
-   "source": [
-    "## Mermaid Output\n",
-    "\n",
-    "*New in DataJoint 2.1*\n",
-    "\n",
-    "Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
-   ]
+   "source": "## Mermaid Output\n\n!!! version-added \"New in 2.1\"\n    Mermaid output was added in DataJoint 2.1.\n\nGenerate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:"
   },
   {
    "cell_type": "code",
@@ -1700,13 +1668,7 @@
    "cell_type": "markdown",
    "id": "pqet0vo8pwp",
    "metadata": {},
-   "source": [
-    "## Multi-Schema Pipelines\n",
-    "\n",
-    "Real-world pipelines often span multiple schemas (modules). \n",
-    "\n",
-    "*New in DataJoint 2.1:* Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
-   ]
+   "source": "## Multi-Schema Pipelines\n\nReal-world pipelines often span multiple schemas (modules).\n\n!!! version-added \"New in 2.1\"\n    Automatic schema grouping was added in DataJoint 2.1. Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label."
   },
   {
    "cell_type": "code",
@@ -2104,13 +2066,7 @@
    "cell_type": "markdown",
    "id": "ncl6hafwbjt",
    "metadata": {},
-   "source": [
-    "## Collapsing Schemas\n",
-    "\n",
-    "*New in DataJoint 2.1*\n",
-    "\n",
-    "For high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
-   ]
+   "source": "## Collapsing Schemas\n\n!!! version-added \"New in 2.1\"\n    The `collapse()` method was added in DataJoint 2.1.\n\nFor high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables."
   },
   {
    "cell_type": "code",
@@ -3322,33 +3278,7 @@
    "cell_type": "markdown",
    "id": "cell-summary-md",
    "metadata": {},
-   "source": [
-    "## Summary\n",
-    "\n",
-    "| Visual | Meaning |\n",
-    "|--------|--------|\n",
-    "| **Thick solid** | One-to-one extension |\n",
-    "| **Thin solid** | One-to-many containment |\n",
-    "| **Dashed** | Reference (independent identity) |\n",
-    "| **Underlined** | Introduces new dimension |\n",
-    "| **Orange dots** | Renamed FK via `.proj()` |\n",
-    "| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n",
-    "| **Grouped boxes** | Tables grouped by schema/module |\n",
-    "| **3D box (gray)** | Collapsed schema *(2.1+)* |\n",
-    "\n",
-    "| Feature | Method |\n",
-    "|---------|--------|\n",
-    "| Layout direction | `dj.config.display.diagram_direction` |\n",
-    "| Mermaid output | `.make_mermaid()` |\n",
-    "| Collapse schema | `.collapse()` *(2.1+)* |\n",
-    "\n",
-    "## Related\n",
-    "\n",
-    "- [Diagram Specification](../reference/specs/diagram.md)\n",
-    "- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n",
-    "- [Semantic Matching](../reference/specs/semantic-matching.md)\n",
-    "- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
-   ]
+   "source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(New in 2.1)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(New in 2.1)* |\n| Prune empty tables | `.prune()` *(New in 2.2)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)"
   },
   {
    "cell_type": "code",
@@ -3397,4 +3327,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
diff --git a/src/reference/specs/data-manipulation.md b/src/reference/specs/data-manipulation.md
@@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables:
 2. Recursively delete matching rows in child tables
 3. Delete rows in target table
 
+!!! version-added "New in 2.2"
+    `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged — the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods).
+
 ### 4.3 Basic Usage
 
 ```python
diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md
diff --git a/src/reference/specs/master-part.md b/src/reference/specs/master-part.md