From 443090cefda0b935234d27601d2c633780593503 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Fri, 6 Mar 2026 15:28:29 -0600 Subject: [PATCH 1/8] docs: document restricted diagram operations (new in 2.2) - Add Operational Methods section to diagram.md spec: cascade(), restrict(), delete(), drop(), preview(), prune(), restriction propagation rules, OR-vs-AND convergence - Add Graph-Driven Diagram Operations section to whats-new-22.md: motivation, preview-then-execute pattern, two propagation modes, pruning empty tables - Add Diagram-Level Delete section to delete-data.md: build-preview-execute workflow, when to use - Add prune() to read-diagrams how-to - Add version admonition in data-manipulation.md noting graph-driven cascade internals - Cross-references between all files Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 65 +++++++- src/how-to/delete-data.md | 35 +++++ src/how-to/read-diagrams.ipynb | 47 +----- src/reference/specs/data-manipulation.md | 3 + src/reference/specs/diagram.md | 179 ++++++++++++++++++++++- 5 files changed, 282 insertions(+), 47 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index 0f437c08..5eef376f 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -1,6 +1,6 @@ # What's New in DataJoint 2.2 -DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing. +DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph. > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive. 
@@ -201,9 +201,72 @@ class MyTable(dj.Manual): Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema. +## Graph-Driven Diagram Operations + +DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting. + +### From Visualization to Operations + +In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed. + +In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios. + +### The Preview-Then-Execute Pattern + +The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute: + +```python +# Build the dependency graph +diag = dj.Diagram(schema) + +# Apply cascade restriction — nothing is deleted yet +restricted = diag.cascade(Session & {'subject_id': 'M001'}) + +# Inspect: what tables and how many rows would be affected? +counts = restricted.preview() +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} + +# Execute only after reviewing the blast radius +restricted.delete(prompt=False) +``` + +This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious. 
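Because `preview()` returns a plain `dict[str, int]`, an automated workflow can decide whether to delete based on it. Below is a minimal sketch. `within_budget` and the row limit are hypothetical. In real code, `counts` would come from `restricted.preview()`; here it is a literal copied from the example output above.

```python
# Hypothetical safety gate for automated workflows.
# `counts` mirrors the dict[str, int] returned by preview():
# full table name -> number of rows that would be affected.
def within_budget(counts: dict[str, int], max_total_rows: int) -> bool:
    """Return True if the total number of affected rows is within the budget."""
    return sum(counts.values()) <= max_total_rows


# Example counts copied from the preview() output shown above.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

if within_budget(counts, max_total_rows=1000):
    # In a real pipeline this is where restricted.delete(prompt=False) would run.
    print("proceed with delete")
else:
    print("abort: cascade is larger than expected")
```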
+ +### Two Propagation Modes + +The diagram supports two restriction propagation modes with different convergence semantics: + +**`cascade()` uses OR at convergence.** When a child table has multiple restricted ancestors, the child row is affected if *any* parent path reaches it. This is the right semantics for delete — if any reason exists to remove a row, it should be removed. `cascade()` is one-shot: it can only be called once on an unrestricted diagram. + +**`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables. + +The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics. + +### Pruning Empty Tables + +After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data: + +```python +export = (dj.Diagram(schema) + .restrict(Subject & {'species': 'mouse'}) + .restrict(Session & 'session_date > "2024-01-01"') + .prune()) + +export.preview() # only tables with matching rows +export # visualize the export subgraph +``` + +Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated. + +### Architecture + +`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed. 
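The OR-versus-AND convergence rules described under *Two Propagation Modes* can be illustrated with a toy child table that has two restricted parents. This is a plain-Python sketch of the set logic only, not DataJoint's implementation. The `Recording` rows and the parent restrictions (`restricted_sessions`, `restricted_probes`) are hypothetical. The sketch also does not model how the parents became restricted, whether through paths from a single `cascade()` seed or through chained `restrict()` calls.

```python
# Hypothetical rows of a child table `Recording` that references two parents.
recordings = [
    {"rec_id": 1, "session": "S1", "probe": "P1"},
    {"rec_id": 2, "session": "S1", "probe": "P2"},
    {"rec_id": 3, "session": "S2", "probe": "P1"},
    {"rec_id": 4, "session": "S2", "probe": "P2"},
]

# Restrictions already carried by the two parent tables.
restricted_sessions = {"S1"}
restricted_probes = {"P1"}


def matches_session(row: dict) -> bool:
    return row["session"] in restricted_sessions


def matches_probe(row: dict) -> bool:
    return row["probe"] in restricted_probes


# cascade()-style convergence: affected if ANY restricted parent reaches the row.
or_rows = [r["rec_id"] for r in recordings if matches_session(r) or matches_probe(r)]

# restrict()-style convergence: included only if ALL restricted parents match.
and_rows = [r["rec_id"] for r in recordings if matches_session(r) and matches_probe(r)]

print(or_rows)   # [1, 2, 3]
print(and_rows)  # [1]
```

Under the OR rule, recordings 2 and 3 are included because one of their parents is restricted, which is the behavior a delete needs. The AND rule keeps only recording 1, which is the behavior an export subset needs.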
+ ## See Also - [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide - [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial - [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings - [Configure Database](../how-to/configure-database.md/) — Connection setup +- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations +- [Delete Data](../how-to/delete-data.md/) — Task-oriented delete guide diff --git a/src/how-to/delete-data.md b/src/how-to/delete-data.md index 545788bb..72724075 100644 --- a/src/how-to/delete-data.md +++ b/src/how-to/delete-data.md @@ -189,8 +189,43 @@ count = (Subject & restriction).delete(prompt=False) print(f"Deleted {count} subjects") ``` +## Diagram-Level Delete + +!!! version-added "New in 2.2" + Diagram-level delete was added in DataJoint 2.2. + +For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing. + +### Build, Preview, Execute + +```python +import datajoint as dj + +# 1. Build the dependency graph +diag = dj.Diagram(schema) + +# 2. Apply cascade restriction (nothing deleted yet) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) + +# 3. Preview: see affected tables and row counts +counts = restricted.preview() +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} + +# 4. 
Execute only after reviewing +restricted.delete(prompt=False) +``` + +### When to Use + +- **Preview blast radius**: Understand what a cascade delete will affect before committing +- **Multi-schema cascades**: Build a diagram spanning multiple schemas and delete across them in one operation +- **Programmatic control**: Use `preview()` return values to make decisions in automated workflows + +For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram-level API is for when you need more visibility or control. + ## See Also +- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations - [Master-Part Tables](master-part.ipynb) — Compositional data patterns - [Model Relationships](model-relationships.ipynb) — Foreign key patterns - [Insert Data](insert-data.md) — Adding data to tables diff --git a/src/how-to/read-diagrams.ipynb b/src/how-to/read-diagrams.ipynb index b1d4308f..61dd652b 100644 --- a/src/how-to/read-diagrams.ipynb +++ b/src/how-to/read-diagrams.ipynb @@ -1325,22 +1325,7 @@ "cell_type": "markdown", "id": "cell-ops-ref", "metadata": {}, - "source": [ - "**Operation Reference:**\n", - "\n", - "| Operation | Meaning |\n", - "|-----------|--------|\n", - "| `dj.Diagram(schema)` | Entire schema |\n", - "| `dj.Diagram(Table) - N` | Table + N levels upstream |\n", - "| `dj.Diagram(Table) + N` | Table + N levels downstream |\n", - "| `D1 + D2` | Union of two diagrams |\n", - "| `D1 * D2` | Intersection (common nodes) |\n", - "\n", - "**Finding paths:** Use intersection to find connection paths:\n", - "```python\n", - "(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n", - "```" - ] + "source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| 
`D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(2.2+)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```" }, { "cell_type": "markdown", @@ -3322,33 +3307,7 @@ "cell_type": "markdown", "id": "cell-summary-md", "metadata": {}, - "source": [ - "## Summary\n", - "\n", - "| Visual | Meaning |\n", - "|--------|--------|\n", - "| **Thick solid** | One-to-one extension |\n", - "| **Thin solid** | One-to-many containment |\n", - "| **Dashed** | Reference (independent identity) |\n", - "| **Underlined** | Introduces new dimension |\n", - "| **Orange dots** | Renamed FK via `.proj()` |\n", - "| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n", - "| **Grouped boxes** | Tables grouped by schema/module |\n", - "| **3D box (gray)** | Collapsed schema *(2.1+)* |\n", - "\n", - "| Feature | Method |\n", - "|---------|--------|\n", - "| Layout direction | `dj.config.display.diagram_direction` |\n", - "| Mermaid output | `.make_mermaid()` |\n", - "| Collapse schema | `.collapse()` *(2.1+)* |\n", - "\n", - "## Related\n", - "\n", - "- [Diagram Specification](../reference/specs/diagram.md)\n", - "- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n", - "- [Semantic Matching](../reference/specs/semantic-matching.md)\n", - "- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)" - ] + "source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(2.1+)* |\n\n| 
Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(2.1+)* |\n| Prune empty tables | `.prune()` *(2.2+)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)" }, { "cell_type": "code", @@ -3397,4 +3356,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/src/reference/specs/data-manipulation.md b/src/reference/specs/data-manipulation.md index e2841efa..160aee16 100644 --- a/src/reference/specs/data-manipulation.md +++ b/src/reference/specs/data-manipulation.md @@ -332,6 +332,9 @@ Delete automatically cascades to all dependent tables: 2. Recursively delete matching rows in child tables 3. Delete rows in target table +!!! version-added "New in 2.2" + `Table.delete()` now uses graph-driven cascade internally via `dj.Diagram`. User-facing behavior is unchanged — the same parameters and return values apply. For direct control over the cascade (preview, multi-schema operations), use the [Diagram operational methods](diagram.md#operational-methods). + ### 4.3 Basic Usage ```python diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index 58aba574..1745b469 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -117,6 +117,176 @@ dj.Diagram(Subject) + dj.Diagram(analysis).collapse() --- +## Operational Methods + +!!! version-added "New in 2.2" + Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`, `prune`) were added in DataJoint 2.2. + +Diagrams can propagate restrictions through the dependency graph and execute data operations (delete, drop) using the graph structure. 
These methods turn Diagram from a visualization tool into an operational component. + +### `cascade()` + +```python +diag.cascade(table_expr, part_integrity="enforce") +``` + +Apply a cascade restriction and propagate it downstream through the dependency graph. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `table_expr` | QueryExpression | — | A restricted table expression (e.g., `Session & 'subject_id=1'`) | +| `part_integrity` | str | `"enforce"` | Master-part integrity policy | + +**Returns:** New `Diagram` with cascade restrictions applied. + +**Constraints:** + +- `cascade()` can only be called **once** on an unrestricted Diagram +- Cannot be mixed with `restrict()` — the two modes are mutually exclusive +- `table_expr.full_table_name` must be a node in the diagram + +**`part_integrity` values:** + +| Value | Behavior | +|-------|----------| +| `"enforce"` | Error if parts would be deleted before masters | +| `"ignore"` | Allow deleting parts without masters | +| `"cascade"` | Also delete masters when parts are deleted | + +```python +# Build a cascade from a restricted table +diag = dj.Diagram(schema) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) +``` + +### `restrict()` + +```python +diag.restrict(table_expr) +``` + +Apply a restrict condition and propagate it downstream. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `table_expr` | QueryExpression | — | A restricted table expression | + +**Returns:** New `Diagram` with restrict conditions applied. 
+ +**Constraints:** + +- Cannot be called on a cascade-restricted Diagram (mutually exclusive with `cascade()`) +- `table_expr.full_table_name` must be a node in the diagram +- **Can be chained** — call `restrict()` multiple times to add conditions from different tables + +```python +# Chain multiple restrictions (AND semantics) +diag = dj.Diagram(schema) +restricted = (diag + .restrict(Subject & {'species': 'mouse'}) + .restrict(Session & 'session_date > "2024-01-01"')) +``` + +### `delete()` + +```python +diag.delete(transaction=True, prompt=None) +``` + +Execute a cascading delete using previously applied cascade restrictions. Tables are deleted in reverse topological order (leaves first) to maintain referential integrity. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `transaction` | bool | `True` | Wrap in atomic transaction | +| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` | + +**Returns:** Number of rows deleted from the root table. + +**Requires:** `cascade()` must be called first. + +```python +diag = dj.Diagram(schema) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) +restricted.preview() # inspect what will be deleted +restricted.delete() # execute the delete +``` + +### `drop()` + +```python +diag.drop(prompt=None, part_integrity="enforce") +``` + +Drop all tables in the diagram in reverse topological order. + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` | +| `part_integrity` | str | `"enforce"` | `"enforce"` or `"ignore"` | + +**Note:** Unlike `delete()`, `drop()` does not use cascade restrictions. It drops all tables in the diagram. + +### `preview()` + +```python +diag.preview() +``` + +Show affected tables and row counts without modifying data. 
Works with both `cascade()` and `restrict()` restrictions. + +**Returns:** `dict[str, int]` — mapping of full table names to affected row counts. + +**Requires:** `cascade()` or `restrict()` must be called first. + +```python +diag = dj.Diagram(schema) +restricted = diag.cascade(Session & {'subject_id': 'M001'}) +counts = restricted.preview() +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} +``` + +### `prune()` + +```python +diag.prune() +``` + +Remove tables with zero matching rows from the diagram. Without prior restrictions, removes physically empty tables. With restrictions (`cascade()` or `restrict()`), removes tables where the restricted query yields zero rows. + +**Returns:** New `Diagram` with empty tables removed. + +**Note:** Queries the database to determine row counts. The underlying graph structure is preserved — subsequent `restrict()` calls can still seed at any table in the schema. + +```python +# Export workflow: restrict, prune, visualize +export = (dj.Diagram(schema) + .restrict(Subject & {'species': 'mouse'}) + .restrict(Session & 'session_date > "2024-01-01"') + .prune()) + +export.preview() # only tables with matching rows +export # visualize the export subgraph +``` + +### Restriction Propagation + +When `cascade()` or `restrict()` propagates a restriction from a parent table to a child table, one of three rules applies depending on the foreign key relationship: + +**Rule 1 — Direct copy:** When the foreign key is non-aliased and the restriction attributes are a subset of the child's primary key, the restriction is copied directly to the child. + +**Rule 2 — Aliased projection:** When the foreign key uses attribute renaming (e.g., `subject_id` → `animal_id`), the parent is projected with the attribute mapping to match the child's column names. 
+ +**Rule 3 — Full projection:** When the foreign key is non-aliased but the restriction uses attributes not in the child's primary key, the parent is projected (all attributes) and used as a restriction on the child. + +**Convergence behavior:** + +When a child table has multiple restricted ancestors, the convergence rule depends on the mode: + +- **`cascade()` (OR):** A child row is affected if *any* path from a restricted ancestor reaches it. This is appropriate for delete — if any reason exists to delete a row, it should be deleted. +- **`restrict()` (AND):** A child row is included only if *all* restricted ancestors match. This is appropriate for export — only rows satisfying every condition are selected. + +--- + ## Output Methods ### Graphviz Output @@ -299,18 +469,23 @@ combined = dj.Diagram.from_sequence([schema1, schema2, schema3]) ## Dependencies -Diagram visualization requires optional dependencies: +Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`, `prune`) use `networkx`, which is always installed as a core dependency. + +Diagram **visualization** requires optional dependencies: ```bash pip install matplotlib pygraphviz ``` -If dependencies are missing, `dj.Diagram` displays a warning and provides a stub class. +If visualization dependencies are missing, `dj.Diagram` displays a warning and provides a stub class. Operational methods remain available regardless. 
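The graph work behind `delete()` can be sketched without a database: select the seed table plus its descendants, process leaves first, and leave ancestors untouched. This toy version uses the standard library's `graphlib` in place of `networkx`. The pipeline, `descendants`, and `delete_order` are hypothetical and model only the ordering, not DataJoint's code.

```python
from graphlib import TopologicalSorter

# Hypothetical toy pipeline: each table maps to the set of tables it depends on.
PARENTS = {
    "Subject": set(),
    "Session": {"Subject"},
    "Probe": set(),
    "Recording": {"Session", "Probe"},
    "Trial": {"Session"},
    "ProcessedData": {"Trial", "Recording"},
}


def descendants(seed: str, parents: dict[str, set[str]]) -> set[str]:
    """Return every table reachable downstream from `seed` (excluding `seed`)."""
    children: dict[str, set[str]] = {t: set() for t in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    found: set[str] = set()
    stack = [seed]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def delete_order(seed: str, parents: dict[str, set[str]]) -> list[str]:
    """Seed plus its descendants, leaves first (reverse topological order)."""
    affected = {seed} | descendants(seed, parents)
    # static_order() yields each table after all of its parents;
    # keep only the affected tables, then reverse so leaves come first.
    top_down = [t for t in TopologicalSorter(parents).static_order() if t in affected]
    return list(reversed(top_down))


order = delete_order("Session", PARENTS)
print(order)
```

`Subject` is an ancestor of the seed. `Probe` is a parent of `Recording` but is not downstream of `Session`. Neither appears in the result.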
--- ## See Also - [How to Read Diagrams](../../how-to/read-diagrams.ipynb/) +- [Delete Data](../../how-to/delete-data.md/) — Diagram-level delete workflow +- [What's New in 2.2](../../explanation/whats-new-22.md/) — Motivation and design +- [Data Manipulation](data-manipulation.md) — Insert, update, delete specification - [Query Algebra](query-algebra.md) - [Table Declaration](table-declaration.md) From 3a27735f89f81f48e5b4762cd488330d433b1980 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 9 Mar 2026 08:55:04 -0500 Subject: [PATCH 2/8] docs: add dry_run to diagram specs, fix cross-refs and version markers - Add dry_run parameter to delete() and drop() signatures in diagram.md - Fix trailing slashes in cross-reference paths across 3 files - Convert inline version markers to proper admonitions in read-diagrams.ipynb - Normalize table-cell version markers to consistent *(New in X.Y)* format Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 12 +++++----- src/how-to/delete-data.md | 2 +- src/how-to/read-diagrams.ipynb | 41 +++++---------------------------- src/reference/specs/diagram.md | 16 ++++++++----- 4 files changed, 23 insertions(+), 48 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index 5eef376f..f94ab24a 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -264,9 +264,9 @@ Without prior restrictions, `prune()` removes physically empty tables. 
This is u ## See Also -- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide -- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial -- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings -- [Configure Database](../how-to/configure-database.md/) — Connection setup -- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations -- [Delete Data](../how-to/delete-data.md/) — Task-oriented delete guide +- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide +- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial +- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings +- [Configure Database](../how-to/configure-database.md) — Connection setup +- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations +- [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide diff --git a/src/how-to/delete-data.md b/src/how-to/delete-data.md index 72724075..d89a87e7 100644 --- a/src/how-to/delete-data.md +++ b/src/how-to/delete-data.md @@ -225,7 +225,7 @@ For simple single-table deletes, `(Table & restriction).delete()` remains the si ## See Also -- [Diagram Specification](../reference/specs/diagram.md/) — Full reference for diagram operations +- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations - [Master-Part Tables](master-part.ipynb) — Compositional data patterns - [Model Relationships](model-relationships.ipynb) — Foreign key patterns - [Insert Data](insert-data.md) — Adding data to tables diff --git a/src/how-to/read-diagrams.ipynb b/src/how-to/read-diagrams.ipynb index 61dd652b..d2b13e23 100644 --- a/src/how-to/read-diagrams.ipynb +++ b/src/how-to/read-diagrams.ipynb @@ -1325,24 +1325,13 @@ "cell_type": "markdown", "id": "cell-ops-ref", "metadata": {}, - "source": 
"**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(2.2+)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```" + "source": "**Operation Reference:**\n\n| Operation | Meaning |\n|-----------|--------|\n| `dj.Diagram(schema)` | Entire schema |\n| `dj.Diagram(Table) - N` | Table + N levels upstream |\n| `dj.Diagram(Table) + N` | Table + N levels downstream |\n| `D1 + D2` | Union of two diagrams |\n| `D1 * D2` | Intersection (common nodes) |\n| `D.prune()` | Remove tables with zero matching rows *(New in 2.2)* |\n\n**Finding paths:** Use intersection to find connection paths:\n```python\n(dj.Diagram(upstream) + 100) * (dj.Diagram(downstream) - 100)\n```" }, { "cell_type": "markdown", "id": "2lmw6tar3w8", "metadata": {}, - "source": [ - "## Layout Direction\n", - "\n", - "*New in DataJoint 2.1*\n", - "\n", - "Control the flow direction of diagrams via configuration:\n", - "\n", - "| Direction | Description |\n", - "|-----------|-------------|\n", - "| `\"TB\"` | Top to bottom (default) |\n", - "| `\"LR\"` | Left to right |" - ] + "source": "## Layout Direction\n\n!!! 
version-added \"New in 2.1\"\n Configurable layout direction was added in DataJoint 2.1.\n\nControl the flow direction of diagrams via configuration:\n\n| Direction | Description |\n|-----------|-------------|\n| `\"TB\"` | Top to bottom (default) |\n| `\"LR\"` | Left to right |" }, { "cell_type": "code", @@ -1619,13 +1608,7 @@ "cell_type": "markdown", "id": "ogpr8cqsife", "metadata": {}, - "source": [ - "## Mermaid Output\n", - "\n", - "*New in DataJoint 2.1*\n", - "\n", - "Generate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:" - ] + "source": "## Mermaid Output\n\n!!! version-added \"New in 2.1\"\n Mermaid output was added in DataJoint 2.1.\n\nGenerate [Mermaid](https://mermaid.js.org/) syntax for embedding diagrams in Markdown documentation, GitHub, or web pages:" }, { "cell_type": "code", @@ -1685,13 +1668,7 @@ "cell_type": "markdown", "id": "pqet0vo8pwp", "metadata": {}, - "source": [ - "## Multi-Schema Pipelines\n", - "\n", - "Real-world pipelines often span multiple schemas (modules). \n", - "\n", - "*New in DataJoint 2.1:* Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label." - ] + "source": "## Multi-Schema Pipelines\n\nReal-world pipelines often span multiple schemas (modules).\n\n!!! version-added \"New in 2.1\"\n Automatic schema grouping was added in DataJoint 2.1. Tables are automatically grouped into visual clusters by schema, with the Python module name shown as the group label." }, { "cell_type": "code", @@ -2089,13 +2066,7 @@ "cell_type": "markdown", "id": "ncl6hafwbjt", "metadata": {}, - "source": [ - "## Collapsing Schemas\n", - "\n", - "*New in DataJoint 2.1*\n", - "\n", - "For high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables." - ] + "source": "## Collapsing Schemas\n\n!!! 
version-added \"New in 2.1\"\n The `collapse()` method was added in DataJoint 2.1.\n\nFor high-level pipeline views, collapse entire schemas into single nodes using `.collapse()`. This is useful for showing relationships between modules without the detail of individual tables." }, { "cell_type": "code", @@ -3307,7 +3278,7 @@ "cell_type": "markdown", "id": "cell-summary-md", "metadata": {}, - "source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(2.1+)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | `dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(2.1+)* |\n| Prune empty tables | `.prune()` *(2.2+)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)" + "source": "## Summary\n\n| Visual | Meaning |\n|--------|--------|\n| **Thick solid** | One-to-one extension |\n| **Thin solid** | One-to-many containment |\n| **Dashed** | Reference (independent identity) |\n| **Underlined** | Introduces new dimension |\n| **Orange dots** | Renamed FK via `.proj()` |\n| **Colors** | Green=Manual, Gray=Lookup, Red=Computed, Blue=Imported |\n| **Grouped boxes** | Tables grouped by schema/module |\n| **3D box (gray)** | Collapsed schema *(New in 2.1)* |\n\n| Feature | Method |\n|---------|--------|\n| Layout direction | 
`dj.config.display.diagram_direction` |\n| Mermaid output | `.make_mermaid()` |\n| Collapse schema | `.collapse()` *(New in 2.1)* |\n| Prune empty tables | `.prune()` *(New in 2.2)* |\n\n## Related\n\n- [Diagram Specification](../reference/specs/diagram.md)\n- [Entity Integrity: Dimensions](../explanation/entity-integrity.md#schema-dimensions)\n- [Semantic Matching](../reference/specs/semantic-matching.md)\n- [Schema Design Tutorial](../tutorials/basics/02-schema-design.ipynb)" }, { "cell_type": "code", diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index 1745b469..cfbe8cd6 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -190,7 +190,7 @@ restricted = (diag ### `delete()` ```python -diag.delete(transaction=True, prompt=None) +diag.delete(transaction=True, prompt=None, dry_run=False) ``` Execute a cascading delete using previously applied cascade restrictions. Tables are deleted in reverse topological order (leaves first) to maintain referential integrity. @@ -199,8 +199,9 @@ Execute a cascading delete using previously applied cascade restrictions. Tables |-----------|------|---------|-------------| | `transaction` | bool | `True` | Wrap in atomic transaction | | `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` | +| `dry_run` | bool | `False` | If `True`, return affected row counts without deleting | -**Returns:** Number of rows deleted from the root table. +**Returns:** `int` (rows deleted from root table) or `dict[str, int]` (table → row count mapping when `dry_run=True`). **Requires:** `cascade()` must be called first. @@ -214,7 +215,7 @@ restricted.delete() # execute the delete ### `drop()` ```python -diag.drop(prompt=None, part_integrity="enforce") +diag.drop(prompt=None, part_integrity="enforce", dry_run=False) ``` Drop all tables in the diagram in reverse topological order. 
@@ -223,6 +224,9 @@ Drop all tables in the diagram in reverse topological order. |-----------|------|---------|-------------| | `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` | | `part_integrity` | str | `"enforce"` | `"enforce"` or `"ignore"` | +| `dry_run` | bool | `False` | If `True`, return row counts without dropping tables | + +**Returns:** `dict[str, int]` (table → row count mapping when `dry_run=True`). Returns `None` otherwise. **Note:** Unlike `delete()`, `drop()` does not use cascade restrictions. It drops all tables in the diagram. @@ -483,9 +487,9 @@ If visualization dependencies are missing, `dj.Diagram` displays a warning and p ## See Also -- [How to Read Diagrams](../../how-to/read-diagrams.ipynb/) -- [Delete Data](../../how-to/delete-data.md/) — Diagram-level delete workflow -- [What's New in 2.2](../../explanation/whats-new-22.md/) — Motivation and design +- [How to Read Diagrams](../../how-to/read-diagrams.ipynb) +- [Delete Data](../../how-to/delete-data.md) — Diagram-level delete workflow +- [What's New in 2.2](../../explanation/whats-new-22.md) — Motivation and design - [Data Manipulation](data-manipulation.md) — Insert, update, delete specification - [Query Algebra](query-algebra.md) - [Table Declaration](table-declaration.md) From af0c0dbc8bdb8cf81dd7dc7057a9c29bf7b15f7f Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 9 Mar 2026 14:16:14 -0500 Subject: [PATCH 3/8] docs: clarify that cascade/restrict only affect descendants, not ancestors Both cascade() and restrict() propagate downstream only from the seed table. Ancestors of the seed are excluded. Document this in the diagram spec (cascade and restrict method descriptions) and the whats-new explanation. 
Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 2 ++ src/reference/specs/diagram.md | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index f94ab24a..2962745b 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -240,6 +240,8 @@ The diagram supports two restriction propagation modes with different convergenc **`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables. +Both modes propagate **downstream only** — from the seed table to its descendants. Tables upstream of the seed (its ancestors) are never affected. This matches the semantics of foreign key cascades: deleting a session deletes its trials, not its subject. + The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics. ### Pruning Empty Tables diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index cfbe8cd6..ba35b05e 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -130,7 +130,7 @@ Diagrams can propagate restrictions through the dependency graph and execute dat diag.cascade(table_expr, part_integrity="enforce") ``` -Apply a cascade restriction and propagate it downstream through the dependency graph. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. +Apply a cascade restriction and propagate it downstream through the dependency graph. Only the seed table and its descendants are affected — ancestors of the seed table are excluded. 
Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| @@ -165,7 +165,7 @@ restricted = diag.cascade(Session & {'subject_id': 'M001'}) diag.restrict(table_expr) ``` -Apply a restrict condition and propagate it downstream. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. +Apply a restrict condition and propagate it downstream. Only the seed table and its descendants are affected — ancestors of the seed table are excluded. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| From 1bc28f5f08bb83bb71196f92c454342f8954753e Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 9 Mar 2026 14:18:18 -0500 Subject: [PATCH 4/8] =?UTF-8?q?docs:=20clarify=20cascade/restrict=20wordin?= =?UTF-8?q?g=20=E2=80=94=20unaffected,=20not=20removed?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ancestors remain in the diagram but receive no restrictions and are unaffected by delete/preview. Previous wording ("excluded") was imprecise — they're not removed from the graph, just not operated on. 
Co-Authored-By: Claude Opus 4.6 --- src/reference/specs/diagram.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index ba35b05e..88632869 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -130,7 +130,7 @@ Diagrams can propagate restrictions through the dependency graph and execute dat diag.cascade(table_expr, part_integrity="enforce") ``` -Apply a cascade restriction and propagate it downstream through the dependency graph. Only the seed table and its descendants are affected — ancestors of the seed table are excluded. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. +Apply a cascade restriction and propagate it downstream through the dependency graph. Only the seed table and its descendants receive restrictions — ancestors of the seed table are unaffected by subsequent `delete()` or `preview()` calls. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| @@ -165,7 +165,7 @@ restricted = diag.cascade(Session & {'subject_id': 'M001'}) diag.restrict(table_expr) ``` -Apply a restrict condition and propagate it downstream. Only the seed table and its descendants are affected — ancestors of the seed table are excluded. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. +Apply a restrict condition and propagate it downstream. Only the seed table and its descendants receive restrictions — ancestors of the seed table are unaffected by subsequent operations. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. 
Designed for data subsetting and export operations. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| From fbbafc27fd9260ba07cfa08a2b1abbd60012e733 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 9 Mar 2026 16:06:50 -0500 Subject: [PATCH 5/8] docs: clarify cascade() trims diagram to subgraph Update diagram.md and whats-new-22.md to reflect that cascade() returns a trimmed Diagram containing only seed + descendants, while restrict() keeps the full graph intact for chaining. Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 2 +- src/reference/specs/diagram.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index 2962745b..9122b967 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -240,7 +240,7 @@ The diagram supports two restriction propagation modes with different convergenc **`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables. -Both modes propagate **downstream only** — from the seed table to its descendants. Tables upstream of the seed (its ancestors) are never affected. This matches the semantics of foreign key cascades: deleting a session deletes its trials, not its subject. +Both modes propagate **downstream only** — from the seed table to its descendants. `cascade()` goes further: it trims the returned Diagram to the cascade subgraph, removing all ancestors and unrelated tables. This means `delete()` operates on the entire trimmed diagram with no additional filtering. `restrict()` keeps the full graph intact (to support chaining from multiple seed tables) but only restricts the seed's descendants. 
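The following stdlib-only sketch makes the trimmed-versus-full distinction concrete. It assumes a toy graph of table to child-table edges. `trim_to_cascade` and `delete_order` are hypothetical helpers, not DataJoint functions. The sketch does two things:

- It keeps only the seed and its descendants, mirroring the trimmed Diagram returned by `cascade()`.
- It orders that subgraph leaves first with `graphlib.TopologicalSorter`, matching the reverse topological order the spec describes for deletes.

```python
from graphlib import TopologicalSorter

# Hypothetical toy pipeline: table -> tables that reference it (children).
CHILDREN = {
    "lab": ["subject"],
    "subject": ["session"],
    "session": ["trial", "processed_data"],
    "trial": ["processed_data"],
    "processed_data": [],
    "equipment": [],  # unrelated to the seed
}


def trim_to_cascade(children, seed):
    """Keep only the seed and its descendants (hypothetical stand-in for cascade() trimming)."""
    keep, stack = set(), [seed]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(children[node])
    return {node: [c for c in children[node] if c in keep] for node in keep}


def delete_order(subgraph):
    """Order tables leaves first so no child row outlives the parent it references."""
    parents = {node: set() for node in subgraph}
    for node, kids in subgraph.items():
        for kid in kids:
            parents[kid].add(node)
    # static_order() yields parents before children; reversing it puts leaves first.
    return list(reversed(list(TopologicalSorter(parents).static_order())))


sub = trim_to_cascade(CHILDREN, "session")
print(sorted(sub))        # ['processed_data', 'session', 'trial']
print(delete_order(sub))  # ['processed_data', 'trial', 'session']
```

By contrast, the `restrict()` behavior described here would leave `lab`, `subject`, and `equipment` in the diagram, only without restrictions, so that a later `restrict()` call can seed from one of them.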
This matches the semantics of foreign key cascades: deleting a session deletes its trials, not its subject. The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics. diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index 88632869..9e312b99 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -130,14 +130,14 @@ Diagrams can propagate restrictions through the dependency graph and execute dat diag.cascade(table_expr, part_integrity="enforce") ``` -Apply a cascade restriction and propagate it downstream through the dependency graph. Only the seed table and its descendants receive restrictions — ancestors of the seed table are unaffected by subsequent `delete()` or `preview()` calls. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. +Apply a cascade restriction and propagate it downstream through the dependency graph. The returned Diagram is trimmed to the **cascade subgraph** — only the seed table and its descendants remain. All ancestors and unrelated tables are removed. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `table_expr` | QueryExpression | — | A restricted table expression (e.g., `Session & 'subject_id=1'`) | | `part_integrity` | str | `"enforce"` | Master-part integrity policy | -**Returns:** New `Diagram` with cascade restrictions applied. +**Returns:** New `Diagram` containing only the seed table and its descendants, with cascade restrictions applied. **Constraints:** @@ -165,7 +165,7 @@ restricted = diag.cascade(Session & {'subject_id': 'M001'}) diag.restrict(table_expr) ``` -Apply a restrict condition and propagate it downstream. 
Only the seed table and its descendants receive restrictions — ancestors of the seed table are unaffected by subsequent operations. Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. +Apply a restrict condition and propagate it downstream. Only the seed table and its descendants receive restrictions — ancestors remain in the diagram but are not restricted. Unlike `cascade()`, the diagram is not trimmed (to support chaining from multiple seed tables). Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| @@ -193,7 +193,7 @@ restricted = (diag diag.delete(transaction=True, prompt=None, dry_run=False) ``` -Execute a cascading delete using previously applied cascade restrictions. Tables are deleted in reverse topological order (leaves first) to maintain referential integrity. +Execute a cascading delete on the cascade subgraph. All tables in the diagram are deleted in reverse topological order (leaves first) to maintain referential integrity. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| From 56ecea441c3293e8ecd3de5492586f0f02bc4e6f Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 9 Mar 2026 16:07:58 -0500 Subject: [PATCH 6/8] docs: sharpen distinction between cascade() and restrict() Lead each description with its purpose rather than using parallel structure. cascade() prepares a delete (one-shot, trims graph, OR). restrict() selects a data subset (chainable, preserves graph, AND). 
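The OR-versus-AND convergence rule is easiest to see on a child table with two restricted parents. The sketch below is a toy, row-level model using in-memory dicts. The table names, the `RESTRICTED` sets, and the `converge` helper are hypothetical, and DataJoint itself works with query restrictions rather than Python lists. OR, the `cascade()` semantics, keeps a row if any restricted parent reaches it. AND, the `restrict()` semantics, keeps it only if every restricted parent matches.

```python
# Hypothetical toy child table: each row references two parent tables.
ROWS = [
    {"session": 1, "probe": "A"},
    {"session": 1, "probe": "B"},
    {"session": 2, "probe": "A"},
    {"session": 2, "probe": "B"},
]

# Parent keys that survived each upstream restriction (assumed for illustration).
RESTRICTED = {"session": {1}, "probe": {"A"}}


def converge(rows, restricted, mode):
    """Combine restrictions arriving from several parents: any() for OR, all() for AND."""
    combine = any if mode == "or" else all
    return [
        row
        for row in rows
        if combine(row[parent] in keys for parent, keys in restricted.items())
    ]


print(converge(ROWS, RESTRICTED, "or"))   # rows reached by session 1 OR probe A
print(converge(ROWS, RESTRICTED, "and"))  # [{'session': 1, 'probe': 'A'}]
```

For a delete, any reason to remove a row is sufficient, so three of the four rows are marked. For an export, the subset must satisfy both conditions, so only one row remains.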
Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 10 ++++------ src/reference/specs/diagram.md | 14 +++++++------- 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index 9122b967..14c66824 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -234,15 +234,13 @@ This is valuable when working with unfamiliar pipelines, large datasets, or mult ### Two Propagation Modes -The diagram supports two restriction propagation modes with different convergence semantics: +The diagram supports two restriction propagation modes designed for fundamentally different tasks. -**`cascade()` uses OR at convergence.** When a child table has multiple restricted ancestors, the child row is affected if *any* parent path reaches it. This is the right semantics for delete — if any reason exists to remove a row, it should be removed. `cascade()` is one-shot: it can only be called once on an unrestricted diagram. +**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`. -**`restrict()` uses AND at convergence.** A child row is included only if *all* restricted ancestors match. This is the right semantics for data subsetting and export — only rows satisfying every condition are selected. `restrict()` is chainable: call it multiple times to build up conditions from different tables. 
+**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result. -Both modes propagate **downstream only** — from the seed table to its descendants. `cascade()` goes further: it trims the returned Diagram to the cascade subgraph, removing all ancestors and unrelated tables. This means `delete()` operates on the entire trimmed diagram with no additional filtering. `restrict()` keeps the full graph intact (to support chaining from multiple seed tables) but only restricts the seed's descendants. This matches the semantics of foreign key cascades: deleting a session deletes its trials, not its subject. - -The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics. +The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics — a delete diagram should never be reused for subsetting, and vice versa. ### Pruning Empty Tables diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index 9e312b99..1102f389 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -130,7 +130,7 @@ Diagrams can propagate restrictions through the dependency graph and execute dat diag.cascade(table_expr, part_integrity="enforce") ``` -Apply a cascade restriction and propagate it downstream through the dependency graph. 
The returned Diagram is trimmed to the **cascade subgraph** — only the seed table and its descendants remain. All ancestors and unrelated tables are removed. Uses **OR** semantics at convergence — a child row is affected if *any* restricted ancestor reaches it. Designed for delete operations. +Prepare a cascading delete. Starting from a restricted table expression, propagate the restriction downstream through all descendants using **OR** semantics — a descendant row is marked for deletion if *any* ancestor path reaches it. The returned Diagram is **trimmed** to the cascade subgraph: only the seed table and its descendants remain; all ancestors and unrelated tables are removed. The trimmed diagram is ready for `preview()` and `delete()`. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| @@ -141,8 +141,8 @@ Apply a cascade restriction and propagate it downstream through the dependency g **Constraints:** -- `cascade()` can only be called **once** on an unrestricted Diagram -- Cannot be mixed with `restrict()` — the two modes are mutually exclusive +- **One-shot** — can only be called once on an unrestricted Diagram +- Mutually exclusive with `restrict()` - `table_expr.full_table_name` must be a node in the diagram **`part_integrity` values:** @@ -165,19 +165,19 @@ restricted = diag.cascade(Session & {'subject_id': 'M001'}) diag.restrict(table_expr) ``` -Apply a restrict condition and propagate it downstream. Only the seed table and its descendants receive restrictions — ancestors remain in the diagram but are not restricted. Unlike `cascade()`, the diagram is not trimmed (to support chaining from multiple seed tables). Uses **AND** semantics at convergence — a child row is included only if it satisfies *all* restricted ancestors. Designed for data subsetting and export operations. +Select a subset of data for export or inspection. 
Starting from a restricted table expression, propagate the restriction downstream through all descendants using **AND** semantics — a descendant row is included only if *all* restricted ancestors match. The full diagram is preserved (ancestors, unrelated tables) so that `restrict()` can be called again from a different seed table, building up a multi-condition subset incrementally. | Parameter | Type | Default | Description | |-----------|------|---------|-------------| | `table_expr` | QueryExpression | — | A restricted table expression | -**Returns:** New `Diagram` with restrict conditions applied. +**Returns:** New `Diagram` with restrict conditions applied. The graph is not trimmed. **Constraints:** -- Cannot be called on a cascade-restricted Diagram (mutually exclusive with `cascade()`) +- **Chainable** — call multiple times to add conditions from different seed tables +- Mutually exclusive with `cascade()` - `table_expr.full_table_name` must be a node in the diagram -- **Can be chained** — call `restrict()` multiple times to add conditions from different tables ```python # Chain multiple restrictions (AND semantics) From 6c9291d48761b3f4f95c0495cf31ff3dfb3059ed Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Mon, 9 Mar 2026 16:15:57 -0500 Subject: [PATCH 7/8] docs: explain upward propagation with part_integrity="cascade" Describe how cascade propagates restrictions upward from part to master, then back downstream to all sibling parts, deleting the entire compositional unit. Updated in both diagram.md and master-part.md. Co-Authored-By: Claude Opus 4.6 --- src/reference/specs/diagram.md | 4 +++- src/reference/specs/master-part.md | 7 +++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md index 1102f389..7a5733c1 100644 --- a/src/reference/specs/diagram.md +++ b/src/reference/specs/diagram.md @@ -151,7 +151,9 @@ Prepare a cascading delete. 
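The upward step for `part_integrity="cascade"` can be sketched on toy data with one master (sessions) and one part table (trials keyed by `(session_id, trial_id)`). `cascade_part_upward` is a hypothetical helper, not DataJoint code, and it models a single master level only. The spec text in this patch says the real engine keeps iterating for nested masters until no new restrictions are produced.

```python
# Hypothetical toy part table Session.Trial: rows are (session_id, trial_id).
TRIALS = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]


def cascade_part_upward(trials, matches):
    """Sketch of part_integrity="cascade": matched part rows -> masters -> all sibling parts."""
    # 1. Rows of the part table that match the user's restriction.
    matched_parts = [t for t in trials if matches(t)]
    # 2. Upward: those part rows identify which master rows are affected.
    masters = {session_id for session_id, _ in matched_parts}
    # 3. Downstream again: every part row of an affected master, not just the matched ones.
    all_parts = [t for t in trials if t[0] in masters]
    return masters, all_parts


masters, parts = cascade_part_upward(TRIALS, lambda t: t[1] == 2)
print(sorted(masters))  # [1, 2]
print(parts)            # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Only trials `(1, 2)` and `(2, 2)` matched, yet sessions 1 and 2 are deleted as whole compositional units, so no master is left with a partial set of parts. Session 3 is untouched.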
Starting from a restricted table expression, propaga |-------|----------| | `"enforce"` | Error if parts would be deleted before masters | | `"ignore"` | Allow deleting parts without masters | -| `"cascade"` | Also delete masters when parts are deleted | +| `"cascade"` | Propagate restriction upward from part to master, then re-propagate downstream to all sibling parts | + +With `"cascade"`, the restriction flows **upward** from a part table to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction propagates back downstream through the normal cascade — deleting the entire compositional unit (master + all parts), not just the originally matched part rows. ```python # Build a cascade from a restricted table diff --git a/src/reference/specs/master-part.md b/src/reference/specs/master-part.md index c8dff40d..9afe8f36 100644 --- a/src/reference/specs/master-part.md +++ b/src/reference/specs/master-part.md @@ -216,10 +216,9 @@ Session.Trial.delete() (Session.Trial & condition).delete(part_integrity="cascade") ``` -**Behavior:** -- Identifies affected masters -- Deletes masters (which cascades to ALL their parts) -- Maintains compositional integrity +**Behavior:** The restriction propagates **upward** from the part to its master. Specifically, the restricted part rows identify which master rows are affected, and those masters receive a restriction. The master restriction then propagates back **downstream** through the normal cascade, reaching all sibling parts. The result is that the entire compositional unit — master plus all its parts — is deleted, not just the originally restricted part rows. + +This upward propagation may trigger further rounds: if the master itself is a part of a higher-level master, the restriction continues upward. The cascade engine iterates until no new restrictions are produced. 
### 4.6 Behavior Matrix From 187408b014dd9c862d1580933a45565e18da8841 Mon Sep 17 00:00:00 2001 From: Dimitri Yatsenko Date: Tue, 10 Mar 2026 17:27:09 -0500 Subject: [PATCH 8/8] docs: remove delete/drop from Diagram public API Diagram is now an inspection-only tool. delete() and drop() have been moved to Table. Updated diagram spec, whats-new-22, and delete-data how-to to reflect this change. Co-Authored-By: Claude Opus 4.6 --- src/explanation/whats-new-22.md | 40 +++++++++++++++------ src/how-to/delete-data.md | 31 ++++++++++------- src/reference/specs/diagram.md | 61 +++++++-------------------------- 3 files changed, 61 insertions(+), 71 deletions(-) diff --git a/src/explanation/whats-new-22.md b/src/explanation/whats-new-22.md index 14c66824..f33abb1d 100644 --- a/src/explanation/whats-new-22.md +++ b/src/explanation/whats-new-22.md @@ -1,6 +1,6 @@ # What's New in DataJoint 2.2 -DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph. +DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections, and **graph-driven diagram operations** that replace the legacy error-driven cascade with a reliable, inspectable approach for all users. > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive. @@ -207,27 +207,29 @@ DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational ### From Visualization to Operations -In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. 
The cascade logic inside `Table.delete()` traversed dependencies independently, with no way to inspect or control the cascade before it executed. +In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently using an error-driven approach: attempt `DELETE` on the parent, catch the foreign key integrity error, parse the error message to discover which child table is blocking, then recursively delete from that child first. This had several problems: -In 2.2, `Table.delete()` and `Table.drop()` delegate internally to `dj.Diagram`. The user-facing behavior of `Table.delete()` is unchanged, but the diagram-level API is now available as a more powerful interface for complex scenarios. +- **MySQL 8 with limited privileges** returns error 1217 (`ROW_IS_REFERENCED`) instead of 1451 (`ROW_IS_REFERENCED_2`), which provides no table name — the cascade crashes with no way to proceed. +- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt. +- **Fragile error parsing** across MySQL versions and privilege levels, where different configurations produce different error message formats. + +In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing. 
### The Preview-Then-Execute Pattern -The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then decide whether to execute: +The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`: ```python -# Build the dependency graph +# Build the dependency graph and inspect the cascade diag = dj.Diagram(schema) - -# Apply cascade restriction — nothing is deleted yet restricted = diag.cascade(Session & {'subject_id': 'M001'}) # Inspect: what tables and how many rows would be affected? counts = restricted.preview() # {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} -# Execute only after reviewing the blast radius -restricted.delete(prompt=False) +# Execute via Table.delete() after reviewing the blast radius +(Session & {'subject_id': 'M001'}).delete(prompt=False) ``` This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious. @@ -238,9 +240,11 @@ The diagram supports two restriction propagation modes designed for fundamentall **`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`. +When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. 
With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows. + **`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result. -The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics — a delete diagram should never be reused for subsetting, and vice versa. +The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa. ### Pruning Empty Tables @@ -260,7 +264,21 @@ Without prior restrictions, `prune()` removes physically empty tables. This is u ### Architecture -`Table.delete()` now constructs a `Diagram` internally, calls `cascade()`, and then `delete()`. This means every table-level delete benefits from the same graph-driven logic. The diagram-level API simply exposes this machinery for direct use when more control is needed. 
+`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`. + +### Advantages over Error-Driven Cascade + +The graph-driven approach resolves every known limitation of the prior error-driven cascade: + +| Scenario | Error-driven (prior) | Graph-driven (2.2) | +|---|---|---| +| MySQL 8 + limited privileges | Crashes (error 1217, no table name) | Works — no error parsing needed | +| PostgreSQL | Savepoint overhead per attempt | No errors triggered | +| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront | +| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) | +| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" | +| Reusability | Delete-only | Delete, drop, export, prune | +| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing | ## See Also diff --git a/src/how-to/delete-data.md b/src/how-to/delete-data.md index d89a87e7..1de724cc 100644 --- a/src/how-to/delete-data.md +++ b/src/how-to/delete-data.md @@ -189,39 +189,46 @@ count = (Subject & restriction).delete(prompt=False) print(f"Deleted {count} subjects") ``` -## Diagram-Level Delete +## Inspecting Cascade Before Deleting !!! version-added "New in 2.2" - Diagram-level delete was added in DataJoint 2.2. + Cascade inspection via `dj.Diagram` was added in DataJoint 2.2. -For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing. 
+For a quick preview, `table.delete(dry_run=True)` returns the affected row counts without deleting anything: -### Build, Preview, Execute +```python +# Quick preview of what would be deleted +(Session & {'subject_id': 'M001'}).delete(dry_run=True) +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} +``` + +For more complex scenarios — working across schemas, chaining multiple restrictions, or visualizing the dependency graph — use `dj.Diagram` to build and inspect the cascade explicitly: ```python import datajoint as dj -# 1. Build the dependency graph +# 1. Build the dependency graph and apply cascade restriction diag = dj.Diagram(schema) - -# 2. Apply cascade restriction (nothing deleted yet) restricted = diag.cascade(Session & {'subject_id': 'M001'}) -# 3. Preview: see affected tables and row counts +# 2. Preview: see affected tables and row counts counts = restricted.preview() # {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} -# 4. Execute only after reviewing -restricted.delete(prompt=False) +# 3. Visualize the cascade subgraph (in Jupyter) +restricted + +# 4. Execute via Table.delete() after reviewing +(Session & {'subject_id': 'M001'}).delete(prompt=False) ``` ### When to Use - **Preview blast radius**: Understand what a cascade delete will affect before committing -- **Multi-schema cascades**: Build a diagram spanning multiple schemas and delete across them in one operation +- **Multi-schema inspection**: Build a diagram spanning multiple schemas to visualize cascade impact - **Programmatic control**: Use `preview()` return values to make decisions in automated workflows -For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram-level API is for when you need more visibility or control. +For simple single-table deletes, `(Table & restriction).delete()` remains the simplest approach. The diagram API is for when you need more visibility before executing. 

 ## See Also

diff --git a/src/reference/specs/diagram.md b/src/reference/specs/diagram.md
index 7a5733c1..b3e9ea4b 100644
--- a/src/reference/specs/diagram.md
+++ b/src/reference/specs/diagram.md
@@ -120,9 +120,9 @@ dj.Diagram(Subject) + dj.Diagram(analysis).collapse()
 ## Operational Methods

 !!! version-added "New in 2.2"
-    Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`, `prune`) were added in DataJoint 2.2.
+    Operational methods (`cascade`, `restrict`, `preview`, `prune`) were added in DataJoint 2.2.

-Diagrams can propagate restrictions through the dependency graph and execute data operations (delete, drop) using the graph structure. These methods turn Diagram from a visualization tool into an operational component.
+Diagrams can propagate restrictions through the dependency graph and inspect affected data using the graph structure. These methods turn Diagram from a visualization tool into a graph computation and inspection component. All mutation operations (delete, drop) are executed by `Table.delete()` and `Table.drop()`, which use Diagram internally.

 ### `cascade()`

@@ -189,49 +189,6 @@ restricted = (diag
     .restrict(Session & 'session_date > "2024-01-01"'))
 ```

-### `delete()`
-
-```python
-diag.delete(transaction=True, prompt=None, dry_run=False)
-```
-
-Execute a cascading delete on the cascade subgraph. All tables in the diagram are deleted in reverse topological order (leaves first) to maintain referential integrity.
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `transaction` | bool | `True` | Wrap in atomic transaction |
-| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` |
-| `dry_run` | bool | `False` | If `True`, return affected row counts without deleting |
-
-**Returns:** `int` (rows deleted from root table) or `dict[str, int]` (table → row count mapping when `dry_run=True`).
-
-**Requires:** `cascade()` must be called first.
-
-```python
-diag = dj.Diagram(schema)
-restricted = diag.cascade(Session & {'subject_id': 'M001'})
-restricted.preview()  # inspect what will be deleted
-restricted.delete()  # execute the delete
-```
-
-### `drop()`
-
-```python
-diag.drop(prompt=None, part_integrity="enforce", dry_run=False)
-```
-
-Drop all tables in the diagram in reverse topological order.
-
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `prompt` | bool or None | `None` | Prompt for confirmation. Default: `dj.config['safemode']` |
-| `part_integrity` | str | `"enforce"` | `"enforce"` or `"ignore"` |
-| `dry_run` | bool | `False` | If `True`, return row counts without dropping tables |
-
-**Returns:** `dict[str, int]` (table → row count mapping when `dry_run=True`). Returns `None` otherwise.
-
-**Note:** Unlike `delete()`, `drop()` does not use cascade restrictions. It drops all tables in the diagram.
-
 ### `preview()`

 ```python
@@ -257,7 +214,7 @@ counts = restricted.preview()
 diag.prune()
 ```

-Remove tables with zero matching rows from the diagram. Without prior restrictions, removes physically empty tables. With restrictions (`cascade()` or `restrict()`), removes tables where the restricted query yields zero rows.
+Remove tables with zero matching rows from the diagram view. This only affects the diagram object — no tables or data are modified in the database. Without prior restrictions, removes physically empty tables from the diagram. With restrictions (`cascade()` or `restrict()`), removes tables where the restricted query yields zero rows.

 **Returns:** New `Diagram` with empty tables removed.

@@ -291,6 +248,14 @@ When a child table has multiple restricted ancestors, the convergence rule depen
 - **`cascade()` (OR):** A child row is affected if *any* path from a restricted ancestor reaches it. This is appropriate for delete — if any reason exists to delete a row, it should be deleted.
 - **`restrict()` (AND):** A child row is included only if *all* restricted ancestors match. This is appropriate for export — only rows satisfying every condition are selected.

+**Multiple foreign keys to the same parent:**
+
+When a child table references the same parent through multiple foreign keys (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`), these paths always combine with **OR** regardless of the propagation mode. Each foreign key path is an independent reason for the child row to be affected — this is structural, not operation-dependent.
+
+**Unloaded schemas:**
+
+If a descendant table lives in a schema that hasn't been activated (loaded into the dependency graph), the graph-driven delete won't know about it. The final `DELETE` on the parent will fail with a foreign key error. DataJoint catches this and produces an actionable error message identifying which schema needs to be activated.
+
 ---

 ## Output Methods
@@ -475,7 +440,7 @@ combined = dj.Diagram.from_sequence([schema1, schema2, schema3])

 ## Dependencies

-Operational methods (`cascade`, `restrict`, `delete`, `drop`, `preview`, `prune`) use `networkx`, which is always installed as a core dependency.
+Operational methods (`cascade`, `restrict`, `preview`, `prune`) use `networkx`, which is always installed as a core dependency.

 Diagram **visualization** requires optional dependencies:

@@ -490,7 +455,7 @@ If visualization dependencies are missing, `dj.Diagram` displays a warning and p
 ## See Also

 - [How to Read Diagrams](../../how-to/read-diagrams.ipynb)
-- [Delete Data](../../how-to/delete-data.md) — Diagram-level delete workflow
+- [Delete Data](../../how-to/delete-data.md) — Cascade inspection and delete workflow
 - [What's New in 2.2](../../explanation/whats-new-22.md) — Motivation and design
 - [Data Manipulation](data-manipulation.md) — Insert, update, delete specification
 - [Query Algebra](query-algebra.md)