docs: document restricted diagram operations (new in 2.2) #155
base: main
Changes from all commits
443090c
3a27735
af0c0db
1bc28f5
fbbafc2
56ecea4
6c9291d
187408b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| # What's New in DataJoint 2.2 | ||
|
|
||
| DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing. | ||
| DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections, and **graph-driven diagram operations** that replace the legacy error-driven cascade with a reliable, inspectable approach for all users. | ||
|
|
||
| > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive. | ||
|
|
||
|
|
@@ -201,9 +201,90 @@ class MyTable(dj.Manual): | |
|
|
||
| Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema. | ||
|
|
||
| ## Graph-Driven Diagram Operations | ||
|
|
||
| DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting. | ||
|
|
||
| ### From Visualization to Operations | ||
|
|
||
| In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently using an error-driven approach: attempt `DELETE` on the parent, catch the foreign key integrity error, parse the error message to discover which child table is blocking, then recursively delete from that child first. This had several problems: | ||
|
|
||
| - **MySQL 8 with limited privileges** returns error 1217 (`ROW_IS_REFERENCED`) instead of 1451 (`ROW_IS_REFERENCED_2`), which provides no table name — the cascade crashes with no way to proceed. | ||
| - **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt. | ||
| - **Fragile error parsing**: different MySQL versions and privilege levels produce different error message formats. | ||
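The first problem in the list above can be illustrated with a small sketch. The two message strings follow the general format MySQL uses for errors 1451 and 1217; the table and constraint names, the regular expression, and the `blocking_child` helper are hypothetical and are not DataJoint's actual parsing code.

```python
import re

# Example text in the format MySQL uses for error 1451: the child table is named.
MSG_1451 = (
    "Cannot delete or update a parent row: a foreign key constraint fails "
    "(`lab`.`trial`, CONSTRAINT `trial_ibfk_1` FOREIGN KEY (`subject_id`) "
    "REFERENCES `session` (`subject_id`))"
)
# Example text in the format MySQL uses for error 1217: no table is named.
MSG_1217 = "Cannot delete or update a parent row: a foreign key constraint fails"

# Hypothetical pattern for illustration only; not DataJoint's actual regex.
CHILD_PATTERN = re.compile(r"fails \((?P<child>`[^`]+`\.`[^`]+`), CONSTRAINT")


def blocking_child(message):
    """Return the child table named in the error message, or None if absent."""
    match = CHILD_PATTERN.search(message)
    return match.group("child") if match else None


print(blocking_child(MSG_1451))  # the blocking child can be recovered
print(blocking_child(MSG_1217))  # nothing to recover, so the cascade cannot continue
```

With the 1217 form there is no table name to extract, which is why an error-driven cascade has no next step to take.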
|
|
||
| In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing. | ||
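As a rough illustration of reverse topological order, the sketch below uses Python's standard `graphlib` module on a hypothetical three-table pipeline. It is not DataJoint's implementation; the `parents` mapping and the `delete_order` helper are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of parent tables it references.
parents = {
    "session": set(),
    "trial": {"session"},
    "processed_data": {"trial"},
}


def delete_order(parents):
    """Return tables children-first, so each delete runs before its parent's."""
    # static_order() yields parents before children; reverse it for deletes.
    forward = list(TopologicalSorter(parents).static_order())
    return list(reversed(forward))


print(delete_order(parents))  # ['processed_data', 'trial', 'session']
```

Because the order is computed from the graph up front, no delete is attempted while dependent rows still exist, so no foreign key error needs to be caught or parsed.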
|
|
||
| ### The Preview-Then-Execute Pattern | ||
|
|
||
| The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`: | ||
|
|
||
| ```python | ||
| # Build the dependency graph and inspect the cascade | ||
| diag = dj.Diagram(schema) | ||
| restricted = diag.cascade(Session & {'subject_id': 'M001'}) | ||
|
|
||
| # Inspect: what tables and how many rows would be affected? | ||
| counts = restricted.preview() | ||
| # {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} | ||
|
|
||
| # Execute via Table.delete() after reviewing the blast radius | ||
| (Session & {'subject_id': 'M001'}).delete(prompt=False) | ||
| ``` | ||
|
|
||
| This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious. | ||
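One way to act on the preview is to check the counts against a threshold before calling `Table.delete()`. The sketch below assumes `preview()` returns a dictionary of table name to row count, as the comment in the example above suggests; the `within_blast_radius` helper and the `max_rows` threshold are hypothetical.

```python
def within_blast_radius(counts, max_rows):
    """Return (ok, total): whether the summed row count stays within max_rows."""
    total = sum(counts.values())
    return total <= max_rows, total


# Counts copied from the preview example above.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

ok, total = within_blast_radius(counts, max_rows=100)
print(ok, total)  # True 93
```

A script could refuse to proceed, or ask for explicit confirmation, when the check fails.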
|
|
||
| ### Two Propagation Modes | ||
|
|
||
| The diagram supports two restriction propagation modes designed for fundamentally different tasks. | ||
|
|
||
| **`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph: ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because a single reason to remove a row is sufficient. `cascade()` is one-shot and is always followed by `preview()` or by `Table.delete()`. | ||
|
|
||
| When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master — preventing orphaned master rows. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts — deleting the entire compositional unit, not just the originally matched part rows. | ||
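The `"cascade"` behavior described above can be modeled with plain Python data. This is a toy sketch, not DataJoint code: the part names, the row tuples, and the `cascade_from_part` helper are hypothetical, and each part row is represented as `(master_key, part_value)`.

```python
# Hypothetical part tables belonging to one master; rows are (master_key, value).
parts = {
    "part_a": [(1, "a1"), (2, "a2"), (3, "a3")],
    "part_b": [(1, "b1"), (1, "b2"), (3, "b3")],
}


def cascade_from_part(parts, seed_rows):
    """Propagate a part restriction up to the master, then down to all parts."""
    # Step 1 (upward): the restricted part rows identify the affected masters.
    masters = {master_key for master_key, _ in seed_rows}
    # Step 2 (downward): the affected masters restrict every sibling part.
    affected = {
        name: [row for row in rows if row[0] in masters]
        for name, rows in parts.items()
    }
    return masters, affected


# Seed the cascade with a single row of part_a.
masters, affected = cascade_from_part(parts, seed_rows=[(1, "a1")])
print(masters)   # {1}
print(affected)  # {'part_a': [(1, 'a1')], 'part_b': [(1, 'b1'), (1, 'b2')]}
```

Although only one row of `part_a` was matched, every part row of master `1` ends up in the cascade, which is the "entire compositional unit" the paragraph refers to.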
|
|
||
| **`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result. | ||
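The difference between the OR convergence of `cascade()` and the AND convergence of `restrict()` can be shown on a toy descendant table with two incoming paths. This is a sketch under stated assumptions, not DataJoint code: the rows, the two sets of matching parent keys, and the `converge` helper are all hypothetical.

```python
# Hypothetical descendant rows, each referencing two ancestors.
rows = [
    {"id": 1, "parent_a": "M001", "parent_b": "s1"},
    {"id": 2, "parent_a": "M001", "parent_b": "s2"},
    {"id": 3, "parent_a": "M002", "parent_b": "s1"},
    {"id": 4, "parent_a": "M002", "parent_b": "s2"},
]

# Keys that survive the restriction arriving along each path.
matched_a = {"M001"}
matched_b = {"s1"}


def converge(rows, combine):
    """Select row ids whose per-path matches satisfy combine (any or all)."""
    return [
        row["id"]
        for row in rows
        if combine((row["parent_a"] in matched_a, row["parent_b"] in matched_b))
    ]


print(converge(rows, any))  # OR, as in cascade(): [1, 2, 3]
print(converge(rows, all))  # AND, as in restrict(): [1]
```

OR errs toward removing every row that any path reaches, while AND keeps only rows that satisfy every condition, matching the delete and export use cases respectively.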
|
Member
Author
Also, describe how part tables are restricted when reached through a path that did not include their master.
Member
Author
Covered in the same paragraph, which describes the upward propagation from part to master when the part is reached through a path that didn't include its master. |
||
|
|
||
| The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa. | ||
|
|
||
| ### Pruning Empty Tables | ||
|
|
||
| After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data: | ||
|
|
||
| ```python | ||
| export = (dj.Diagram(schema) | ||
| .restrict(Subject & {'species': 'mouse'}) | ||
| .restrict(Session & 'session_date > "2024-01-01"') | ||
| .prune()) | ||
|
|
||
| export.preview() # only tables with matching rows | ||
| export # visualize the export subgraph | ||
| ``` | ||
|
|
||
| Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated. | ||
|
|
||
| ### Architecture | ||
|
|
||
| `Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`. | ||
|
|
||
| ### Advantages over Error-Driven Cascade | ||
|
|
||
| The graph-driven approach resolves every known limitation of the prior error-driven cascade: | ||
|
|
||
| | Scenario | Error-driven (prior) | Graph-driven (2.2) | | ||
| |---|---|---| | ||
| | MySQL 8 + limited privileges | Crashes (error 1217, no table name) | Works — no error parsing needed | | ||
| | PostgreSQL | Savepoint overhead per attempt | No errors triggered | | ||
| | Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront | | ||
| | Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) | | ||
| | Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" | | ||
| | Reusability | Delete-only | Delete, drop, export, prune | | ||
| | Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing | | ||
|
|
||
| ## See Also | ||
|
|
||
| - [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide | ||
| - [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial | ||
| - [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings | ||
| - [Configure Database](../how-to/configure-database.md/) — Connection setup | ||
| - [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide | ||
| - [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial | ||
| - [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings | ||
| - [Configure Database](../how-to/configure-database.md) — Connection setup | ||
| - [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations | ||
| - [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide | ||
Here include a description of how cascade behaves when it encounters a part table whose master is not yet included in the cascade.
Added a paragraph after `cascade()` describing part table behavior: with `"enforce"` (default), `delete()` raises an error if part rows would be deleted without their master; with `"cascade"`, the restriction propagates upward from part to master, then back downstream to all sibling parts, deleting the entire compositional unit.