|
1 | 1 | # What's New in DataJoint 2.2 |
2 | 2 |
|
3 | | -DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing. |
| 3 | +DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections, and **graph-driven diagram operations** that replace the legacy error-driven cascade with a reliable, inspectable approach for all users. |
4 | 4 |
|
5 | 5 | > **Upgrading from 2.0 or 2.1?** No breaking changes. All existing code using `dj.config` and `dj.Schema()` continues to work. The new Instance API is purely additive. |
6 | 6 |
|
@@ -201,9 +201,90 @@ class MyTable(dj.Manual): |
201 | 201 |
|
202 | 202 | Once a Schema is created, table definitions, inserts, queries, and all other operations work identically regardless of which pattern was used to create the Schema. |
203 | 203 |
|
| 204 | +## Graph-Driven Diagram Operations |
| 205 | + |
| 206 | +DataJoint 2.2 promotes `dj.Diagram` from a visualization tool to an operational component. The same dependency graph that renders pipeline diagrams now powers cascade delete, table drop, and data subsetting. |
| 207 | + |
| 208 | +### From Visualization to Operations |
| 209 | + |
| 210 | +In prior versions, `dj.Diagram` existed solely for visualization — drawing the dependency graph as SVG or Mermaid output. The cascade logic inside `Table.delete()` traversed dependencies independently using an error-driven approach: attempt `DELETE` on the parent, catch the foreign key integrity error, parse the error message to discover which child table is blocking, then recursively delete from that child first. This had several problems: |
| 211 | + |
| 212 | +- **MySQL 8 with limited privileges** returns error 1217 (`ROW_IS_REFERENCED`) instead of 1451 (`ROW_IS_REFERENCED_2`), which provides no table name — the cascade crashes with no way to proceed. |
| 213 | +- **PostgreSQL** aborts the entire transaction on any error, requiring `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips for each failed delete attempt. |
| 214 | +- **Fragile error parsing**: error message formats differ across MySQL versions and privilege levels, so any parser that extracts the blocking table name from them is brittle. |
| 215 | + |
| 216 | +In 2.2, `Table.delete()` and `Table.drop()` use `dj.Diagram` internally to compute the dependency graph and walk it in reverse topological order — deleting leaves first, with no trial-and-error needed. The user-facing behavior of `Table.delete()` is unchanged. The Diagram's `cascade()` and `preview()` methods are available as a public inspection API for understanding cascade impact before executing. |
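
The leaves-first ordering described above can be illustrated without DataJoint. This is a minimal sketch using Python's standard `graphlib` module. The `parents` mapping and the `delete_order` helper are hypothetical and are not part of the DataJoint API; the table names are borrowed from the examples in this document.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the set of parent tables it references.
parents = {
    "subject": set(),
    "session": {"subject"},
    "trial": {"session"},
    "processed_data": {"trial"},
}


def delete_order(parents):
    """Return tables ordered so that every child precedes the parents it references."""
    # static_order() yields parents before children; reversing puts leaves first.
    creation_order = list(TopologicalSorter(parents).static_order())
    return creation_order[::-1]


order = delete_order(parents)
print(order)  # ['processed_data', 'trial', 'session', 'subject']
```

Because the order is computed from the graph up front, no `DELETE` has to fail in order to discover a blocking child table.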
| 217 | + |
| 218 | +### The Preview-Then-Execute Pattern |
| 219 | + |
| 220 | +The key benefit of the diagram-level API is the ability to build a cascade explicitly, inspect it, and then execute via `Table.delete()`: |
| 221 | + |
| 222 | +```python |
| 223 | +# Build the dependency graph and inspect the cascade |
| 224 | +diag = dj.Diagram(schema) |
| 225 | +restricted = diag.cascade(Session & {'subject_id': 'M001'}) |
| 226 | + |
| 227 | +# Inspect: what tables and how many rows would be affected? |
| 228 | +counts = restricted.preview() |
| 229 | +# {'`lab`.`session`': 3, '`lab`.`trial`': 45, '`lab`.`processed_data`': 45} |
| 230 | + |
| 231 | +# Execute via Table.delete() after reviewing the blast radius |
| 232 | +(Session & {'subject_id': 'M001'}).delete(prompt=False) |
| 233 | +``` |
| 234 | + |
| 235 | +This is valuable when working with unfamiliar pipelines, large datasets, or multi-schema dependencies where the cascade impact is not immediately obvious. |
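
One way to act on the preview is to gate the delete behind a row budget. The sketch below assumes only that `preview()` returns a mapping from table name to row count, as in the example above. The `within_blast_radius` helper and the `max_rows` threshold are hypothetical and not part of DataJoint.

```python
def within_blast_radius(counts, max_rows):
    """Return True if the total number of affected rows stays within max_rows."""
    total = sum(counts.values())
    return total <= max_rows


# Counts copied from the preview() example above.
counts = {"`lab`.`session`": 3, "`lab`.`trial`": 45, "`lab`.`processed_data`": 45}

safe = within_blast_radius(counts, max_rows=100)       # 93 rows: within budget
too_large = not within_blast_radius(counts, max_rows=50)  # 93 rows: over budget
```

A script could call `Table.delete()` only when such a check passes, and otherwise stop for manual review.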
| 236 | + |
| 237 | +### Two Propagation Modes |
| 238 | + |
| 239 | +The diagram supports two restriction propagation modes designed for fundamentally different tasks. |
| 240 | + |
| 241 | +**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph: ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because a single reason to remove a row is sufficient. `cascade()` is one-shot and is always followed by `preview()` or by `Table.delete()`, which executes the delete. |
| 242 | + |
| 243 | +When the cascade encounters a part table whose master is not yet included in the cascade, the behavior depends on the `part_integrity` setting. With `"enforce"` (the default), `delete()` raises an error if part rows would be deleted without their master, which prevents master rows from being left with an incomplete set of parts. With `"cascade"`, the restriction propagates *upward* from the part to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction then propagates back downstream to all sibling parts, deleting the entire compositional unit rather than only the originally matched part rows. |
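
The upward-then-downward propagation of `"cascade"` mode can be sketched with plain sets. Everything here is illustrative: the `part_rows` data, the table names, and the `expand_to_units` helper are hypothetical, and the real implementation works on restrictions rather than on materialized keys.

```python
# Hypothetical rows: each part row is (part_key, master_key).
part_rows = {
    "Session.Electrode": [("e1", "s1"), ("e2", "s1"), ("e3", "s2")],
    "Session.Note": [("n1", "s1"), ("n2", "s2")],
}


def expand_to_units(part_rows, table, matched):
    """Propagate matched part rows up to their masters, then down to all parts."""
    # Upward: which masters own the matched part rows?
    masters = {m for key, m in part_rows[table] if key in matched}
    # Downward: every row of every sibling part that belongs to those masters.
    parts = {
        name: {key for key, m in rows if m in masters}
        for name, rows in part_rows.items()
    }
    return masters, parts


masters, parts = expand_to_units(part_rows, "Session.Electrode", {"e1"})
```

Matching only `e1` pulls in master `s1`, and with it `e2` and `n1`, which is the "entire compositional unit" the paragraph above describes.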
| 244 | + |
| 245 | +**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result. |
| 246 | + |
| 247 | +The two modes are mutually exclusive on the same diagram — DataJoint raises an error if you attempt to mix `cascade()` and `restrict()`, or if you call `cascade()` more than once. This prevents accidental mixing of incompatible semantics: a delete diagram should never be reused for subsetting, and vice versa. |
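
The difference between the two convergence rules comes down to set union versus set intersection. The following sketch is not DataJoint code; the row keys and both helper functions are hypothetical and only illustrate how the same descendant table resolves under each mode.

```python
# Hypothetical data: row keys of one descendant table reached via two
# different restricted ancestor paths.
reached_via_first_path = {1, 2, 3}
reached_via_second_path = {2, 3, 4}


def converge_cascade(*paths):
    """OR semantics: a row is affected if any path reaches it."""
    return set().union(*paths)


def converge_restrict(*paths):
    """AND semantics: a row is kept only if every path reaches it."""
    return set.intersection(*paths) if paths else set()


to_delete = converge_cascade(reached_via_first_path, reached_via_second_path)
to_export = converge_restrict(reached_via_first_path, reached_via_second_path)
```

Here `to_delete` contains rows 1 through 4 while `to_export` contains only rows 2 and 3, which is why a diagram built for one purpose should not be reused for the other.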
| 248 | + |
| 249 | +### Pruning Empty Tables |
| 250 | + |
| 251 | +After applying restrictions, some tables in the diagram may have zero matching rows. The `prune()` method removes these tables from the diagram, leaving only the subgraph with actual data: |
| 252 | + |
| 253 | +```python |
| 254 | +export = (dj.Diagram(schema) |
| 255 | + .restrict(Subject & {'species': 'mouse'}) |
| 256 | + .restrict(Session & 'session_date > "2024-01-01"') |
| 257 | + .prune()) |
| 258 | + |
| 259 | +export.preview() # only tables with matching rows |
| 260 | +export # visualize the export subgraph |
| 261 | +``` |
| 262 | + |
| 263 | +Without prior restrictions, `prune()` removes physically empty tables. This is useful for understanding which parts of a pipeline are populated. |
| 264 | + |
| 265 | +### Architecture |
| 266 | + |
| 267 | +`Table.delete()` constructs a `Diagram` internally, calls `cascade()` to compute the affected subgraph, then executes the delete itself in reverse topological order. The Diagram is purely a graph computation and inspection tool — it computes the cascade and provides `preview()`, but all mutation logic (transactions, SQL execution, prompts) lives in `Table.delete()` and `Table.drop()`. |
| 268 | + |
| 269 | +### Advantages over Error-Driven Cascade |
| 270 | + |
| 271 | +The graph-driven approach resolves every known limitation of the prior error-driven cascade: |
| 272 | + |
| 273 | +| Scenario | Error-driven (prior) | Graph-driven (2.2) | |
| 274 | +|---|---|---| |
| 275 | +| MySQL 8 + limited privileges | Crashes (error 1217, no table name) | Works — no error parsing needed | |
| 276 | +| PostgreSQL | Savepoint overhead per attempt | No errors triggered | |
| 277 | +| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront | |
| 278 | +| Part integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) | |
| 279 | +| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" | |
| 280 | +| Reusability | Delete-only | Delete, drop, export, prune | |
| 281 | +| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing | |
| 282 | + |
204 | 283 | ## See Also |
205 | 284 |
|
206 | | -- [Use Isolated Instances](../how-to/use-instances.md/) — Task-oriented guide |
207 | | -- [Working with Instances](../tutorials/advanced/instances.ipynb/) — Step-by-step tutorial |
208 | | -- [Configuration Reference](../reference/configuration.md/) — Thread-safe mode settings |
209 | | -- [Configure Database](../how-to/configure-database.md/) — Connection setup |
| 285 | +- [Use Isolated Instances](../how-to/use-instances.md) — Task-oriented guide |
| 286 | +- [Working with Instances](../tutorials/advanced/instances.ipynb) — Step-by-step tutorial |
| 287 | +- [Configuration Reference](../reference/configuration.md) — Thread-safe mode settings |
| 288 | +- [Configure Database](../how-to/configure-database.md) — Connection setup |
| 289 | +- [Diagram Specification](../reference/specs/diagram.md) — Full reference for diagram operations |
| 290 | +- [Delete Data](../how-to/delete-data.md) — Task-oriented delete guide |