docs: document restricted diagram operations (new in 2.2) #155
dimitri-yatsenko wants to merge 8 commits into main
- Add Operational Methods section to diagram.md spec: cascade(), restrict(), delete(), drop(), preview(), prune(), restriction propagation rules, OR-vs-AND convergence
- Add Graph-Driven Diagram Operations section to whats-new-22.md: motivation, preview-then-execute pattern, two propagation modes, pruning empty tables
- Add Diagram-Level Delete section to delete-data.md: build-preview-execute workflow, when to use
- Add prune() to read-diagrams how-to
- Add version admonition in data-manipulation.md noting graph-driven cascade internals
- Cross-references between all files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add dry_run parameter to delete() and drop() signatures in diagram.md
- Fix trailing slashes in cross-reference paths across 3 files
- Convert inline version markers to proper admonitions in read-diagrams.ipynb
- Normalize table-cell version markers to consistent *(New in X.Y)* format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…stors Both cascade() and restrict() propagate downstream only from the seed table. Ancestors of the seed are excluded. Document this in the diagram spec (cascade and restrict method descriptions) and the whats-new explanation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ancestors remain in the diagram but receive no restrictions and are
unaffected by delete/preview. Previous wording ("excluded") was
imprecise — they're not removed from the graph, just not operated on.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update diagram.md and whats-new-22.md to reflect that cascade() returns a trimmed Diagram containing only seed + descendants, while restrict() keeps the full graph intact for chaining. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lead each description with its purpose rather than using parallel structure. cascade() prepares a delete (one-shot, trims graph, OR). restrict() selects a data subset (chainable, preserves graph, AND). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Describe how cascade propagates restrictions upward from part to master, then back downstream to all sibling parts, deleting the entire compositional unit. Updated in both diagram.md and master-part.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
src/explanation/whats-new-22.md
| # What's New in DataJoint 2.2 | ||
| DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing. | ||
| DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph. |
Restructure this sentence. Thread-safe mode is for applications that handle multiple connections, but graph-driven operations are for general use, even for single-threaded operations.
Fixed. Restructured to: "introduces isolated instances and thread-safe mode for applications that need multiple independent database connections, and graph-driven diagram operations that replace the legacy error-driven cascade with a reliable, inspectable approach for all users."
| The diagram supports two restriction propagation modes designed for fundamentally different tasks. | ||
| **`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`. |
Include here a description of how cascade behaves when it encounters a part table whose master is not yet included in the cascade.
Added a paragraph after cascade() describing part table behavior: with "enforce" (default), delete() raises an error if part rows would be deleted without their master; with "cascade", the restriction propagates upward from part to master, then back downstream to all sibling parts — deleting the entire compositional unit.
| **`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result. |
Also, describe how part tables are restricted when reached through a path that did not include their master.
Covered in the same paragraph — describes the upward propagation from part to master when the part is reached through a path that didn't include its master.
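The OR-versus-AND convergence described in this thread can be illustrated with a toy model. This is only a sketch: the `recordings` rows, the `converge` helper, and the attribute names are invented for illustration and are not part of the DataJoint API. In a real cascade, both restricted ancestors would lie downstream of the single seed table.

```python
# Toy model only: plain dicts stand in for table rows; nothing here is DataJoint API.
recordings = [
    {"id": 1, "subject": "m1", "session": "s1"},
    {"id": 2, "subject": "m1", "session": "s2"},
    {"id": 3, "subject": "m2", "session": "s1"},
    {"id": 4, "subject": "m2", "session": "s2"},
]


def converge(rows, restrictions, mode):
    """Return the ids of rows selected where several restricted ancestors meet.

    restrictions maps a foreign-key attribute to the set of allowed parent keys.
    mode "or" mimics cascade() convergence; mode "and" mimics restrict().
    """
    if mode not in ("or", "and"):
        raise ValueError(f"unknown mode: {mode!r}")
    combine = any if mode == "or" else all
    return [
        row["id"]
        for row in rows
        if combine(row[attr] in keys for attr, keys in restrictions.items())
    ]


restrictions = {"subject": {"m1"}, "session": {"s1"}}
cascade_ids = converge(recordings, restrictions, "or")    # any path reaches the row
restrict_ids = converge(recordings, restrictions, "and")  # every condition must hold
print(cascade_ids)   # [1, 2, 3]
print(restrict_ids)  # [1]
```

The delete-oriented mode selects the larger set because any one reason is enough to remove a row, while the subsetting mode keeps only the row that satisfies both conditions.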
src/explanation/whats-new-22.md
| The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics — a delete diagram should never be reused for subsetting, and vice versa. |
Helpfully, DataJoint will raise an error when the user attempts to mix cascade and restrict, or attempts to cascade more than once.
Added: "DataJoint raises an error if you attempt to mix cascade() and restrict(), or if you call cascade() more than once."
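A minimal sketch of the guard described here, assuming a hypothetical `ToyDiagram` class that only tracks which mode has been used. The class, its error type, and its messages are illustrative; the real implementation and its exception are not shown in this thread.

```python
class ToyDiagram:
    """Hypothetical stand-in for a diagram; it only remembers the mode in use."""

    def __init__(self):
        self._mode = None  # None, "cascade", or "restrict"

    def cascade(self, seed):
        # cascade() is one-shot and cannot follow restrict().
        if self._mode == "cascade":
            raise RuntimeError("cascade() can only be applied once")
        if self._mode == "restrict":
            raise RuntimeError("cannot mix cascade() and restrict()")
        self._mode = "cascade"
        return self  # returns self for brevity in this toy

    def restrict(self, seed):
        # restrict() is chainable but cannot follow cascade().
        if self._mode == "cascade":
            raise RuntimeError("cannot mix cascade() and restrict()")
        self._mode = "restrict"
        return self


chained = ToyDiagram().restrict("by species").restrict("by date")  # allowed
print(chained._mode)  # restrict
```

Calling `ToyDiagram().cascade("a").cascade("b")` or `ToyDiagram().cascade("a").restrict("b")` raises `RuntimeError` in this sketch.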
src/how-to/delete-data.md
| !!! version-added "New in 2.2" | ||
| Diagram-level delete was added in DataJoint 2.2. | ||
| For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing. |
Can the same thing be accomplished with `table.delete(dry_run=True)`?
Yes — table.delete(dry_run=True) returns the same affected row counts without deleting anything. Added a note at the top of the section mentioning this simpler alternative before introducing the diagram-level API.
| | `"ignore"` | Allow deleting parts without masters | | ||
| | `"cascade"` | Propagate restriction upward from part to master, then re-propagate downstream to all sibling parts | | ||
| With `"cascade"`, the restriction flows **upward** from a part table to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction propagates back downstream through the normal cascade — deleting the entire compositional unit (master + all parts), not just the originally matched part rows. |
If I understand this correctly, this may result in a restriction on the part table with a subquery in its where clause that references the same part table. This may produce an error in MySQL when attempting to delete from this table. Do we have a test that covers this part-to-master propagation in delete?
Yes, there are tests covering this. In test_cascading_delete.py:
- `test_delete_parts()`: deletes a Part table (Website) with `part_integrity="cascade"`, triggering upward propagation to its master (Profile)
- `test_delete_parts_complex()`: cascades from `A` with `part_integrity="cascade"` through complex master/part relationships, verifying correct row counts
- `test_delete_parts_error()`: verifies that `part_integrity="enforce"` raises `DataJointError` when parts would be deleted without their master
On the self-referencing subquery concern: the implementation uses (master_ft.proj() & child_ft.proj()).to_arrays() to materialize the restriction into a concrete tuple list before applying it to the master. This avoids a live subquery referencing the same table. The propagated_edges and visited_masters sets prevent the restriction from cycling back to the part table on subsequent passes. That said, this is worth monitoring — if the tuple list is very large, it could hit MySQL packet limits. We could add a targeted test for that edge case.
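The materialize-then-propagate idea can be sketched with plain Python collections. This is a toy model, not the implementation: the `Email` part, the row values, and the `cascade_from_part` helper are invented, and sets and lists stand in for tables. The point it shows is that the matched master keys are collected into a concrete set first, so the later filter on the part table never refers back to a live query on that same table.

```python
# Toy data: each part row is (master_key, value). Names and values are invented.
master_keys = {1, 2, 3}  # stands in for the master table (Profile)
parts = {
    "Website": [(1, "a.org"), (1, "b.org"), (2, "c.org")],
    "Email": [(1, "x@a.org"), (2, "y@c.org"), (3, "z@d.org")],
}


def cascade_from_part(part_name, predicate):
    """Propagate a part restriction up to the master, then down to all parts."""
    # Step 1 (upward): materialize the affected master keys into a concrete set.
    matched = {key for key, value in parts[part_name] if predicate(value)}
    matched &= master_keys
    # Step 2 (downward): apply the materialized keys to the master and every part,
    # including the part the restriction started from.
    affected = {"Profile": sorted(matched)}
    for name, rows in parts.items():
        affected[name] = [row for row in rows if row[0] in matched]
    return affected


result = cascade_from_part("Website", lambda url: url == "a.org")
print(result["Profile"])  # [1]
print(result["Website"])  # [(1, 'a.org'), (1, 'b.org')]
print(result["Email"])    # [(1, 'x@a.org')]
```

Note that the second `Website` row for master 1 is affected even though it did not match the original restriction: the whole compositional unit goes together.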
| diag.restrict(table_expr) | ||
| ``` | ||
| Select a subset of data for export or inspection. Starting from a restricted table expression, propagate the restriction downstream through all descendants using **AND** semantics — a descendant row is included only if *all* restricted ancestors match. The full diagram is preserved (ancestors, unrelated tables) so that `restrict()` can be called again from a different seed table, building up a multi-condition subset incrementally. |
How does the restriction of parts restrict their masters when not reached through the master? Should part integrity be accounted for in diagram restriction as well?
In cascade() mode: when a part table is reached through a path that did not include its master, the part_integrity setting controls the behavior:
- "enforce" (default): the cascade proceeds, but delete() runs a post-check. If part rows were deleted without the master also being deleted, it rolls back and raises an error.
- "cascade": the restriction propagates upward from the part to its master (using to_arrays() to materialize the join), then back downstream to all sibling parts.
- "ignore": the part is deleted without checking the master.
For restrict() mode: part_integrity is not currently applied. restrict() is designed for subsetting/export, not delete, so the master-part integrity constraint is less relevant — you are selecting data, not removing it. If we wanted to support "include the master whenever any part matches" in restrict mode, that would be a feature addition. Worth discussing but probably not needed for the initial release.
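The three `part_integrity` settings can be compared side by side in a small sketch. Everything here is hypothetical: `resolve_masters`, `PartIntegrityError`, and the key sets are illustrative names, and the real "enforce" behavior is a post-check with rollback inside a transaction, which this toy collapses into a simple pre-check.

```python
class PartIntegrityError(Exception):
    """Hypothetical stand-in for the error raised by the real post-check."""


def resolve_masters(deleted_masters, masters_of_deleted_parts, part_integrity="enforce"):
    """Return the set of master keys that end up deleted under each policy.

    deleted_masters: master keys already selected by the cascade.
    masters_of_deleted_parts: master keys that own at least one deleted part row.
    """
    # Masters whose parts would be deleted while the master itself survives.
    orphaning = masters_of_deleted_parts - deleted_masters
    if part_integrity == "ignore":
        return set(deleted_masters)          # parts go, masters are not checked
    if part_integrity == "cascade":
        return deleted_masters | orphaning   # pull the owning masters in as well
    if part_integrity == "enforce":
        if orphaning:
            raise PartIntegrityError(
                f"parts deleted without their masters: {sorted(orphaning)}"
            )
        return set(deleted_masters)
    raise ValueError(f"unknown part_integrity: {part_integrity!r}")


print(resolve_masters({1}, {1, 2}, "ignore"))   # {1}
print(resolve_masters({1}, {1, 2}, "cascade"))  # {1, 2}
```

With the default `"enforce"`, the same inputs raise `PartIntegrityError` in this sketch because master 2 would lose parts while remaining in place.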
src/reference/specs/diagram.md
| ```python | ||
| diag.delete(transaction=True, prompt=None, dry_run=False) | ||
| ``` |
Should delete be removed or hidden from diagram and only executed from table? Generally, delete should not be called unless cascade has been applied? Or does this also allow for unrestricted delete from all tables in the diagram? I guess this could go either way. What is the advantage of keeping the delete in dj.Diagram if it's always called with a single seed table?
Diagram.delete() requires cascade() first — it raises DataJointError("No cascade restrictions applied. Call cascade() first.") if called on an unrestricted diagram. So it cannot do an unrestricted delete of all tables.
The advantage of keeping delete() on Diagram rather than only on Table:
- Multi-schema cascades: Table.delete() constructs a diagram from _from_table(self), which only includes the table and its descendants. If you need to cascade across schemas or from a specific subgraph, the diagram API lets you construct the graph first with dj.Diagram(schema1) + dj.Diagram(schema2), then cascade().
- Preview workflow: The diagram API enables cascade() → preview() → delete() as a three-step pattern. Table.delete(dry_run=True) provides the same counts, but the diagram also supports visualization of the cascade subgraph.
- Future operations: restrict() → prune() → export(). The diagram is the foundation for data subsetting, which does not start from a single table.
That said, for the common case of "delete rows from one table", Table.delete() is simpler and delegates to Diagram internally. The diagram-level API is the power-user interface.
Done. delete() and drop() have been removed from Diagram's public API. All mutation logic (transaction management, SQL execution, prompts) now lives in Table.delete() and Table.drop(). Diagram remains purely a graph computation and inspection tool: cascade(), restrict(), preview(), prune().
The design docs (docs/design/restricted-diagram.md and docs/design/thread-safe-mode.md) have also been removed from datajoint-python — their content is captured in the datajoint-docs specs.
src/reference/specs/diagram.md
| diag.prune() | ||
| ``` | ||
| Remove tables with zero matching rows from the diagram. Without prior restrictions, removes physically empty tables. With restrictions (`cascade()` or `restrict()`), removes tables where the restricted query yields zero rows. |
We may want to highlight that removing a table from the diagram does only that: it removes the table from the diagram view and does not affect the tables themselves.
Good call. Updated the prune() description to: "Remove tables with zero matching rows from the diagram view. This only affects the diagram object — no tables or data are modified in the database."
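The view-only nature of pruning can be shown with a toy example. This is a sketch under stated assumptions: the `row_counts` dict stands in for the database, the `prune` function here is a hypothetical free function rather than the `Diagram.prune()` method, and the table names are invented.

```python
# Toy model: a dict of row counts stands in for the database; a list is the view.
row_counts = {"Subject": 3, "Session": 0, "Recording": 5}
view = list(row_counts)  # the diagram view initially shows every table


def prune(view, row_counts):
    """Return a new view without tables whose matching row count is zero."""
    return [table for table in view if row_counts.get(table, 0) > 0]


pruned = prune(view, row_counts)
print(pruned)              # ['Subject', 'Recording']
print(sorted(row_counts))  # ['Recording', 'Session', 'Subject']
```

The empty `Session` table disappears from the pruned view, but it is still present in `row_counts`: nothing in the stand-in database was changed.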
Diagram is now an inspection-only tool. delete() and drop() have been moved to Table. Updated diagram spec, whats-new-22, and delete-data how-to to reflect this change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- diagram.md spec: cascade(), restrict(), delete(), drop(), preview(), restriction propagation rules, OR-vs-AND convergence
- whats-new-22.md: motivation, preview-then-execute pattern, two propagation modes, architecture note
- delete-data.md: build-preview-execute workflow, when to use
- data-manipulation.md: version admonition noting graph-driven cascade internals

Test plan
- mkdocs build succeeds with no new warnings

🤖 Generated with Claude Code