docs: document restricted diagram operations (new in 2.2)#155

Open
dimitri-yatsenko wants to merge 8 commits into main from docs/v2.2-restricted-diagram

Conversation

@dimitri-yatsenko
Member

Summary

  • Add Operational Methods section to diagram.md spec: cascade(), restrict(), delete(), drop(), preview(), restriction propagation rules, OR-vs-AND convergence
  • Add Graph-Driven Diagram Operations section to whats-new-22.md: motivation, preview-then-execute pattern, two propagation modes, architecture note
  • Add Diagram-Level Delete section to delete-data.md: build-preview-execute workflow, when to use
  • Add version admonition in data-manipulation.md noting graph-driven cascade internals
  • Cross-references between all four files

Test plan

  • mkdocs build succeeds with no new warnings
  • Review rendered pages for diagram.md, whats-new-22.md, delete-data.md, data-manipulation.md
  • Verify cross-reference links resolve correctly

🤖 Generated with Claude Code

dimitri-yatsenko changed the base branch from docs/v2.2-thread-safe to main on March 6, 2026, 20:56
- Add Operational Methods section to diagram.md spec: cascade(), restrict(),
  delete(), drop(), preview(), prune(), restriction propagation rules,
  OR-vs-AND convergence
- Add Graph-Driven Diagram Operations section to whats-new-22.md: motivation,
  preview-then-execute pattern, two propagation modes, pruning empty tables
- Add Diagram-Level Delete section to delete-data.md: build-preview-execute
  workflow, when to use
- Add prune() to read-diagrams how-to
- Add version admonition in data-manipulation.md noting graph-driven cascade
  internals
- Cross-references between all files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko force-pushed the docs/v2.2-restricted-diagram branch from 21a36c6 to 443090c on March 6, 2026, 21:28
dimitri-yatsenko and others added 3 commits March 9, 2026 08:55
- Add dry_run parameter to delete() and drop() signatures in diagram.md
- Fix trailing slashes in cross-reference paths across 3 files
- Convert inline version markers to proper admonitions in read-diagrams.ipynb
- Normalize table-cell version markers to consistent *(New in X.Y)* format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…stors

Both cascade() and restrict() propagate downstream only from the seed
table. Ancestors of the seed are excluded. Document this in the diagram
spec (cascade and restrict method descriptions) and the whats-new
explanation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ancestors remain in the diagram but receive no restrictions and are
unaffected by delete/preview. Previous wording ("excluded") was
imprecise — they're not removed from the graph, just not operated on.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ttngu207 previously approved these changes Mar 9, 2026
Update diagram.md and whats-new-22.md to reflect that cascade()
returns a trimmed Diagram containing only seed + descendants,
while restrict() keeps the full graph intact for chaining.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko and others added 2 commits March 9, 2026 16:07
Lead each description with its purpose rather than using parallel
structure. cascade() prepares a delete (one-shot, trims graph, OR).
restrict() selects a data subset (chainable, preserves graph, AND).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Describe how cascade propagates restrictions upward from part to
master, then back downstream to all sibling parts, deleting the
entire compositional unit. Updated in both diagram.md and
master-part.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# What's New in DataJoint 2.2

DataJoint 2.2 introduces **isolated instances** and **thread-safe mode** for applications that need multiple independent database connections—web servers, multi-tenant notebooks, parallel pipelines, and testing.
DataJoint 2.2 introduces **isolated instances**, **thread-safe mode**, and **graph-driven diagram operations** for applications that need multiple independent database connections, explicit cascade control, and operational use of the dependency graph.
Member Author

Restructure this sentence. Thread-safe mode is for applications that handle multiple connections, but graph-driven operations are for general use, even in single-threaded applications.

Member Author

Fixed. Restructured to: "introduces isolated instances and thread-safe mode for applications that need multiple independent database connections, and graph-driven diagram operations that replace the legacy error-driven cascade with a reliable, inspectable approach for all users."


The diagram supports two restriction propagation modes designed for fundamentally different tasks.

**`cascade()` prepares a delete.** It takes a single restricted table expression, propagates the restriction downstream through all descendants, and **trims the diagram** to the resulting subgraph — ancestors and unrelated tables are removed entirely. Convergence uses OR: a descendant row is marked for deletion if *any* ancestor path reaches it, because if any reason exists to remove a row, it should be removed. `cascade()` is one-shot and is always followed by `preview()` or `delete()`.
Member Author

Here include a description of how cascade behaves when it encounters a part table whose master is not yet included in the cascade.

Member Author

Added a paragraph after cascade() describing part table behavior: with "enforce" (default), delete() raises an error if part rows would be deleted without their master; with "cascade", the restriction propagates upward from part to master, then back downstream to all sibling parts — deleting the entire compositional unit.
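
For readers of this thread, the OR convergence described for `cascade()` can be mimicked with a toy model in plain Python. This is only a sketch under stated assumptions: it makes no DataJoint calls, the table names, row keys, and the `cascade_or` helper are all hypothetical, and rows are modeled as keys that record which parent key they reference.

```python
from collections import deque

# Hypothetical toy schema; edges point from parent table to child table.
EDGES = {
    "Subject": ["Session"],
    "Session": ["Scan", "Stim"],
    "Scan": ["Result"],
    "Stim": ["Result"],
    "Result": [],
}

# Rows per table: {row_key: {parent_table: referenced_parent_key}}.
ROWS = {
    "Subject": {"s1": {}},
    "Session": {"a": {"Subject": "s1"}, "b": {"Subject": "s1"}},
    "Scan": {"x": {"Session": "a"}, "y": {"Session": "b"}},
    "Stim": {"p": {"Session": "a"}, "q": {"Session": "b"}},
    "Result": {
        "r1": {"Scan": "x", "Stim": "p"},  # reached through both paths
        "r2": {"Scan": "y", "Stim": "p"},  # reached through Stim only
        "r3": {"Scan": "y", "Stim": "q"},  # reached through neither path
    },
}


def cascade_or(edges, rows, seed, seed_keys):
    """Propagate a restriction downstream from `seed`, unioning over paths."""
    marked = {seed: set(seed_keys)}
    visited = set()
    frontier = deque([seed])
    while frontier:
        parent = frontier.popleft()
        for child in edges[parent]:
            reached = {
                key for key, refs in rows[child].items()
                if refs.get(parent) in marked[parent]
            }
            known = marked.setdefault(child, set())
            if child not in visited or not reached <= known:
                known |= reached  # OR: any path that reaches a row marks it
                visited.add(child)
                frontier.append(child)
    return marked  # only the seed and its descendants appear here


marked = cascade_or(EDGES, ROWS, "Session", {"a"})
print(sorted(marked["Result"]))  # ['r1', 'r2']
```

Row `r2` is marked even though only one of its two parents is affected, and `Subject` (an ancestor of the seed) does not appear in the result, which mirrors the trimmed diagram.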


**`restrict()` selects a data subset.** It propagates a restriction downstream but **preserves the full diagram**, allowing `restrict()` to be called again from a different seed table. This makes it possible to build up multi-condition subsets incrementally — for example, restricting by species from one table and by date from another. Convergence uses AND: a descendant row is included only if *all* restricted ancestors match, because an export should contain only rows satisfying every condition. After chaining restrictions, use `prune()` to remove empty tables and `preview()` to inspect the result.
Member Author

Also, describe how part tables are restricted when reached through a path that did not include its master.

Member Author

Covered in the same paragraph — describes the upward propagation from part to master when the part is reached through a path that didn't include its master.
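
As a companion illustration, the AND convergence of chained `restrict()` calls can be sketched the same way. Again this is a plain-Python toy, not DataJoint code: the tables (`Species`, `Day`, `Recording`, `Spike`), the row keys, and the `restrict_and` helper are hypothetical, and a table missing from `allowed` is treated as unrestricted.

```python
from collections import deque

# Hypothetical toy schema; edges point from parent table to child table.
EDGES = {
    "Species": ["Recording"],
    "Day": ["Recording"],
    "Recording": ["Spike"],
    "Spike": [],
}

# Rows per table: {row_key: {parent_table: referenced_parent_key}}.
ROWS = {
    "Species": {"mouse": {}, "rat": {}},
    "Day": {"mon": {}, "tue": {}},
    "Recording": {
        "r1": {"Species": "mouse", "Day": "mon"},
        "r2": {"Species": "mouse", "Day": "tue"},
        "r3": {"Species": "rat", "Day": "mon"},
    },
    "Spike": {
        "k1": {"Recording": "r1"},
        "k2": {"Recording": "r2"},
        "k3": {"Recording": "r3"},
    },
}


def restrict_and(edges, rows, allowed, seed, seed_keys):
    """Return a new `allowed` mapping after restricting `seed` to `seed_keys`."""
    allowed = {table: set(keys) for table, keys in allowed.items()}
    allowed[seed] = allowed.get(seed, set(rows[seed])) & set(seed_keys)
    frontier = deque([seed])
    while frontier:
        parent = frontier.popleft()
        for child in edges[parent]:
            current = allowed.get(child, set(rows[child]))
            # AND: a row survives only if it also matches this parent.
            kept = {
                key for key in current
                if rows[child][key].get(parent) in allowed[parent]
            }
            if child not in allowed or kept != current:
                allowed[child] = kept
                frontier.append(child)
    return allowed


allowed = restrict_and(EDGES, ROWS, {}, "Species", {"mouse"})
allowed = restrict_and(EDGES, ROWS, allowed, "Day", {"mon"})
print(sorted(allowed["Recording"]))  # ['r1']
```

Rows `r2` and `r3` each satisfy one condition but not both, so only `r1` (and its spike `k1`) remains after the two chained restrictions.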


The two modes are mutually exclusive on the same diagram. This prevents accidental mixing of incompatible semantics — a delete diagram should never be reused for subsetting, and vice versa.
Member Author

Helpfully, DataJoint raises an error when the user attempts to mix `cascade()` and `restrict()`, or to call `cascade()` more than once.

Member Author

Added: "DataJoint raises an error if you attempt to mix cascade() and restrict(), or if you call cascade() more than once."
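
One way such a guard could be structured is sketched below. This is an assumption-laden toy: `ToyDiagram`, its messages, and the use of `ValueError` are hypothetical stand-ins and do not reflect how DataJoint implements the check or which exception type it raises. The toy also returns `self` for brevity, whereas the thread describes `cascade()` as returning a trimmed diagram.

```python
class ToyDiagram:
    """Tracks which propagation mode was applied; not DataJoint's Diagram."""

    def __init__(self):
        self._mode = None  # None, "cascade", or "restrict"

    def cascade(self, table_expr):
        if self._mode == "restrict":
            raise ValueError("cannot cascade() a diagram that was restricted")
        if self._mode == "cascade":
            raise ValueError("cascade() may only be called once")
        self._mode = "cascade"
        return self

    def restrict(self, table_expr):
        if self._mode == "cascade":
            raise ValueError("cannot restrict() a diagram that was cascaded")
        self._mode = "restrict"  # restrict() stays chainable
        return self


def raises(action):
    """Return True if calling `action` raises ValueError."""
    try:
        action()
    except ValueError:
        return True
    return False


chained = ToyDiagram().restrict("A & cond").restrict("B & cond")
print(raises(lambda: chained.cascade("A & cond")))  # True
```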

!!! version-added "New in 2.2"
    Diagram-level delete was added in DataJoint 2.2.

For complex scenarios — previewing the blast radius, working across schemas, or understanding the dependency graph before deleting — use `dj.Diagram` to build and inspect the cascade before executing.
Member Author

Can the same thing be accomplished with `table.delete(dry_run=True)`?

Member Author

Yes — table.delete(dry_run=True) returns the same affected row counts without deleting anything. Added a note at the top of the section mentioning this simpler alternative before introducing the diagram-level API.
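
A preview-then-execute wrapper around that simpler alternative might look like the sketch below. Only the `dry_run=True` argument comes from this thread; the helper `preview_then_delete`, the `max_rows` budget, and the assumption that the dry run returns a mapping of table names to affected row counts are hypothetical. `FakeTable` is a stand-in so the sketch runs without a database.

```python
def preview_then_delete(table_expr, max_rows):
    """Run a dry-run delete first and execute only if the row budget allows."""
    # Assumption: the dry run returns {table_name: affected_row_count}.
    counts = table_expr.delete(dry_run=True)
    total = sum(counts.values())
    if total > max_rows:
        return False, counts  # too many rows; nothing was deleted
    table_expr.delete()
    return True, counts


class FakeTable:
    """Stand-in for a restricted table expression; not a DataJoint class."""

    def __init__(self, counts):
        self.counts = counts
        self.deleted = False

    def delete(self, dry_run=False):
        if not dry_run:
            self.deleted = True
        return dict(self.counts)


table = FakeTable({"Session": 2, "Scan": 5})
executed, counts = preview_then_delete(table, max_rows=3)
print(executed, table.deleted)  # False False
```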

| `"ignore"` | Allow deleting parts without masters |
| `"cascade"` | Propagate restriction upward from part to master, then re-propagate downstream to all sibling parts |

With `"cascade"`, the restriction flows **upward** from a part table to its master: the restricted part rows identify which master rows are affected, those masters receive a restriction, and that restriction propagates back downstream through the normal cascade — deleting the entire compositional unit (master + all parts), not just the originally matched part rows.
Member Author

If I understand this correctly, this may result in a restriction on the part table with a subquery in its where clause that references the same part table. This may produce an error in MySQL when attempting to delete from this table. Do we have a test that covers this part-to-master propagation in delete?

Member Author

Yes, there are tests covering this. In test_cascading_delete.py:

  • test_delete_parts() — deletes a Part table (Website) with part_integrity="cascade", triggering upward propagation to its master (Profile)
  • test_delete_parts_complex() — cascades from A with part_integrity="cascade" through complex master/part relationships, verifying correct row counts
  • test_delete_parts_error() — verifies that part_integrity="enforce" raises DataJointError when parts would be deleted without their master

On the self-referencing subquery concern: the implementation uses (master_ft.proj() & child_ft.proj()).to_arrays() to materialize the restriction into a concrete tuple list before applying it to the master. This avoids a live subquery referencing the same table. The propagated_edges and visited_masters sets prevent the restriction from cycling back to the part table on subsequent passes. That said, this is worth monitoring — if the tuple list is very large, it could hit MySQL packet limits. We could add a targeted test for that edge case.
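
The idea of materializing the part-to-master restriction before re-propagating it can be illustrated with a small plain-Python toy. It is not the DataJoint implementation: the `Email` part, the row keys, and the `cascade_via_master` helper are hypothetical, and the sorted tuple only plays the role that the concrete tuple list plays in the description above.

```python
# Hypothetical part tables of one master; each part row maps to its master key.
PARTS = {
    "Website": {"w1": "alice", "w2": "alice", "w3": "bob"},
    "Email": {"e1": "alice", "e2": "bob"},
}


def cascade_via_master(parts, seed_part, seed_keys):
    """Propagate from matched part rows up to masters, then down to all parts."""
    # Step 1: materialize the affected master keys into a concrete tuple, so
    # the later restriction on the part table does not refer back to itself.
    master_keys = tuple(sorted({parts[seed_part][key] for key in seed_keys}))
    # Step 2: restrict every part table (seed and siblings) by those masters.
    affected_parts = {
        part: {key for key, master in rows.items() if master in master_keys}
        for part, rows in parts.items()
    }
    return master_keys, affected_parts


masters, affected = cascade_via_master(PARTS, "Website", {"w1"})
print(masters)  # ('alice',)
```

Starting from the single part row `w1`, the whole compositional unit for `alice` is selected: `w1` and `w2` in `Website`, plus `e1` in the sibling part.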

```python
diag.restrict(table_expr)
```

Select a subset of data for export or inspection. Starting from a restricted table expression, propagate the restriction downstream through all descendants using **AND** semantics — a descendant row is included only if *all* restricted ancestors match. The full diagram is preserved (ancestors, unrelated tables) so that `restrict()` can be called again from a different seed table, building up a multi-condition subset incrementally.
Member Author

How does the restriction of parts restrict their masters when not reached through the master? Should part integrity be accounted for in diagram restriction as well?

Member Author

In cascade() mode: when a part table is reached through a path that did not include its master, the part_integrity setting controls the behavior:

  • "enforce" (default): the cascade proceeds, but delete() runs a post-check — if part rows were deleted without the master also being deleted, it rolls back and raises an error.
  • "cascade": the restriction propagates upward from the part to its master (using to_arrays() to materialize the join), then back downstream to all sibling parts.
  • "ignore": the part is deleted without checking the master.

For restrict() mode: part_integrity is not currently applied. restrict() is designed for subsetting/export, not delete, so the master-part integrity constraint is less relevant — you are selecting data, not removing it. If we wanted to support "include the master whenever any part matches" in restrict mode, that would be a feature addition. Worth discussing but probably not needed for the initial release.
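
The three `part_integrity` settings listed above for cascade mode can be summarized as a small decision function. This is a hedged toy, not DataJoint's logic: `apply_part_integrity`, its arguments, and the use of `RuntimeError` in place of the library's own error type are hypothetical.

```python
def apply_part_integrity(mode, part_to_master, deleted_parts, deleted_masters):
    """Return the set of masters to delete, or raise under "enforce"."""
    touched = {part_to_master[part] for part in deleted_parts}
    orphaning = touched - set(deleted_masters)
    if mode == "ignore":
        # Parts go away; masters are left as they are.
        return set(deleted_masters)
    if mode == "cascade":
        # Pull in every master whose part rows were matched.
        return set(deleted_masters) | touched
    if mode == "enforce":
        if orphaning:
            raise RuntimeError(
                f"part rows would be deleted without masters: {sorted(orphaning)}"
            )
        return set(deleted_masters)
    raise ValueError(f"unknown part_integrity mode: {mode!r}")


part_to_master = {"w1": "alice", "w2": "alice", "w3": "bob"}
print(apply_part_integrity("cascade", part_to_master, {"w1"}, set()))  # {'alice'}
```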

Comment on lines +194 to +196
```python
diag.delete(transaction=True, prompt=None, dry_run=False)
```
Member Author

Should delete be removed or hidden from diagram and only executed from table? Generally, delete should not be called unless cascade has been applied? Or does this also allow for unrestricted delete from all tables in the diagram? I guess this could go either way. What is the advantage of keeping the delete in dj.Diagram if it's always called with a single seed table?

Member Author

Diagram.delete() requires cascade() first — it raises DataJointError("No cascade restrictions applied. Call cascade() first.") if called on an unrestricted diagram. So it cannot do an unrestricted delete of all tables.

The advantage of keeping delete() on Diagram rather than only on Table:

  1. Multi-schema cascades: Table.delete() constructs a diagram from _from_table(self), which only includes the table and its descendants. If you need to cascade across schemas or from a specific subgraph, the diagram API lets you construct the graph first with dj.Diagram(schema1) + dj.Diagram(schema2), then cascade().
  2. Preview workflow: The diagram API enables cascade() → preview() → delete() as a three-step pattern. Table.delete(dry_run=True) provides the same counts but the diagram also supports visualization of the cascade subgraph.
  3. Future operations: restrict() → prune() → export() — the diagram is the foundation for data subsetting, which does not start from a single table.

That said, for the common case of "delete rows from one table", Table.delete() is simpler and delegates to Diagram internally. The diagram-level API is the power-user interface.

Member Author

Done. delete() and drop() have been removed from Diagram's public API. All mutation logic (transaction management, SQL execution, prompts) now lives in Table.delete() and Table.drop(). Diagram remains purely a graph computation and inspection tool: cascade(), restrict(), preview(), prune().

The design docs (docs/design/restricted-diagram.md and docs/design/thread-safe-mode.md) have also been removed from datajoint-python — their content is captured in the datajoint-docs specs.

```python
diag.prune()
```

Remove tables with zero matching rows from the diagram. Without prior restrictions, removes physically empty tables. With restrictions (`cascade()` or `restrict()`), removes tables where the restricted query yields zero rows.
Member Author

We may want to highlight that pruning only removes tables from the diagram view; the tables themselves are not affected.

Member Author

Good call. Updated the prune() description to: "Remove tables with zero matching rows from the diagram view. This only affects the diagram object — no tables or data are modified in the database."
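
The distinction between the view and the data can be shown with a toy in plain Python. The `database` dictionary, table names, and this `prune` function are hypothetical stand-ins; the point is only that pruning produces a smaller view while the underlying tables stay in place.

```python
def prune(view, row_counts):
    """Return a new view without tables whose matching row count is zero."""
    return [table for table in view if row_counts[table] > 0]


# Stand-in for stored tables; "Scan" exists but has no rows.
database = {"Session": ["a", "b"], "Scan": [], "Stim": ["p"]}
view = list(database)
counts = {table: len(rows) for table, rows in database.items()}

pruned = prune(view, counts)
print(pruned)            # ['Session', 'Stim']
print(sorted(database))  # ['Scan', 'Session', 'Stim']
```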

Diagram is now an inspection-only tool. delete() and drop() have been
moved to Table. Updated diagram spec, whats-new-22, and delete-data
how-to to reflect this change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2 participants