Implement graph-driven cascade delete and restrict on Diagram#1407

Draft
dimitri-yatsenko wants to merge 19 commits into master from design/restricted-diagram

@dimitri-yatsenko commented Feb 21, 2026

Summary

Replace the error-driven cascade in Table.delete() (~200 lines) with graph-driven restriction propagation on Diagram. Table.delete() and Table.drop() now delegate to Diagram.cascade().delete() and Diagram.drop() respectively.

Resolves: #865 (applying restrictions to a Diagram), #1110 (cascade delete fails on MySQL 8 with limited privileges)

New Diagram methods

  • cascade(table_expr) — OR at convergence, one-shot, for delete
  • restrict(table_expr) — AND at convergence, chainable, for export
  • delete() — execute cascade delete in reverse topo order
  • drop() — drop tables in reverse topo order
  • preview() — show affected tables and row counts without modifying data
  • prune() — remove tables with zero matching rows from the diagram
  • _from_table() — lightweight internal factory for Table.delete/Table.drop
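
The reverse topological ordering used by delete() and drop() can be illustrated with the standard library alone. This is a minimal sketch, not the code in diagram.py: the table names and the `parents` mapping are hypothetical, and `graphlib.TopologicalSorter` stands in for whatever traversal the Diagram class actually performs on its nx.DiGraph.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each table maps to the tables it references.
parents = {
    "Subject": set(),
    "Session": {"Subject"},
    "Trial": {"Session"},
    "Spike": {"Trial", "Session"},
}


def reverse_topological_order(parents):
    """Return tables ordered so that every child comes before its parents."""
    # static_order() yields referenced tables (parents) first.
    forward = list(TopologicalSorter(parents).static_order())
    return forward[::-1]


order = reverse_topological_order(parents)
print(order)  # → ['Spike', 'Trial', 'Session', 'Subject']
```

Processing children first means no table is deleted from or dropped while another table still holds foreign key references to it.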

Architecture changes

  • Single Diagram class: Removed the if diagram_active: conditional. Diagram(nx.DiGraph) is always defined. Only visualization methods (draw, make_dot, make_svg, etc.) are gated on diagram_active.
  • Table.delete() rewritten: ~200 lines → ~10 lines delegating to Diagram
  • Table.drop() rewritten: ~35 lines → ~10 lines delegating to Diagram
  • dry_run support: delete(dry_run=True) and drop(dry_run=True) return affected row counts without modifying data
  • part_integrity: Data-driven post-check — only raises when rows were actually deleted from a Part without its master also being deleted. Avoids false positives when a Part table appears in the cascade graph but has zero affected rows.
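
The data-driven part_integrity post-check can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `PartIntegrityError` stands in for DataJointError, and `deleted_counts` and `part_to_master` are hypothetical inputs describing what a delete actually removed.

```python
class PartIntegrityError(Exception):
    """Stand-in for DataJointError in this sketch."""


def check_part_integrity(deleted_counts, part_to_master):
    """Raise if rows were deleted from a part whose master lost no rows.

    deleted_counts: table name -> number of rows actually deleted
    part_to_master: part table name -> master table name
    """
    # Only tables that really lost rows count; graph membership alone does not.
    deleted = {table for table, n in deleted_counts.items() if n > 0}
    for part, master in part_to_master.items():
        if part in deleted and master not in deleted:
            raise PartIntegrityError(
                f"rows deleted from part {part!r} without deleting from master {master!r}"
            )


# A part that appears in the cascade but has zero affected rows passes the check.
check_part_integrity({"Website": 0, "Profile": 0}, {"Website": "Profile"})
```

In the PR's description, a failed check rolls back the transaction before raising; the sketch leaves transaction handling to the caller.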

Restriction propagation rules

For edge Parent→Child with attr_map:

| Condition | Child restriction |
| --- | --- |
| Non-aliased AND parent_attrs ⊆ child.primary_key | Copy parent restriction directly |
| Aliased FK (fk_attrs ≠ pk_attrs) | `parent.proj(**{fk: pk for fk, pk in attr_map.items()})` |
| Non-aliased AND parent_attrs ⊄ child.primary_key | `parent.proj()` |
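
The three rules can be read as a small decision function. The sketch below is hypothetical: `choose_propagation_rule` and its return labels are names invented for illustration, and it assumes attr_map maps child FK attribute names to parent PK attribute names, with parent_attrs supplied by the caller.

```python
def choose_propagation_rule(attr_map, parent_attrs, child_primary_key):
    """Pick which rule applies to one Parent→Child edge.

    attr_map: child FK attribute name -> parent PK attribute name (assumed direction)
    parent_attrs: attribute names compared against the child's primary key
    child_primary_key: attribute names in the child's primary key
    """
    # An FK is aliased when any attribute is renamed between child and parent.
    aliased = any(fk != pk for fk, pk in attr_map.items())
    if aliased:
        return "proj_renamed"  # parent.proj(**{fk: pk ...})
    if set(parent_attrs) <= set(child_primary_key):
        return "copy"  # copy the parent restriction directly
    return "proj"  # parent.proj()


print(choose_propagation_rule({"subject_id": "subject_id"}, ["subject_id"],
                              ["subject_id", "session"]))  # → copy
print(choose_propagation_rule({"source_id": "subject_id"}, ["subject_id"],
                              ["source_id"]))  # → proj_renamed
print(choose_propagation_rule({"subject_id": "subject_id"}, ["species"],
                              ["subject_id"]))  # → proj
```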

SQL generation fix

Restrictions are applied via restrict() → make_condition() rather than by direct assignment to _restriction. This ensures AndList and QueryExpression objects are properly converted to SQL, fixing invalid WHERE clauses that caused 30+ test failures.
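
The failure mode is easy to reproduce outside DataJoint. In this sketch, `naive_where` and `or_where` are hypothetical helpers, not DataJoint functions: the first interpolates a Python list straight into the clause, the second joins the conditions with OR, which is the role the summary assigns to the restrict() path.

```python
def naive_where(restriction):
    """Interpolate the restriction as-is; a list ends up as its Python repr."""
    return f"WHERE {restriction}"


def or_where(conditions):
    """Join string conditions with OR, parenthesizing each one."""
    if not conditions:
        return "WHERE FALSE"  # an empty OR matches nothing
    return "WHERE " + " OR ".join(f"({c})" for c in conditions)


conditions = ["subject_id = 1", "session_id = 2"]
broken = naive_where(conditions)
fixed = or_where(conditions)
print(broken)  # → WHERE ['subject_id = 1', 'session_id = 2']
print(fixed)   # → WHERE (subject_id = 1) OR (session_id = 2)
```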

Convergence semantics

  • Cascade (delete): OR convergence — a row is deleted if ANY of its FK references point to a deleted row. Implemented by passing the restriction list to restrict() which creates an OrList.
  • Restrict (export): AND convergence — a row is included only if ALL ancestor conditions are satisfied. Implemented by iterating restrictions and calling restrict() for each.
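
The difference between the two modes shows up only where several paths reach the same table. A minimal sketch, modeling each incoming restriction as the set of child row ids it matches (the row ids and the `converge` helper are hypothetical):

```python
def converge(row_sets, mode):
    """Combine the row sets arriving at one table over different FK paths.

    mode="cascade": union (OR), a row is affected if any path reaches it.
    mode="restrict": intersection (AND), a row is kept only if every path reaches it.
    """
    if not row_sets:
        raise ValueError("at least one incoming restriction is required")
    if mode == "cascade":
        return set().union(*row_sets)
    if mode == "restrict":
        return set.intersection(*map(set, row_sets))
    raise ValueError(f"unknown mode: {mode!r}")


via_session = {1, 2, 3}  # rows matched through one parent
via_probe = {2, 3, 4}    # rows matched through another parent

print(sorted(converge([via_session, via_probe], "cascade")))   # → [1, 2, 3, 4]
print(sorted(converge([via_session, via_probe], "restrict")))  # → [2, 3]
```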

Advantages over previous implementation

| Aspect | Previous (error-driven) | New (graph-driven) |
| --- | --- | --- |
| MySQL 8 + limited privileges | Crashes (#1110) | Works; no error parsing needed |
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
| part_integrity | Post-hoc check after delete | Data-driven post-check (no false positives on empty tables) |
| Reusability | Delete-only | Delete, drop, export, backup |
| Inspectability | Opaque recursive cascade | preview() / dry_run shows affected data before executing |

Files changed

| File | Change |
| --- | --- |
| src/datajoint/diagram.py | _restrict_freetable() helper, cascade(), restrict(), delete(), drop(), preview(), prune() with proper SQL generation and OR/AND convergence |
| src/datajoint/table.py | Table.delete() and Table.drop() delegate to Diagram, dry_run parameter added |
| src/datajoint/user_tables.py | Part.drop() passes part_integrity and dry_run through |
| tests/integration/test_cascade_delete.py | New dry_run tests for delete and drop |
| tests/integration/test_cascading_delete.py | Fixture cleanup |
| tests/integration/test_cli.py | Fix subprocess to use sys.executable -m datajoint.cli |

Test plan

  • All 12 existing test_cascading_delete.py tests pass
  • All 10 test_cascade_delete.py tests pass (5 MySQL + 5 PostgreSQL, including dry_run)
  • All 12 test_erd.py tests pass (including 5 prune tests)
  • All 5 test_cli.py tests pass
  • Full suite: 792 passed, 10 skipped, 0 errors

See docs/design/restricted-diagram-spec.md for the full implementation spec.

🤖 Generated with Claude Code

dimitri-yatsenko and others added 5 commits February 21, 2026 13:56
Graph-driven cascade delete using restricted Diagram nodes,
replacing error-message parsing with dependency graph traversal.
Addresses MySQL 8 privilege issues and PostgreSQL overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Unrestricted nodes are not affected by operations
- Multiple restrict() calls create separate restriction sets
- Delete combines sets with OR (any taint → delete)
- Export combines sets with AND (all criteria → include)
- Within a set, multiple FK paths combine with OR (structural)
- Added open questions on lenient vs strict AND and same-table restrictions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Delete: one restriction, propagated downstream only, OR at convergence
- Export: downstream + upstream context, AND at convergence
- Removed over-engineered "multiple restriction sets" abstraction
- Clarified alias nodes (same parent, multiple FKs) vs convergence (different parents)
- Non-downstream tables: excluded for delete, included for export

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- cascade(): OR at convergence, downstream only — for delete
- restrict(): AND at convergence, includes upstream context — for export
- Both propagate downstream via attr_map, differ only at convergence
- Table.delete() internally constructs diagram.cascade()
- part_integrity is a parameter of cascade(), not delete()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Table.drop() rewritten as Diagram(table).drop()
- Shared infrastructure: reverse topo traversal, part_integrity pre-checks,
  unloaded-schema error handling, preview
- drop is DDL (no restrictions), delete is DML (with cascade restrictions)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko and others added 2 commits February 22, 2026 11:15
Replace the error-driven cascade in Table.delete() (~200 lines) with
graph-driven restriction propagation on Diagram. Table.delete() and
Table.drop() now delegate to Diagram.cascade().delete() and
Diagram.drop() respectively.

New Diagram methods:
- cascade(table_expr) — OR at convergence, one-shot, for delete
- restrict(table_expr) — AND at convergence, chainable, for export
- delete() — execute cascade delete in reverse topo order
- drop() — drop tables in reverse topo order
- preview() — show affected tables and row counts
- _from_table() — lightweight factory for Table.delete/drop

Restructure: single Diagram(nx.DiGraph) class always defined.
Only visualization methods gated on diagram_active.

Resolves #865, #1110.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts in diagram.py and table.py:
- Adopt master's config access pattern (self._connection._config)
- Keep graph-driven cascade/restrict implementation
- Apply master's declare() config param, split_full_table_name(), _config in store context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimitri-yatsenko changed the title from "Design: Restricted Diagrams for cascading operations" to "Implement graph-driven cascade delete and restrict on Diagram" on Feb 22, 2026
- Add assert after conditional config import to narrow type for mypy
  (filepath.py, attach.py)
- Add Any type annotation to untyped config parameters (hash_registry.py)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimitri-yatsenko and others added 10 commits February 23, 2026 13:54
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cascade restrictions stored as plain lists (for OR semantics) were
being directly assigned to ft._restriction, causing list objects to
be stringified as Python repr ("[' condition ']") in SQL WHERE clauses.

Use restrict_in_place() which properly handles lists as OR conditions
through the standard restrict() path. Also fix version string to be
PEP 440 compliant.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The delete() pre-check for part_integrity="enforce" was hardcoded and
did not respect the part_integrity parameter passed to cascade(). Also,
explicitly deleting from a part table (e.g. Website().delete()) would
always fail because the cascade seed is the part itself and its master
is never in the cascade graph.

Fix: store _part_integrity and _cascade_seed during cascade(), only run
the enforce check when part_integrity="enforce", and skip the seed node
since it was explicitly targeted by the caller.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pre-check on the cascade graph was too conservative — it flagged
part tables that appeared in the graph but had zero rows to delete.
The old code checked actual deletions within a transaction.

Replace the graph-based pre-check with a post-hoc check on
deleted_tables (tables that actually had rows deleted). If a part
table had rows deleted without its master also having rows deleted,
roll back the transaction and raise DataJointError. This matches
the original part_integrity="enforce" semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion_attributes

FreeTable._restriction_attributes is None by default. The property
accessor initializes it to set() on first access. The make_condition
call in part_integrity="cascade" upward propagation was using the
private attribute directly, causing AttributeError when columns=None.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
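
The bug described in the commit above follows a common pattern with lazily initialized attributes. The sketch below uses a hypothetical `FreeTableSketch` class and `record_columns` helper to show why reading the private attribute fails while the property accessor works; it is not the DataJoint source.

```python
class FreeTableSketch:
    """Minimal stand-in for a table with a lazily initialized attribute set."""

    def __init__(self):
        self._restriction_attributes = None  # not initialized until first access

    @property
    def restriction_attributes(self):
        # Initialize to an empty set on first access, then reuse it.
        if self._restriction_attributes is None:
            self._restriction_attributes = set()
        return self._restriction_attributes


def record_columns(columns, names):
    """Record attribute names into the caller-supplied set."""
    columns.update(names)


table = FreeTableSketch()
try:
    record_columns(table._restriction_attributes, ["subject_id"])
except AttributeError as err:
    print("private attribute:", err)

# Going through the property guarantees a real set.
record_columns(table.restriction_attributes, ["subject_id"])
print(table.restriction_attributes)  # → {'subject_id'}
```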
Adds prune() method that removes tables with zero matching rows from
the diagram. Without prior restrictions, removes physically empty
tables. With restrictions (cascade or restrict), removes tables where
the restricted query yields zero rows. Returns a new Diagram.

Includes 5 integration tests: unrestricted prune, prune after restrict,
prune after cascade, idempotency, and prune-then-restrict chaining.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
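
The behavior of prune() described in the commit above can be sketched on plain data structures. This is a hypothetical model, not the Diagram method: `row_counts` stands for the (possibly restricted) row count of each table and `edges` for the foreign key edges of the diagram.

```python
def prune(row_counts, edges):
    """Return a new (row_counts, edges) pair without tables that match zero rows.

    row_counts: table name -> number of matching rows
    edges: list of (parent, child) pairs
    """
    keep = {table for table, n in row_counts.items() if n > 0}
    kept_counts = {table: n for table, n in row_counts.items() if table in keep}
    # An edge survives only if both of its endpoints survive.
    kept_edges = [(p, c) for p, c in edges if p in keep and c in keep]
    return kept_counts, kept_edges


counts = {"Subject": 4, "Session": 2, "Trial": 0}
edges = [("Subject", "Session"), ("Session", "Trial")]

pruned_counts, pruned_edges = prune(counts, edges)
print(pruned_counts)  # → {'Subject': 4, 'Session': 2}
print(pruned_edges)   # → [('Subject', 'Session')]
```

Because the inputs are left untouched and the result contains no empty tables, applying the function to its own output changes nothing, which mirrors the idempotency test mentioned in the commit.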
- Add prune() method to both spec and design docs
- Rename _propagate_to_children → _propagate_restrictions + _apply_propagation_rule
- Fix delete() part_integrity: post-check with rollback, not pre-check
- Add _part_integrity instance attribute
- Update files affected, verification, and implementation phases
- Mark open questions as resolved with actual decisions
- Mark export/restore as future work

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove process artifacts (implementation phases, verification checklists,
resolved decisions, files-changed tables). Both documents now describe
the current system as-is, ready for migration into datajoint-docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimitri-yatsenko dimitri-yatsenko marked this pull request as draft March 2, 2026 20:49
…ost-check part integrity

Replace direct `_restriction` assignment with `restrict()` calls in Diagram
so that AndList and QueryExpression objects are converted to valid SQL via
`make_condition()`. Cascade delete uses OR convergence (a row is deleted if
ANY FK reference points to a deleted row), while restrict/export uses AND.

Part integrity enforcement uses a data-driven post-check: only raises when
rows were actually deleted from a Part without its master also being deleted.
This avoids false positives when a Part table appears in the cascade graph
but has zero affected rows.

Also adds dry_run support to delete()/drop(), prune() method, fixes CLI test
subprocess invocation, and updates test fixtures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>