21 commits
5be0a01
docs: design for restricted diagrams (#865, #1110)
dimitri-yatsenko Feb 21, 2026
3adc661
docs: add restriction semantics for non-downstream nodes and OR/AND
dimitri-yatsenko Feb 21, 2026
db0ab4e
docs: clarify convergence semantics and delete vs export scope
dimitri-yatsenko Feb 21, 2026
315bef8
docs: two distinct operators — cascade (OR) and restrict (AND)
dimitri-yatsenko Feb 21, 2026
6049fc3
docs: unify drop under diagram, shared traversal infrastructure
dimitri-yatsenko Feb 21, 2026
d63c130
feat: implement graph-driven cascade delete and restrict on Diagram
dimitri-yatsenko Feb 22, 2026
9fe2df7
Merge upstream/master into design/restricted-diagram
dimitri-yatsenko Feb 22, 2026
ae0eddd
fix: resolve mypy errors in codecs and hash_registry
dimitri-yatsenko Feb 22, 2026
3c028d1
ci: trigger fresh CI run
dimitri-yatsenko Feb 23, 2026
8cdf42d
feat: bump version to 2.2.0dev0
dimitri-yatsenko Feb 23, 2026
f4742be
fix: use restrict_in_place for cascade restrictions in Diagram
dimitri-yatsenko Feb 23, 2026
a2d2693
fix: store part_integrity and cascade_seed on Diagram instance
dimitri-yatsenko Feb 24, 2026
d2626e0
fix: use post-hoc enforce check matching old Table.delete() behavior
dimitri-yatsenko Feb 24, 2026
b88ede7
fix: use restriction_attributes property instead of private _restrict…
dimitri-yatsenko Feb 24, 2026
0bede1d
feat: implement Diagram.prune() to remove empty tables
dimitri-yatsenko Feb 25, 2026
0ae8c80
docs: update design docs to reflect actual implementation
dimitri-yatsenko Mar 2, 2026
934a6fc
docs: rewrite design docs as authoritative specs
dimitri-yatsenko Mar 2, 2026
b8fd688
docs: make bare issue references clickable links
dimitri-yatsenko Mar 2, 2026
204745a
fix: cascade delete with proper SQL generation, OR convergence, and p…
dimitri-yatsenko Mar 7, 2026
3a2fc59
docs: consolidate restricted diagram design documents
dimitri-yatsenko Mar 7, 2026
91bf61b
refactor: replace _restrict_freetable with _restricted_table on Diagram
dimitri-yatsenko Mar 7, 2026
217 changes: 217 additions & 0 deletions docs/design/restricted-diagram.md
@@ -0,0 +1,217 @@
# Restricted Diagrams

**Issues:** [#865](https://github.com/datajoint/datajoint-python/issues/865), [#1110](https://github.com/datajoint/datajoint-python/issues/1110)

## Motivation

### Error-driven cascade is fragile

The original cascade delete worked by trial-and-error: attempt `DELETE` on the parent, catch the FK integrity error, parse the MySQL error message to discover which child table is blocking, then recursively delete from that child first.

This approach has several problems:

- **MySQL 8 with limited privileges:** The server returns error 1217 (`ROW_IS_REFERENCED`), which carries no table name, instead of 1451 (`ROW_IS_REFERENCED_2`). With no child table to recurse into, the cascade crashes ([#1110](https://github.com/datajoint/datajoint-python/issues/1110)).
- **PostgreSQL overhead:** PostgreSQL aborts the entire transaction on any error. Each failed delete attempt requires `SAVEPOINT` / `ROLLBACK TO SAVEPOINT` round-trips.
- **Fragile parsing:** Different MySQL versions and privilege levels produce different error message formats.
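
To make the fragility concrete, here is a minimal sketch of message parsing. The regular expression and the helper name `blocking_child` are hypothetical and are not the library's code; the two message strings follow the usual MySQL wording for errors 1451 and 1217, with made-up schema and table names.

```python
import re
from typing import Optional

# Hypothetical pattern: extracts the child table from a 1451-style message.
FK_ERROR = re.compile(
    r"a foreign key constraint fails \((?P<child>`[^`]+`\.`[^`]+`)"
)


def blocking_child(message: str) -> Optional[str]:
    """Return the blocking child table named in the message, or None."""
    match = FK_ERROR.search(message)
    return match.group("child") if match else None


# Error 1451 names the child table and the constraint.
msg_1451 = (
    "Cannot delete or update a parent row: a foreign key constraint fails "
    "(`lab`.`recording`, CONSTRAINT `recording_ibfk_1` FOREIGN KEY "
    "(`session_id`) REFERENCES `session` (`session_id`))"
)
# Error 1217 carries no table name at all.
msg_1217 = "Cannot delete or update a parent row: a foreign key constraint fails"

child_1451 = blocking_child(msg_1451)  # table name is recoverable
child_1217 = blocking_child(msg_1217)  # None: nothing to recurse into
```

When the second form is returned, an error-driven cascade has no child to descend into, which is the failure mode reported in #1110.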

### Graph-driven approach

`drop()` already uses graph-driven traversal — walking the dependency graph in reverse topological order, dropping leaves first. The same pattern applies to cascade delete, with the addition of **restriction propagation** through FK attribute mappings.
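
The leaves-first ordering can be sketched with the standard library alone. This is an illustration, not the `Diagram` code: the table names are invented, and `graphlib.TopologicalSorter` stands in for the traversal over the real dependency graph.

```python
from graphlib import TopologicalSorter

# Toy FK graph (illustrative names): each table maps to the parents
# it references.
parents = {
    "subject": set(),
    "session": {"subject"},
    "stimulus": set(),
    "recording": {"session", "stimulus"},
}

# static_order() yields every parent before its children.
topological = list(TopologicalSorter(parents).static_order())

# Reversed, every child precedes its parents, so no FK is violated when
# tables are dropped (or rows deleted) in this order.
leaves_first = list(reversed(topological))
print(leaves_first)
```

The relative order of unrelated tables (here `subject` and `stimulus`) is not significant; only the child-before-parent constraint matters.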

### Data subsetting

`dj.Diagram` provides set operators for specifying subsets of *tables*. Per-node restrictions complete the functionality for specifying cross-sections of *data* — enabling delete, export, backup, and sharing.

## Architecture

A single class, `Diagram(nx.DiGraph)`, carries all operational methods, and they are always available. Only the visualization methods (`draw`, `make_dot`, `make_svg`, `make_png`, `make_image`, `make_mermaid`, `save`, `_repr_svg_`) are gated on `diagram_active`.

`Dependencies` is the canonical store of the FK graph. `Diagram` copies from it and constructs derived views.

### Instance attributes

```python
self._connection # Connection
self._cascade_restrictions # dict[str, list] — per-node OR restrictions (cascade mode)
self._restrict_conditions # dict[str, AndList] — per-node AND restrictions (restrict mode)
self._restriction_attrs # dict[str, set] — restriction attribute names per node
self._part_integrity # str — "enforce", "ignore", or "cascade" (set by cascade())
```

### Restriction modes

A diagram operates in one of three states: **unrestricted** (initial), **cascade**, or **restrict**. The modes are mutually exclusive. `cascade` is applied once; `restrict` can be chained.

```python
# cascade: applied once, OR at convergence, for delete
rd = dj.Diagram(schema).cascade(Session & 'subject_id=1')

# restrict: chainable, AND at convergence, for export
rd = dj.Diagram(schema).restrict(Session & cond).restrict(Stimulus & cond2)

# Mixing raises DataJointError
```
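
The mutual exclusion of the two modes can be pictured as a small state machine. The class below is a toy stand-in, not the real `Diagram`: the name `ModeSketch` is hypothetical, conditions are plain strings, and `ValueError` is used where the library would raise `DataJointError`.

```python
class ModeSketch:
    """Toy model of the mode bookkeeping; not the real Diagram class."""

    def __init__(self, mode="unrestricted", conditions=()):
        self.mode = mode
        self.conditions = tuple(conditions)

    def cascade(self, condition):
        # cascade is one-shot: it needs a diagram with no restrictions yet.
        if self.mode != "unrestricted":
            raise ValueError(f"cannot cascade a diagram in {self.mode} mode")
        return ModeSketch("cascade", (condition,))

    def restrict(self, condition):
        # restrict is chainable, but it cannot follow cascade.
        if self.mode == "cascade":
            raise ValueError("cannot restrict a diagram in cascade mode")
        return ModeSketch("restrict", self.conditions + (condition,))


chained = ModeSketch().restrict("subject_id=1").restrict('type="visual"')

try:
    chained.cascade("subject_id=1")
except ValueError as err:
    mixing_error = str(err)
```

Each call returns a new object rather than mutating the receiver, mirroring the "returns a new `Diagram`" behavior described for both methods.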

## Restriction Propagation

A restriction applied to one table node propagates downstream through FK edges in topological order. Each downstream node accumulates a restriction derived from its restricted parent(s).
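
A minimal sketch of downstream accumulation, under simplifying assumptions: table names are invented, restrictions are plain strings, and each condition is copied to the child unchanged instead of being translated across the FK as the propagation rules require.

```python
from graphlib import TopologicalSorter

# Toy FK graph: child -> set of parents (illustrative names).
parents = {
    "session": set(),
    "stimulus": set(),
    "recording": {"session", "stimulus"},
    "spike": {"recording"},
}


def propagate(seeds):
    """Carry seed restrictions downstream, visiting parents before children."""
    restrictions = {table: list(conds) for table, conds in seeds.items()}
    for table in TopologicalSorter(parents).static_order():
        for parent in sorted(parents[table]):
            for cond in restrictions.get(parent, []):
                # Simplification: the condition is copied as-is. A real
                # implementation would map attributes across the FK.
                restrictions.setdefault(table, []).append(cond)
    return restrictions


result = propagate({"session": ["subject_id=1"]})
print(result)
```

Because parents are visited first, `spike` sees the restriction that `recording` inherited from `session`, while `stimulus`, which is not downstream of the seed, stays unrestricted.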

### Propagation rules

For edge `Parent → Child` with `attr_map`:

| Condition | Child restriction |
|-----------|-------------------|
| Non-aliased AND `parent_attrs ⊆ child.primary_key` | Copy parent restriction directly |
| Aliased FK (`attr_map` renames columns) | `parent_ft.proj(**{fk: pk for fk, pk in attr_map.items()})` |
| Non-aliased AND `parent_attrs ⊄ child.primary_key` | `parent_ft.proj()` |
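
The choice among the three rows can be sketched as a small dispatch function. The function name and return labels are hypothetical. Two readings are assumed here and not confirmed by the text: `attr_map` maps child FK column to parent PK column, and `parent_attrs` is the set of attribute names used by the parent's restriction.

```python
def propagation_rule(attr_map, parent_attrs, child_primary_key):
    """Pick which row of the propagation table applies to one FK edge."""
    # An FK is aliased when any child column name differs from the
    # parent column it references.
    aliased = any(fk != pk for fk, pk in attr_map.items())
    if aliased:
        return "proj_renamed"  # parent_ft.proj(**renames)
    if set(parent_attrs) <= set(child_primary_key):
        return "copy"  # parent restriction applies to the child directly
    return "proj"  # parent_ft.proj()


same_name = {"session_id": "session_id"}
renamed = {"source_mouse": "mouse_id"}
child_pk = {"session_id", "recording_id"}

rule_copy = propagation_rule(same_name, {"session_id"}, child_pk)
rule_proj = propagation_rule(same_name, {"session_date"}, child_pk)
rule_alias = propagation_rule(renamed, {"mouse_id"}, {"transfer_id"})
```

The second case falls back to a projection of the parent because the child has no `session_date` column on which the condition could be evaluated directly.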

Restrictions are applied via `restrict()` → `make_condition()`, ensuring `AndList` and `QueryExpression` objects are properly converted to SQL. Direct assignment to `_restriction` is never used, as `where_clause()` would produce invalid SQL from `str(AndList)` or `str(QueryExpression)`.

### Converging paths

A child node may have multiple restricted ancestors. The combination rule depends on the operator:

```
Session ──→ Recording ←── Stimulus
   ↓                          ↓
subject=1               type="visual"
```

`Recording` receives two propagated restrictions: R1 from Session, R2 from Stimulus.

**`cascade` — OR (union):** A recording is deleted if tainted by *any* restricted parent. Correct for referential integrity: if the parent row is being deleted, all child rows referencing it must go. Implemented by passing the full restriction list to `restrict()`, which creates an OrList.

**`restrict` — AND (intersection):** A recording is included only if it satisfies *all* restricted ancestors. Correct for subsetting: only rows matching every condition are selected. Implemented by iterating restrictions and calling `restrict()` for each.

| Combination | Python type | SQL meaning |
|----------------|-------------|-------------|
| OR-combined restrictions | `list` | `WHERE (r1) OR (r2) OR ...` |
| AND-combined restrictions | `AndList` | `WHERE (r1) AND (r2) AND ...` |
| No restriction | empty `list` or `AndList()` | No WHERE clause (all rows) |
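
The difference between the two operators is easiest to see on rows. In this toy example the `Recording` rows are plain dictionaries with invented values, and R1 and R2 are ordinary predicates standing in for the propagated restrictions.

```python
recordings = [
    {"recording_id": 1, "subject": 1, "type": "visual"},
    {"recording_id": 2, "subject": 1, "type": "auditory"},
    {"recording_id": 3, "subject": 2, "type": "visual"},
    {"recording_id": 4, "subject": 2, "type": "auditory"},
]


def r1(row):
    """Restriction propagated from Session."""
    return row["subject"] == 1


def r2(row):
    """Restriction propagated from Stimulus."""
    return row["type"] == "visual"


# cascade: a row is affected if ANY restricted parent taints it.
cascade_ids = [r["recording_id"] for r in recordings if r1(r) or r2(r)]

# restrict: a row is selected only if ALL restricted ancestors match.
restrict_ids = [r["recording_id"] for r in recordings if r1(r) and r2(r)]

print(cascade_ids)   # [1, 2, 3]
print(restrict_ids)  # [1]
```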

### Multiple FK paths from same parent

A child may reference the same parent through multiple FKs (e.g., `source_mouse` and `target_mouse` both referencing `Mouse`). These are represented as alias nodes in the dependency graph. Multiple FK paths from the same restricted parent always combine with **OR** — structural, not operation-dependent.
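
A small illustration of why this case is always OR, using an invented `Transfer` table with two FKs to `Mouse`: a row references the restricted mouse if either column points at it, regardless of whether the diagram is in cascade or restrict mode.

```python
# Hypothetical child rows: both FK columns reference Mouse.
transfers = [
    {"transfer_id": 1, "source_mouse": 1, "target_mouse": 2},
    {"transfer_id": 2, "source_mouse": 3, "target_mouse": 1},
    {"transfer_id": 3, "source_mouse": 2, "target_mouse": 3},
]
restricted_mice = {1}  # Mouse & 'mouse_id=1'

# Either FK path is enough for the row to depend on the restricted parent.
affected = [
    t["transfer_id"]
    for t in transfers
    if t["source_mouse"] in restricted_mice
    or t["target_mouse"] in restricted_mice
]
print(affected)  # [1, 2]
```

Combining the two paths with AND would select only transfers from mouse 1 to itself and would miss rows that still reference the parent.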

### `part_integrity`

| Mode | Behavior |
|------|----------|
| `"enforce"` | Data-driven post-check: raises only when rows were actually deleted from a Part without its master also being deleted. Avoids false positives when a Part appears in the cascade but has zero affected rows. |
| `"ignore"` | Allow deleting parts without masters |
| `"cascade"` | Propagate restriction upward from part to master, then re-propagate downstream |
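
The `"enforce"` post-check can be sketched as a comparison of per-table delete counts. The function name, the `master_of` mapping, and the table names are hypothetical; the real check runs inside the delete transaction.

```python
def part_integrity_violations(deleted_counts, master_of):
    """Return parts that lost rows while their master lost none."""
    return [
        part
        for part, master in master_of.items()
        if deleted_counts.get(part, 0) > 0
        and deleted_counts.get(master, 0) == 0
    ]


# part table -> master table (illustrative names)
master_of = {
    "session__trial": "session",
    "probe__channel": "probe",
    "unit__waveform": "unit",
}
deleted_counts = {
    "session": 0, "session__trial": 3,   # violation: part rows, no master rows
    "probe": 2, "probe__channel": 8,     # fine: master was deleted too
    "unit": 0, "unit__waveform": 0,      # fine: part in cascade, zero rows
}

violations = part_integrity_violations(deleted_counts, master_of)
print(violations)  # ['session__trial']
```

The third pair shows the false positive that a purely structural check would raise and that the data-driven check avoids.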

### Unloaded schemas

If a child table lives in a schema not loaded into the dependency graph, the graph-driven delete won't know about it. The final parent `delete_quick()` fails with an FK error. Error-message parsing is retained as a **diagnostic fallback** to produce an actionable error: "activate schema X."

## Methods

### `cascade(self, table_expr, part_integrity="enforce") -> Diagram`

Apply cascade restriction and propagate downstream. Returns a new `Diagram`. One-shot — cannot be called twice or mixed with `restrict()`.

1. Verify no existing cascade or restrict restrictions
2. Copy diagram, seed `_cascade_restrictions[root]` with `list(table_expr.restriction)`
3. Propagate via `_propagate_restrictions(root, mode="cascade", part_integrity=part_integrity)`

### `restrict(self, table_expr) -> Diagram`

Apply restrict condition and propagate downstream. Returns a new `Diagram`. Chainable — can be called multiple times. Cannot be mixed with `cascade()`.

1. Verify no existing cascade restrictions
2. Copy diagram, seed/extend `_restrict_conditions[root]` with `table_expr.restriction`
3. Propagate via `_propagate_restrictions(root, mode="restrict")`

### `delete(self, transaction=True, prompt=None, dry_run=False) -> int | dict`

Execute cascading delete. Requires `cascade()` first.

1. If `dry_run`: return `preview()` without modifying data
2. Get non-alias nodes with restrictions in topological order
3. If `prompt`: show preview (table name + row count for each)
4. Start transaction
5. Delete in **reverse** topological order (leaves first) via `_restrict_freetable()` + `delete_quick()`
6. On `IntegrityError`: cancel transaction, parse FK error for actionable message about unloaded schemas
7. Post-check `part_integrity="enforce"`: if any part table had rows deleted but its master did not, cancel transaction and raise
8. Confirm/commit, return count from the root table
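
The core of steps 4 to 8 can be demonstrated end to end with the standard library. This sketch uses SQLite as a stand-in for MySQL or PostgreSQL, invented tables, and hand-written WHERE clauses in place of propagated restrictions; it omits the prompt and the `part_integrity` post-check.

```python
import sqlite3

# isolation_level=None: transactions are managed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE session (session_id INTEGER PRIMARY KEY, subject_id INTEGER)"
)
conn.execute(
    "CREATE TABLE recording ("
    " recording_id INTEGER PRIMARY KEY,"
    " session_id INTEGER NOT NULL REFERENCES session (session_id))"
)
conn.executemany("INSERT INTO session VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
conn.executemany("INSERT INTO recording VALUES (?, ?)", [(10, 1), (11, 2), (12, 3)])

# Restricted nodes in topological order (root first).
restricted = [
    ("session", "subject_id = 1"),
    (
        "recording",
        "session_id IN (SELECT session_id FROM session WHERE subject_id = 1)",
    ),
]

deleted = {}
conn.execute("BEGIN")
try:
    # Reverse topological order: children go before the rows they reference.
    for table, where in reversed(restricted):
        cursor = conn.execute(f"DELETE FROM {table} WHERE {where}")
        deleted[table] = cursor.rowcount
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    # An unexpected FK error means some child was missing from the graph.
    conn.execute("ROLLBACK")
    raise

root_count = deleted["session"]  # count reported for the root table
```

Because the child rows are removed first, the parent delete never triggers an FK error, so no savepoints or error parsing are needed on the success path.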

### `drop(self, prompt=None, part_integrity="enforce", dry_run=False)`

Drop all tables in `nodes_to_show` in reverse topological order. Pre-checks `part_integrity` structurally (tables, not rows). If `dry_run`, returns row counts without dropping.

### `preview(self) -> dict[str, int]`

Return `{full_table_name: row_count}` for each node with a restriction. Requires `cascade()` or `restrict()` first. Uses `_restrict_freetable()` to apply restrictions with correct OR/AND semantics.

### `prune(self) -> Diagram`

Remove tables with zero matching rows. With restrictions, removes nodes where the restricted query yields zero rows. Without restrictions, removes physically empty tables. Idempotent and chainable.
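
In terms of the `{full_table_name: row_count}` mapping that `preview()` returns, pruning amounts to dropping the zero-count entries. The helper below is a hypothetical illustration on a plain dictionary and does not touch a graph.

```python
def prune_counts(counts):
    """Keep only tables whose (restricted) row count is non-zero."""
    return {table: n for table, n in counts.items() if n > 0}


# Invented preview of a restricted diagram.
preview = {"subject": 4, "session": 12, "recording": 0, "spike": 0}

pruned = prune_counts(preview)
print(pruned)  # {'subject': 4, 'session': 12}

# Pruning an already pruned result changes nothing (idempotent).
pruned_again = prune_counts(pruned)
```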

### `_restrict_freetable(ft, restrictions, mode="cascade") -> FreeTable`

Static helper. Applies restrictions to a `FreeTable` using `restrict()` for proper SQL generation.

- **cascade mode:** Passes the entire restriction list to `restrict()`, creating an OrList (OR semantics).
- **restrict mode:** Iterates restrictions, calling `restrict()` for each (AND semantics).
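
At the SQL level the two modes differ only in the joining operator. The following string-based sketch is an illustration of the resulting WHERE clauses; the function name is hypothetical, and the library itself builds conditions through `restrict()` and `make_condition()`, not through string joins.

```python
def where_clause(restrictions, mode):
    """Combine SQL condition strings with OR (cascade) or AND (restrict)."""
    if not restrictions:
        return ""  # no restriction: no WHERE clause, all rows
    joiner = " OR " if mode == "cascade" else " AND "
    return "WHERE " + joiner.join(f"({r})" for r in restrictions)


conditions = ["subject_id=1", 'type="visual"']

cascade_sql = where_clause(conditions, "cascade")
restrict_sql = where_clause(conditions, "restrict")

print(cascade_sql)   # WHERE (subject_id=1) OR (type="visual")
print(restrict_sql)  # WHERE (subject_id=1) AND (type="visual")
```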

### `_from_table(cls, table_expr) -> Diagram`

Classmethod factory for `Table.delete()` and `Table.drop()`. Creates a Diagram containing `table_expr` and all its descendants.

## `Table` Integration

```python
def delete(self, transaction=True, prompt=None, part_integrity="enforce", dry_run=False):
    diagram = Diagram._from_table(self)
    diagram = diagram.cascade(self, part_integrity=part_integrity)
    return diagram.delete(transaction=transaction, prompt=prompt, dry_run=dry_run)

def drop(self, prompt=None, part_integrity="enforce", dry_run=False):
    if self.restriction:
        raise DataJointError("A restricted Table cannot be dropped.")
    diagram = Diagram._from_table(self)
    diagram.drop(prompt=prompt, part_integrity=part_integrity, dry_run=dry_run)
```

## API Examples

```python
# cascade: OR propagation for delete
rd = dj.Diagram(schema).cascade(Session & 'subject_id=1')
rd.preview() # show affected tables and row counts
rd.delete() # downstream only, OR at convergence

# restrict: AND propagation for data subsetting
rd = (dj.Diagram(schema)
      .restrict(Session & 'subject_id=1')
      .restrict(Stimulus & 'type="visual"'))
rd.preview() # show selected tables and row counts

# prune: remove tables with zero matching rows
rd = (dj.Diagram(schema)
      .restrict(Subject & {'species': 'mouse'})
      .restrict(Session & 'session_date > "2024-01-01"')
      .prune())
rd.preview() # only tables with matching rows

# dry_run: preview without executing
counts = (Session & 'subject_id=1').delete(dry_run=True)
# returns {full_table_name: affected_row_count}

# Table.delete() delegates to Diagram internally
(Session & 'subject_id=1').delete()
```

## Advantages

| | Error-driven | Graph-driven |
|---|---|---|
| MySQL 8 + limited privileges | Crashes ([#1110](https://github.com/datajoint/datajoint-python/issues/1110)) | Works — no error parsing needed |
| PostgreSQL | Savepoint overhead per attempt | No errors triggered |
| Multiple FKs to same child | One-at-a-time via retry loop | All paths resolved upfront |
| part_integrity enforcement | Post-hoc check after delete | Data-driven post-check (no false positives) |
| Unloaded schemas | Crash with opaque error | Clear error: "activate schema X" |
| Reusability | Delete-only | Delete, drop, export, prune |
| Inspectability | Opaque recursive cascade | `preview()` / `dry_run` before executing |