
Provenance-first schema evolution: restructure dependencies and recompute safely #1482

Description

@dimitri-yatsenko

Summary

Roadmap capability from the strict-provenance adoption analysis (#1474 discussion, @ttngu207): provenance-first schema evolution, meaning tooling to change a pipeline's dependency structure and recompute safely instead of dropping and repopulating.

Why

Strict provenance surfaces undeclared dependencies (e.g. "sideways" reads from a cousin table). The remediation is always to restructure the schema: declare the dependency, move a table, or split a fat ingestion make(). Today there is no tooling for that, so users fall back on dropping and repopulating, which destroys history. This is the primary adoption blocker for strict_provenance in existing pipelines: the alarm shipped in 2.3; this is the fire exit.

Shape

Built from the primitives 2.3 already ships:

  • Diagram.trace answers what is upstream of the change (what must be preserved/re-read).
  • Diagram.cascade answers what the change invalidates (what must be recomputed).
  • A migration runner sequences the steps: declare the new FK structure → backfill/attribute the affected rows → recompute the invalidated subgraph via the normal populate machinery → retire the old structure.
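To make the division of labor concrete, here is a minimal sketch of the two graph questions over a plain dict of table → parent tables. It only mimics the roles described above for Diagram.trace (upstream closure) and Diagram.cascade (downstream closure); it is not DataJoint's API, and the table names and the helpers `trace`, `cascade`, and `children_of` are hypothetical.

```python
from collections import deque

# Hypothetical toy pipeline: each table maps to the tables it depends on (its parents).
# Names are illustrative only, not from any real schema.
PARENTS: dict[str, set[str]] = {
    "Session": set(),
    "Recording": {"Session"},
    "Spikes": {"Recording"},
    "Behavior": {"Session"},
    "Tuning": {"Spikes"},
}


def children_of(parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the parent map into a child map."""
    children: dict[str, set[str]] = {t: set() for t in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children


def _reach(start: str, edges: dict[str, set[str]]) -> set[str]:
    """Breadth-first closure over `edges`, excluding `start` itself."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def trace(table: str, parents: dict[str, set[str]]) -> set[str]:
    """Everything upstream of `table`: what must be preserved / re-read."""
    return _reach(table, parents)


def cascade(table: str, parents: dict[str, set[str]]) -> set[str]:
    """Everything downstream of `table`: what a change to it invalidates."""
    return _reach(table, children_of(parents))
```

For example, `trace("Tuning", PARENTS)` yields `{"Spikes", "Recording", "Session"}`, while `cascade("Recording", PARENTS)` yields `{"Spikes", "Tuning"}`; `Behavior` appears in neither, so a change to `Recording` would leave it untouched.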

Not Alembic: the hard part is not DDL diffing but managing the invalidation cascade with provenance intact.
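A rough sketch of what "managing the invalidation cascade" could mean for the simplest restructuring, declaring a previously undeclared read as a new FK. It assumes (hypothetically) that the child table itself and everything downstream of it are invalidated, orders recomputation with the standard-library `graphlib.TopologicalSorter` (Python 3.9+), and rejects edges that would create a cycle. `plan_add_dependency`, `descendants`, and the step strings are illustrative placeholders, not a proposed API; whether attributed rows of the child can be kept instead of recomputed is a policy question this sketch does not settle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical toy pipeline (table -> tables it depends on); names are illustrative.
PARENTS = {
    "Session": set(),
    "Recording": {"Session"},
    "Spikes": {"Recording"},
    "Behavior": {"Session"},
    "Tuning": {"Spikes"},
}


def descendants(table, parents):
    """All tables that transitively depend on `table`."""
    out, frontier = set(), {table}
    while frontier:
        # Tables with at least one parent in the current frontier, not yet visited.
        frontier = {t for t, ps in parents.items() if ps & frontier} - out
        out |= frontier
    return out


def plan_add_dependency(parents, child, new_parent):
    """Order the migration steps for declaring the read `child` <- `new_parent`.

    Does not mutate `parents`.
    """
    new_parents = {t: set(ps) for t, ps in parents.items()}
    new_parents[child].add(new_parent)
    # Assumption: the child itself and everything downstream are invalidated.
    invalidated = {child} | descendants(child, new_parents)
    try:
        # Dependency order over the new graph, restricted to the invalidated subgraph.
        order = [t for t in TopologicalSorter(new_parents).static_order()
                 if t in invalidated]
    except CycleError as exc:
        raise ValueError(
            f"declaring {new_parent} -> {child} would create a cycle") from exc
    return (
        [f"declare FK {new_parent} -> {child}",
         f"backfill/attribute existing rows of {child}"]
        + [f"recompute {t}" for t in order]
        + ["retire old structure"]
    )
```

For instance, if `Spikes` turned out to read `Behavior` sideways, `plan_add_dependency(PARENTS, "Spikes", "Behavior")` returns the declare and backfill steps, then `recompute Spikes` before `recompute Tuning`, then the retire step. The DDL is a single line of that plan; the ordering and scope of recomputation is where the work lies.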

Scope notes

Labels: enhancement