fix: surface state:modified nodes in summary lineage graph#1192
fix: surface state:modified nodes in summary lineage graph#1192dtaniwaki wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is
... and 4 files with indirect coverage changes 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Fixes the PR summary lineage graph so it can surface state:modified nodes even when their SQL checksum hasn’t changed (e.g., macro/config-only changes), by plumbing the adapter-computed lineage_diff.diff into the summary graph builder and allowing that diff to override checksum-derived status.
Changes:
- Pass
lineage_diff.diffinto_build_lineage_graphfromgenerate_markdown_summary. - Add a
Node.apply_diff()override mechanism (_forced_change_status) so externally computed diffs can drivechange_status. - Add unit tests covering
apply_diffbehavior and_build_lineage_graph(..., diff=...).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
recce/summary.py |
Allows lineage graph node status to be overridden by adapter diff and wires diff into markdown summary generation. |
tests/test_summary.py |
Adds tests ensuring diff-driven “modified” nodes appear in the graph and in “What’s Changed”. |
Code Review — PR 1192SummaryFixes Findings[Warning] "Code" label is misleading for macro/config-only changesFile: [Warning] Pre-existing shared mutable class-level defaults in
|
|
@dtaniwaki thanks for the PR! Can you please resolve the conflict in If you feel the issues are not valid, please say as such - and address them as necessary. |
| @@ -336,6 +346,14 @@ def _build_lineage_graph(base, current) -> LineageGraph: | |||
| node = graph.nodes[node_id] | |||
| node.update_data(node_data, "current") | |||
|
|
|||
| # Apply externally computed diff (e.g., from state:modified or macro detection). | |||
| # This allows nodes whose SQL checksum didn't change (e.g., macro-affected nodes) | |||
| # to be surfaced as modified in the graph. | |||
| if diff: | |||
| for node_id, node_diff in diff.items(): | |||
| if node_id in graph.nodes: | |||
| graph.nodes[node_id].apply_diff(node_diff) | |||
|
|
|||
There was a problem hiding this comment.
LineageGraph.nodes and LineageGraph.edges are mutable class attributes, so _build_lineage_graph() mutates shared state across calls. With the new apply_diff() / _forced_change_status override, a node marked modified in one summary can remain modified in later summaries even when the next call’s diff doesn’t include it (and graphs can also retain stale nodes/edges from previous builds). Make nodes/edges instance attributes (e.g., add __init__ that sets self.nodes = {} and self.edges = {}), or otherwise ensure each _build_lineage_graph() call starts from a fresh graph state.
|
Hey @dtaniwaki , thanks for the update on the PR however I see a few issues:
Once that is addressed we should be good to go |
Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
4a6a388 to
4ad2699
Compare
gcko
left a comment
There was a problem hiding this comment.
Review: APPROVE — no blocking issues
The core fix is correct, minimal, well-targeted, and regression-safe. Plumbing lineage_diff.diff into _build_lineage_graph (plus the apply_diff / _forced_change_status override on Node) is exactly what surfaces macro/config-only state:modified nodes that were previously invisible in the summary lineage graph, and it does so additively — checksum-detected changes absent from diff still fall back to the original logic, the new diff parameter is optional with a None default, and the if node_id in graph.nodes guard prevents KeyError for package-filtered/absent nodes. The change also clears both prior-cycle blockers: Copilot's LineageGraph shared-mutable-class-attr concern (now a proper __init__) and the reversed base/current test-argument order. Verified against the real producer — the dbt adapter's get_lineage_diff populates diff via select_nodes(select="state:modified") — so the premise holds. Patch coverage is 98.8%.
All remaining findings are LOW/NIT and non-blocking follow-ups.
Non-blocking follow-ups
- LOW —
recce/summary.py:163-164(Node._what_changed): a node force-markedmodifiedviaapply_diff(macro/config change, SQL body unchanged) is labeled "Code" changed. A user opening that node's SQL diff sees no difference, which is misleading. Non-blocking — invisible nodes were strictly worse — but worth a follow-up to extendNodeDiffwith a reason and emit "Macro"/"Config". (Flagged in the prior cycle by the claude review; still open in the current diff.) - LOW —
recce/summary.py:78-80(Node.apply_diff): the method consumes onlynode_diff.change_statusand discardsNodeDiff.change(thecategorybreaking/non-breaking signal and per-columncolumnsmap). Acceptable for the summary graph's current needs, but it means breaking/schema signaling cannot flow through this path later without reworking the override. - LOW —
recce/summary.py:82-85(Node.change_status): the property returns_forced_change_statusbefore the structuraldata_from(added/removed) check, so the diff's status is trusted over the node's actual base/current presence. Currently safe — both the dbt adapter and the base adapter builddiffconsistently with node presence — but an inconsistent diff would silently drive wrong edge rendering inget_edge_str. Defensive observation, not a live bug. - NIT —
tests/test_summary.py:326-328(test_no_diff_preserves_existing_behavior): the test compares_build_lineage_graph(base, curr)against_build_lineage_graph(base, curr, None), which are identical by the default-argument definition, making it near-tautological. It never assertsmodified_setis non-empty, nor that the with-diff path is a superset of the no-diff path. Weak regression guard; consider asserting concrete expected nodes or comparing the diff-applied path against a known fixture. - NIT —
recce/summary.py:277(LineageGraph.checks): remains a class-level attribute (checks: List[CheckSummary] = None) outside the new__init__. It's an immutableNonedefault reassigned per instance, so no shared-state bug, but for consistency with thenodes/edgesfix it could move into__init__.
Nice work — approving.
Thanks for maintaining this project! I'd appreciate a review of this bug fix when you have a chance.
PR checklist
What type of PR is this?
fix
What this PR does / why we need it:
When a dbt macro or project config changes,
state:modifiedcorrectly returns the affected downstream nodes — even if their SQL body (and thus file checksum) is unchanged. However,generate_markdown_summarywas building the lineage graph from checksums alone, never passing thelineage_diff.diffthat the adapter had already computed. As a result, nodes changed only by macros or configs were silently invisible in the PR summary comment.The fix is a one-line change in
generate_markdown_summary: passlineage_diff.diffto_build_lineage_graph. A small supporting mechanism (apply_diff/_forced_change_status) lets the graph override the checksum-based status for nodes thatstate:modifiedflagged but whose SQL didn't change.Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
The
diffdict was already being populated correctly by the adapter (it callsstate:modifiedand maps each node to aNodeDiff). The bug was purely in the summary layer not consuming it.Does this PR introduce a user-facing change?: