[IR Container] Phase 3 Draft #6030

Draft
mdavis36 wants to merge 6 commits into md/segmenter-container-sharing from md/P3-irc

mdavis36 commented Mar 7, 2026

No description provided.

mdavis36 commented Mar 7, 2026

!test

mdavis36 added 2 commits March 9, 2026 11:26
When a Fusion is destroyed, removeStatementsOwnedBy frees its Exprs
but previously did not clean up uses_/definition_ pointers on Vals.
Shared scalars that survive via the multi-owner guard retained dangling
Expr pointers in their uses_ vectors, causing heap corruption.

Four fixes:
- Swap Expr/Val processing order in removeStatementsOwnedBy so Exprs
  are cleaned up before Vals, and call removeUse()/setDefinition(nullptr)
  before freeing each Expr (matching removeExpr() pattern)
- Add multi-owner guards to removeVal, removeStatementsCreatedAfter,
  and the erase_if path so shared Vals are not freed while other
  Fusions still reference them
- Skip addUse() for shared scalars in registerExpr and clear uses_
  when a Val transitions to shared (addOwningFusion), preventing
  cross-Fusion DAG leakage
- Remove fatal NVF_ERROR in Val::uses() for shared scalars — they
  now correctly return an empty vector
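
The teardown order and the multi-owner guard described in this commit can be sketched with a toy model. This is a minimal illustration only, not the nvFuser implementation: `ToyContainer`, the integer Fusion ids, and the `owners` set are hypothetical stand-ins, and the real `Val`/`Expr` classes carry much more state. The sketch unlinks each Expr from its Vals before freeing it, then frees a Val only once no Fusion owns it.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <utility>
#include <vector>

struct Expr;

// Toy Val (hypothetical): definition, uses, and the ids of owning Fusions.
struct Val {
  Expr* definition = nullptr;
  std::vector<Expr*> uses;
  std::set<int> owners;
};

// Toy Expr (hypothetical): owned by exactly one Fusion id.
struct Expr {
  int owner = 0;
  std::vector<Val*> inputs;
  std::vector<Val*> outputs;
};

struct ToyContainer {
  std::vector<std::unique_ptr<Val>> vals;
  std::vector<std::unique_ptr<Expr>> exprs;

  Val* addVal(std::set<int> owners) {
    vals.push_back(std::make_unique<Val>());
    vals.back()->owners = std::move(owners);
    return vals.back().get();
  }

  Expr* addExpr(int owner, std::vector<Val*> in, std::vector<Val*> out) {
    exprs.push_back(std::make_unique<Expr>());
    Expr* e = exprs.back().get();
    e->owner = owner;
    e->inputs = std::move(in);
    e->outputs = std::move(out);
    for (Val* v : e->inputs) {
      assert(v != nullptr);
      v->uses.push_back(e);
    }
    for (Val* v : e->outputs) {
      assert(v != nullptr);
      v->definition = e;
    }
    return e;
  }

  void removeStatementsOwnedBy(int fusion) {
    // Step 1: unlink Exprs owned by `fusion` from every Val they touch,
    // so surviving Vals never keep a pointer to a freed Expr.
    for (auto& e : exprs) {
      if (e->owner != fusion) continue;
      for (Val* v : e->inputs) {
        auto& u = v->uses;
        u.erase(std::remove(u.begin(), u.end(), e.get()), u.end());
      }
      for (Val* v : e->outputs) {
        if (v->definition == e.get()) v->definition = nullptr;
      }
    }
    // Step 2: free those Exprs.
    exprs.erase(
        std::remove_if(exprs.begin(), exprs.end(),
                       [fusion](const std::unique_ptr<Expr>& e) {
                         return e->owner == fusion;
                       }),
        exprs.end());
    // Step 3: multi-owner guard. Drop this Fusion's ownership and free a
    // Val only when no other Fusion still owns it.
    for (auto& v : vals) v->owners.erase(fusion);
    vals.erase(
        std::remove_if(vals.begin(), vals.end(),
                       [](const std::unique_ptr<Val>& v) {
                         return v->owners.empty();
                       }),
        vals.end());
  }
};
```

In this model, a scalar owned by Fusions 1 and 2 that feeds an Expr of Fusion 1 survives `removeStatementsOwnedBy(1)` with an empty `uses` vector, while Fusion 1's own Val and Expr are freed.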

Remove the debug guard (if false &&) and add two exclusion conditions
identified during Phase 3 testing:

- !isFusionInput(): Fusion input scalars (pad widths, reshape sizes)
  need per-Fusion argument binding and must not be shared
- !uses().empty(): Orphaned scalars (extent Vals replaced by
  DynamicTransform concretization) have broken evaluation chains
  and must not be shared
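
The two exclusion conditions can be read as terms in a sharing predicate. The sketch below is an assumption about the shape of that check, not the actual code: `ScalarInfo` and `canShareScalar` are hypothetical names, and the real condition may contain further terms that this commit message does not list.

```cpp
#include <cassert>

// Hypothetical summary of the properties the commit message mentions.
struct ScalarInfo {
  bool is_scalar = false;        // only scalars are candidates for sharing
  bool is_fusion_input = false;  // mirrors isFusionInput()
  int num_uses = 0;              // mirrors uses().size()
};

// A scalar is shareable only if it is not a Fusion input (inputs need
// per-Fusion argument binding) and is not orphaned (it still has uses).
constexpr bool canShareScalar(const ScalarInfo& v) {
  return v.is_scalar && !v.is_fusion_input && v.num_uses > 0;
}

// Usage example covering the cases named in the commit message.
inline void exampleChecks() {
  assert(canShareScalar(ScalarInfo{true, false, 2}));   // ordinary scalar
  assert(!canShareScalar(ScalarInfo{true, true, 2}));   // Fusion input
  assert(!canShareScalar(ScalarInfo{true, false, 0}));  // orphaned scalar
}
```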

mdavis36 commented Mar 9, 2026

!test

mdavis36 commented Mar 9, 2026

!test

Four code paths created Fusions with fresh IrContainers (default
constructor) instead of sharing the source Fusion's container. This
broke the getCurFusion()-based traversal in iter_visitor.cpp which
assumes all Vals are in the same shared container.

Changes:
- fusion_segmenter.cpp: Welford translation test copy now shares
  the source Fusion's IrContainer
- host_ir/container.h: Add shared-container constructor to
  HostIrContainer (forwarding to Fusion's protected constructor)
- communication_executor.cpp, host_ir/lowering.cpp, host_ir/lower.cpp:
  Use shared-container constructor for HostIrContainer creation
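
The constructor forwarding described for host_ir/container.h might look roughly like the following. This is a sketch under stated assumptions: the `Toy` class names are hypothetical, and holding the container through `std::shared_ptr` is an assumption made for illustration, since the commit message does not say how the shared container is held.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for IrContainer.
struct IrContainerToy {};

// Hypothetical stand-in for Fusion.
class FusionToy {
 public:
  // Default constructor: creates a fresh container (the behavior that
  // the four code paths in this commit were relying on by mistake).
  FusionToy() : container_(std::make_shared<IrContainerToy>()) {}

  const std::shared_ptr<IrContainerToy>& container() const {
    return container_;
  }

 protected:
  // Shared-container constructor: reuses an existing container.
  explicit FusionToy(std::shared_ptr<IrContainerToy> c)
      : container_(std::move(c)) {
    assert(container_ != nullptr);
  }

 private:
  std::shared_ptr<IrContainerToy> container_;
};

// Hypothetical stand-in for HostIrContainer: exposes a public constructor
// that forwards to the base class's protected shared-container constructor.
class HostIrContainerToy : public FusionToy {
 public:
  HostIrContainerToy() = default;
  explicit HostIrContainerToy(std::shared_ptr<IrContainerToy> c)
      : FusionToy(std::move(c)) {}
};
```

Constructing `HostIrContainerToy host(src.container())` makes `host` and `src` refer to the same container, whereas the default constructor gives each object its own.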

!test
