
HealDA layers and DiT patches for PNM Core#1371

Merged
pzharrington merged 26 commits into NVIDIA:main from aayushg55:healda-core
Feb 11, 2026

Conversation

Contributor

@aayushg55 aayushg55 commented Feb 4, 2026

PhysicsNeMo Pull Request

Description

Adds the necessary layers and patches the existing DiT to enable HealDA integration into PhysicsNeMo.

DiT

  • Makes the DiT conditioning module for timestep and condition modular, allowing custom conditioning_embedder modules. The existing method embedded the timestep and condition separately and then added the two (following the original DiT implementation), as opposed to the EDM/SongUNet-style approach, where the two are embedded jointly.
  • Adds DropPath to the DiT
  • Adds a final_dropout toggle to the Mlp
  • Adds options for qk normalization in DiT attention with timm and TE (Note: TE does not support qk_norm_affine unlike timm)
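The difference between the two conditioning styles can be sketched in a few lines. This is a minimal illustration, not the PhysicsNeMo API; the names, dimensions, and the linear joint projection are assumptions for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (illustrative)

t_emb = rng.standard_normal(D)  # timestep embedding
c_emb = rng.standard_normal(D)  # condition embedding

# Original DiT-style conditioning: embed timestep and condition
# separately, then sum the two embeddings.
dit_style = t_emb + c_emb

# EDM/SongUNet-style conditioning: embed the two jointly, sketched
# here as concatenation followed by a shared linear projection.
W = rng.standard_normal((D, 2 * D))
edm_style = W @ np.concatenate([t_emb, c_emb])

assert dit_style.shape == edm_style.shape == (D,)
```

A pluggable conditioning_embedder lets a model choose either behavior without changing the DiT block itself.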

HealDA

  • Scatter aggregation method for tokenizing sparse data onto a dense grid
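Scatter aggregation of this kind can be sketched with a scatter-add followed by normalization. The grid size, indices, and values below are illustrative, and the mean reduction is one plausible choice of aggregation:

```python
import numpy as np

# Sparse observations, each mapped to a flat index of a dense grid.
grid_size = 6
idx = np.array([0, 2, 2, 5])           # target grid cell per observation
vals = np.array([1.0, 2.0, 3.0, 4.0])  # observation values

# Scatter-add values into the dense grid; duplicate indices accumulate.
dense = np.zeros(grid_size)
np.add.at(dense, idx, vals)

# Count observations per cell to turn sums into per-cell means.
counts = np.zeros(grid_size)
np.add.at(counts, idx, 1.0)
mean = np.where(counts > 0, dense / np.maximum(counts, 1), 0.0)
```

Cell 2 receives two observations (2.0 and 3.0), so its aggregated mean is 2.5, while empty cells stay at 0.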

Breaking Changes/Bug fixes:

  • TE MultiHeadAttention defaults to the sequence-first sbhd qkv_format, matching PyTorch. This matches neither the PNM documentation of TESelfAttention, which suggests batch-first (B, L, D), nor the timm default of batch-first, so the qkv_format argument is now exposed and its default changed to bshd (batch-first) for consistency.
  • TE MultiHeadAttention does not support projection dropout, so the corresponding dropout layer was previously skipped, making the timm and TE backends incompatible. A dropout layer is now applied after the attn_op for consistency. Breaking change for any model trained with the TE backend and proj_drop_rate > 0.
  • Previously in the DiTBlock, the dropout param of the Mlp was hardcoded to 0, so mlp_drop_rate was ignored. It is now propagated correctly. Breaking change for any model trained with mlp_drop_rate > 0.
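The sbhd/bshd distinction is purely a memory-layout convention for the attention inputs. A small shape-only sketch (dimensions are illustrative):

```python
import numpy as np

S, B, H, D = 4, 2, 3, 8  # sequence length, batch, heads, head dim

# sbhd: sequence-first layout (the TE default before this change).
q_sbhd = np.zeros((S, B, H, D))

# bshd: batch-first layout (the new default, consistent with timm
# and the batch-first (B, L, D) convention in the PNM docs).
q_bshd = q_sbhd.transpose(1, 0, 2, 3)

assert q_bshd.shape == (B, S, H, D)
```

Feeding batch-first tensors to a backend expecting sequence-first silently mixes batch and sequence axes, which is why pinning the default matters.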

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score. This score reflects the AI's assessment of merge readiness and is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@NickGeneva NickGeneva self-requested a review February 4, 2026 23:02
@aayushg55 aayushg55 marked this pull request as ready for review February 5, 2026 18:41
@greptile-apps
Contributor

greptile-apps Bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

This PR adds HealDA integration layers and patches to the DiT model for PhysicsNeMo Core. The changes include new HEALPix-based tokenizer/detokenizer modules, modular conditioning embedders, and several enhancements to the DiT architecture.

Key Changes:

  • New HealDA modules: HPXPatchTokenizer, HPXPatchDetokenizer, CalendarEmbedding, FrequencyEmbedding, and ScatterAggregator
  • Modular conditioning system: replaced hardcoded timestep/condition embedding with pluggable ConditioningEmbedderBase classes (DiTConditionEmbedder, EDMConditionEmbedder, ZeroConditioningEmbedder)
  • Added DropPath support to DiT blocks with linear scheduling
  • Added QK normalization options (RMSNorm, LayerNorm) for attention modules
  • Added final_dropout toggle to Mlp module
  • Fixed bug where mlp_drop_rate was ignored in DiTBlock (previously hardcoded to 0)
  • Fixed bug where TE MultiHeadAttention projection dropout was not applied
  • Changed TE qkv_format default from sbhd to bshd for batch-first consistency
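The linear drop-path (stochastic depth) schedule mentioned above is commonly computed as rates evenly spaced from 0 at the first block to a configured maximum at the last, as in timm. A sketch with illustrative values (note a later commit in this PR passes an explicit list of rates instead of hardcoding this schedule):

```python
import numpy as np

depth = 6             # number of DiT blocks (illustrative)
drop_path_rate = 0.1  # maximum stochastic-depth rate at the last block

# Linear schedule: shallow blocks are almost never dropped,
# deep blocks are dropped with probability up to drop_path_rate.
rates = np.linspace(0.0, drop_path_rate, depth)
```

Each block then applies DropPath with its own rate, so regularization strength grows with depth.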

Breaking Changes (as noted in PR description):

  • TE backend now defaults to bshd format instead of sbhd
  • TE backend now correctly applies projection dropout (affects models trained with proj_drop_rate > 0)
  • DiTBlock now correctly propagates mlp_drop_rate (affects models trained with mlp_drop_rate > 0)

Issues Found:

  • QK norm implementation for LayerNorm doesn't pass the qk_norm_affine parameter (line 220 in layers.py)
  • pos_embed initialization assumes it's always a Parameter, but it can be scalar 0 when pos_embed != "learnable" (line 827 in layers.py)
  • Empty ValueError message in CalendarEmbedding (line 104-105 in embedding.py)

Important Files Changed

  • physicsnemo/experimental/models/dit/layers.py: Added new conditioning embedders (DiT, EDM, Zero), qk_norm support, DropPath, PerSampleDropout; fixed TE proj_drop and mlp_drop bugs; changed TE qkv_format default
  • physicsnemo/experimental/models/healda/embedding.py: New FrequencyEmbedding and CalendarEmbedding modules for time-based embeddings

Contributor

@greptile-apps greptile-apps Bot left a comment


2 files reviewed, 4 comments


@greptile-apps
Contributor

greptile-apps Bot commented Feb 5, 2026

Additional Comments (1)

physicsnemo/experimental/models/dit/layers.py
The code attempts to initialize the pos_embed parameter without checking whether it exists; when pos_embed != "learnable", self.pos_embed is set to the scalar 0 at line 817, so this line will error:

        if isinstance(self.pos_embed, nn.Parameter):
            nn.init.normal_(self.pos_embed, std=0.02)

aayushg55 and others added 3 commits February 5, 2026 10:51
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@pzharrington pzharrington self-requested a review February 6, 2026 18:43
@NickGeneva
Collaborator

Moved the HPX layers into a module folder; this PR can now be updated to use the embedding layers there.

#1377

@aayushg55
Contributor Author

Thanks @NickGeneva, updated to use the new hpx embedding layers.

@NickGeneva
Collaborator

NickGeneva commented Feb 9, 2026

Overall I think this looks good; the only big pending item left IMO is verifying that stormscope checkpoints remain operational. Other than that, the changes are reasonable extensions that don't have major API breaks.

@pzharrington
Collaborator

/blossom-ci

@NickGeneva
Collaborator

/blossom-ci

@NickGeneva
Collaborator

/blossom-ci

@NickGeneva
Collaborator

/blossom-ci

@NickGeneva
Collaborator

/blossom-ci

@pzharrington pzharrington added this pull request to the merge queue Feb 11, 2026
Merged via the queue into NVIDIA:main with commit 37b3e5c Feb 11, 2026
4 checks passed
@aayushg55 aayushg55 deleted the healda-core branch February 11, 2026 01:16
nbren12 pushed a commit to nbren12/modulus that referenced this pull request Mar 24, 2026
* Add HealDA layers and DiT patches

* renamed the conditioning_embedder for clarity

* delete duplicate embedding module

* update docs

* updated test

* update license

* Propogate qk_norm_affine to layerNorm in timm Attention

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* make ValueError more descriptive

* fix tokenizer pos_embed weight init

* remove hpx layer additions and import from new hpx module

* move conditioning embedder to separate file

* pass list of drop path rates instead of hardcoded linear schedule

* changed default of ditBlock kwargs from None to {}

* change ConditioningEmbedder configuration to enum, cleanup EDMConditionEmbedder, pass timestep_embed_kwargs

* cleanup cond embedder tests

* testing checkpoint compat

* add docstring to mlp

* Test fix

* Update

* Test fix

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Peter Harrington <pharrington@nvidia.com>
Co-authored-by: Nicholas Geneva <5533524+NickGeneva@users.noreply.github.com>
Co-authored-by: Nicholas Geneva <ngeneva@nvidia.com>