Geotransolver 2d 3d by coreyjadams · Pull Request #1676 · NVIDIA/physicsnemo

coreyjadams · 2026-05-27T15:49:37Z

PhysicsNeMo Pull Request

Reopening this Pull Request.

This refactors GeoTransolver, Flare, and some components of Transolver so they can be used for 2D and 3D cases. The goal is to make these models more suitable for structured datasets and to enable domain parallelism in these cases.

I also enabled them in the Darcy transolver model, just so we have an example for users to test these.

In order to enable ShardTensor for GeoTransolver and Flare, and move them out of experimental, we should get this or a similar refactor in.

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score. This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

copy-pr-bot · 2026-05-27T15:49:41Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coreyjadams · 2026-05-27T15:50:31Z

As part of this PR, we will need to do model checkpoint surgery on our existing checkpoints. Tagging @ktangsali so we can be sure to validate checkpoints against this.

coreyjadams · 2026-05-27T15:51:48Z

/ok to test 931d139

greptile-apps · 2026-05-27T15:55:24Z

Greptile Summary

This PR extends GeoTransolver, FLARE, and related components to support structured 2D and 3D grids (in addition to the existing unstructured mesh path), and wires the new variants into the Darcy example. The refactoring also extracts shared helpers (_project_input, _compute_slices_from_projections, _flare_self_attention) to reduce duplication across attention backends, and consolidates GALE_FA into gale.py.

New structured-grid support: GALEStructuredMesh2D/3D, StructuredContextProjector, and a structured_shape parameter on GeoTransolver enable Conv2d/Conv3d slice projections with inputs accepted as flat (B,N,C) or spatial (B,H,W,C) / (B,H,W,D,C), with output reshaped to match.
Darcy example refactored: model instantiation switched to hydra.utils.instantiate, Muon optimizer support added, separate train/val TensorBoard writers, and JSONL metrics logging added; however, the dataset paths in config_fix.yaml were replaced with a user-specific internal cluster path that will not work for anyone outside of that environment.
Tensor layout unified: all attention code migrated from (B,H,N,D) to (B,N,H,D) token-first layout with temperature shape updated accordingly; the change appears consistent across all affected modules.
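The input handling described in the first item above can be illustrated with shape arithmetic alone. The sketch below is not the PR's `_flatten_for_structured`; the helper names `flatten_shape` and `restore_shape` are hypothetical. It only assumes that a spatial input of shape (B, *structured_shape, C) is flattened to (B, N, C), with N the product of the grid dimensions, and that an already flat input passes through unchanged.

```python
from math import prod


def flatten_shape(input_shape, structured_shape):
    """Return (flat_shape, was_spatial) for a flat or spatial input shape.

    Hypothetical helper for illustration; it works on shape tuples only.
    """
    grid = tuple(int(s) for s in structured_shape)
    if len(grid) not in (2, 3):
        raise ValueError(f"structured_shape must have length 2 or 3; got {grid!r}")
    n_tokens = prod(grid)
    batch, channels = input_shape[0], input_shape[-1]
    middle = tuple(input_shape[1:-1])
    if middle == grid:
        # Spatial layout: (B, H, W, C) or (B, H, W, D, C).
        return (batch, n_tokens, channels), True
    if middle == (n_tokens,):
        # Already flat: (B, N, C).
        return (batch, n_tokens, channels), False
    raise ValueError(f"input shape {input_shape!r} does not match grid {grid!r}")


def restore_shape(flat_shape, structured_shape, was_spatial):
    """Undo flatten_shape so the output layout matches the input layout."""
    batch, _, channels = flat_shape
    if was_spatial:
        return (batch, *structured_shape, channels)
    return flat_shape


flat, spatial = flatten_shape((4, 32, 32, 3), (32, 32))
print(flat, spatial)
print(restore_shape((4, 1024, 8), (32, 32), spatial))
```

Tracking `was_spatial` is one way to reshape the output back only when the caller passed a spatial tensor; the actual model may handle this differently.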

Important Files Changed

Filename	Overview
examples/cfd/darcy_transolver/config_fix.yaml	Config refactored to use Hydra defaults for model selection; dataset paths replaced with user-specific internal cluster paths that will fail for all other users.
examples/cfd/darcy_transolver/train_transolver_darcy_fix.py	Training script significantly extended: model instantiation via Hydra, Muon optimizer support, separate TensorBoard writers, JSONL metrics logging. metrics_file not protected by try/finally.
physicsnemo/experimental/models/geotransolver/gale.py	GALE_FA moved from its own module into gale.py; GALEStructuredMesh2D/3D added via mixin pattern; shared helpers extracted. concrete_dropout silently ignored for structured variants.
physicsnemo/experimental/models/geotransolver/context_projector.py	ContextProjector refactored to use _SliceToContextMixin; StructuredContextProjector added for Conv2d/Conv3d geometry encoding; tensor layout changed from (B,H,N,D) to (B,N,H,D) consistently.
physicsnemo/nn/module/physics_attention.py	_project_input and _compute_slices_from_projections extracted as free functions; temperature shape updated to (1,1,H,1) for new (B,N,H,S) layout; project_input_onto_slices simplified in all concrete subclasses.
physicsnemo/experimental/models/geotransolver/geotransolver.py	structured_shape parameter added; _flatten_for_structured handles 2D/3D spatial or flat inputs; output unflattened to match input layout when structured.
physicsnemo/experimental/nn/flare_attention.py	_flare_self_attention extracted as a reusable free function; FLARE.forward simplified to call it.
test/models/geotransolver/test_geotransolver.py	Four new tests added covering structured 2D/3D forward pass, global context with structured grids, and rejection of incompatible options.

Comments Outside Diff (1)

examples/cfd/darcy_transolver/train_transolver_darcy_fix.py, line 493

metrics_file not closed on exception

metrics_file is opened with open(metrics_path, "a") and only closed in a guarded block at the very end of darcy_trainer. Any exception raised during training (e.g., a CUDA OOM or a shape mismatch) will skip the closing block, leaving the file handle open until the process exits. Wrapping the training loop in a try/finally — or using a context manager — would ensure the file is always flushed and closed.
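A minimal sketch of the context-manager variant follows. It is not the code in `train_transolver_darcy_fix.py`; the function `train_with_metrics`, the record fields, and the NaN check standing in for a training failure are all assumptions made for illustration.

```python
import json
import os
import tempfile


def train_with_metrics(metrics_path, losses):
    """Append one JSON line per step; the file is closed even on errors."""
    # The `with` block plays the role of try/finally around the loop.
    with open(metrics_path, "a") as metrics_file:
        for step, loss in enumerate(losses):
            if loss != loss:  # NaN check, stands in for any training failure
                raise RuntimeError(f"non-finite loss at step {step}")
            metrics_file.write(json.dumps({"step": step, "loss": loss}) + "\n")


def count_lines(path):
    """Count the records that actually reached the file."""
    with open(path) as f:
        return sum(1 for _ in f)


metrics_path = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
try:
    train_with_metrics(metrics_path, [0.5, 0.25, float("nan")])
except RuntimeError as err:
    print("training failed:", err)
print(count_lines(metrics_path))
```

Because the handle is closed when the `with` block exits, records written before the failure are flushed to disk rather than left in a buffer.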

Reviews (1): Last reviewed commit: "Merge branch 'NVIDIA:main' into geotrans..."

greptile-apps · 2026-05-27T15:55:30Z

+  train_path: //lustre/fsw/portfolios/coreai/users/coreya/datasets/darcy_fix/example_data/piececonst_r421_N1024_smooth1.npz
+  test_path: //lustre/fsw/portfolios/coreai/users/coreya/datasets/darcy_fix/example_data/piececonst_r421_N1024_smooth2.npz


Hardcoded internal cluster paths

train_path and test_path are set to paths on NVIDIA's internal Lustre filesystem (//lustre/fsw/portfolios/coreai/users/coreya/...). Anyone cloning the repo and running this example will immediately hit a file-not-found error. These should be replaced with placeholder paths such as /path/to/piececonst_r421_N1024_smooth1.npz — matching the documented README — so that users know they need to point these at their own downloaded copies of the dataset.
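If placeholder paths are used, the training script could also fail early with a message that names the config key. The following is a sketch only; `check_dataset_path` is a hypothetical helper, and treating any value that starts with `/path/to/` as an unset placeholder is an assumption, not something the example currently does.

```python
from pathlib import Path


def check_dataset_path(name, value):
    """Return a Path for `value`, or raise with a message naming the config key."""
    if value.startswith("/path/to/"):
        # Assumed placeholder convention, matching the suggestion above.
        raise ValueError(
            f"{name} is still the placeholder {value!r}; "
            "set it to your downloaded copy of the dataset"
        )
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"{name} points at {value!r}, which does not exist")
    return path


try:
    check_dataset_path("train_path", "/path/to/piececonst_r421_N1024_smooth1.npz")
except ValueError as err:
    print(err)
```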

greptile-apps · 2026-05-27T15:55:31Z

+    dir: .
+
+output_dir: ./output/
+run_id: ${hydra:runtime.choices.model}-muon_${precision}_r${resolution}_b${data.batch_size}_s${model.slice_num}


run_id has the string "muon" hard-coded, so runs using optimizer.type: adamw will still produce a run_id that says muon. Consider interpolating from the config so the recorded name always matches the actual optimizer in use.

Suggested change

-run_id: ${hydra:runtime.choices.model}-muon_${precision}_r${resolution}_b${data.batch_size}_s${model.slice_num}
+run_id: ${hydra:runtime.choices.model}-${optimizer.type}_${precision}_r${resolution}_b${data.batch_size}_s${model.slice_num}
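To make the effect of the suggestion concrete, the same template can be mirrored with plain string formatting. This is an illustration only: `build_run_id` is a hypothetical function, the argument values are made up, and the real config resolves the name through Hydra interpolation rather than Python code.

```python
def build_run_id(model, optimizer_type, precision, resolution, batch_size, slice_num):
    """Mirror the suggested run_id template using plain string formatting."""
    return (
        f"{model}-{optimizer_type}_{precision}"
        f"_r{resolution}_b{batch_size}_s{slice_num}"
    )


# Hypothetical values; the optimizer name now follows the actual setting.
for opt in ("muon", "adamw"):
    print(build_run_id("geotransolver_2d", opt, "bf16", 64, 8, 32))
```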

greptile-apps · 2026-05-27T15:55:36Z

        else:
            self.ln_1 = nn.LayerNorm(hidden_dim)

-        # Attention layer
+        dim_head = hidden_dim // num_heads
+        # First match on attention backend, then on spatial shape
        match attention_type:
            case 'GALE':
-                self.Attn = GALE(
-                    hidden_dim,
-                    heads=num_heads,
-                    dim_head=hidden_dim // num_heads,
-                    dropout=dropout,
-                    slice_num=slice_num,
-                    use_te=use_te,
-                    plus=plus,
-                    context_dim=context_dim,
-                    concrete_dropout=concrete_dropout,
-                    state_mixing_mode=state_mixing_mode,
-                )
+                if spatial_shape is None:
+                    self.Attn = GALE(
+                        hidden_dim,
+                        heads=num_heads,
+                        dim_head=dim_head,
+                        dropout=dropout,
+                        slice_num=slice_num,
+                        use_te=use_te,
+                        plus=plus,
+                        context_dim=context_dim,
+                        concrete_dropout=concrete_dropout,
+                        state_mixing_mode=state_mixing_mode,
+                    )
+                elif len(spatial_shape) == 2:
+                    self.Attn = GALEStructuredMesh2D(
+                        hidden_dim,
+                        spatial_shape=(int(spatial_shape[0]), int(spatial_shape[1])),
+                        heads=num_heads,
+                        dim_head=dim_head,
+                        dropout=dropout,
+                        slice_num=slice_num,
+                        use_te=use_te,
+                        plus=plus,
+                        context_dim=context_dim,
+                        state_mixing_mode=state_mixing_mode,
+                    )
+                elif len(spatial_shape) == 3:
+                    self.Attn = GALEStructuredMesh3D(
+                        hidden_dim,
+                        spatial_shape=(
+                            int(spatial_shape[0]),
+                            int(spatial_shape[1]),
+                            int(spatial_shape[2]),
+                        ),
+                        heads=num_heads,
+                        dim_head=dim_head,
+                        dropout=dropout,
+                        slice_num=slice_num,
+                        use_te=use_te,
+                        plus=plus,
+                        context_dim=context_dim,
+                        state_mixing_mode=state_mixing_mode,
+                    )
+                else:
+                    raise ValueError(
+                        f"spatial_shape must be None, length-2, or length-3; got {spatial_shape!r}"
+                    )
            case 'GALE_FA':
                self.Attn = GALE_FA(
                    hidden_dim,
                    heads=num_heads,
-                    dim_head=hidden_dim // num_heads,
+                    dim_head=dim_head,
                    dropout=dropout,
                    n_global_queries=slice_num,
                    use_te=use_te,


concrete_dropout silently ignored for structured GALE variants

When spatial_shape is not None, GALE_block selects GALEStructuredMesh2D or GALEStructuredMesh3D, neither of which accepts a concrete_dropout argument. If a caller sets concrete_dropout=True on a structured GALE_block, the option is silently dropped — standard nn.Dropout is used instead and no warning is emitted. At minimum, a warnings.warn when concrete_dropout=True and spatial_shape is not None would make this limitation discoverable.
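The warning proposed above could look roughly like this. It is a sketch under assumptions: `resolve_concrete_dropout` is a hypothetical standalone helper rather than part of `GALE_block`, and the message text is illustrative.

```python
import warnings


def resolve_concrete_dropout(concrete_dropout, spatial_shape):
    """Return the value actually honored, warning when the option is dropped."""
    if concrete_dropout and spatial_shape is not None:
        warnings.warn(
            "concrete_dropout=True is not supported for structured GALE "
            "variants; falling back to standard dropout.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return bool(concrete_dropout)


# Unstructured path: the option is kept. Structured path: it is dropped loudly.
print(resolve_concrete_dropout(True, None))
print(resolve_concrete_dropout(True, (32, 32)))
```

Passing `stacklevel=2` attributes the warning to the caller that constructed the block, which is usually where the conflicting options were set.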

coreyjadams · 2026-05-28T14:05:37Z

/blossom-ci

coreyjadams added 23 commits March 13, 2026 22:36

Update geotransolver for 2d and 3d use cases

a0cd8b0

Merge branch 'main' into geotransolver-2d-3d

1f3f225

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

036723a

Merge branch 'main' into geotransolver-2d-3d

46919d6

Merge branch 'main' into geotransolver-2d-3d

15d721d

Merge branch 'main' into geotransolver-2d-3d

b498499

Fix issues with the merge of context projector.

f13862b

Merge branch 'main' into geotransolver-2d-3d

8688949

Refactor physics attention and gale to consolidate implementations.

57981f5

Refactor flare to reduce from three duplicate implementations to one.

Finish wrapping up some code movements.

93084eb

Merge branch 'main' into geotransolver-2d-3d

2dad0b5

Update transolver darcy example to use the 2Dgeotransolver or flare modesl, optionally

ea2aa99

Update README

81ddd6a

Add missing classes in comments

b2579d8

Merge branch 'main' into geotransolver-2d-3d

2f6d5bd

Address review comments for geotransolver 2d/3d unification with FLARE as well

1350ff8

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

82785d7

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

d0d6ca3

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

beaf582

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

021a7ed

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

9dc61aa

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

8bd97e0

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

931d139

coreyjadams requested a review from loliverhennigh as a code owner May 27, 2026 15:49

coreyjadams requested a review from ktangsali May 27, 2026 15:49

greptile-apps Bot reviewed May 27, 2026

Merge branch 'main' into geotransolver-2d-3d

e06568e

coreyjadams mentioned this pull request May 29, 2026

Fix Transolver structured-embedding flattening #1684

Merged

6 tasks

coreyjadams added 2 commits June 1, 2026 20:01

Merge branch 'main' into geotransolver-2d-3d

000fde6

Merge branch 'NVIDIA:main' into geotransolver-2d-3d

b4a6d83

coreyjadams wants to merge 26 commits into NVIDIA:main from coreyjadams:geotransolver-2d-3d
