Add AeroJEPA model + SuperWing tutorial recipe (experimental) by fgiral000 · Pull Request #1690 · NVIDIA/physicsnemo

fgiral000 · 2026-06-01T18:07:19Z

PhysicsNeMo Pull Request

Description

Adds the AeroJEPA model and a SuperWing tutorial recipe under
physicsnemo.experimental and examples/cfd/external_aerodynamics/.
AeroJEPA is a Joint-Embedding Predictive Architecture for 3D
aerodynamic surrogate modeling: instead of mapping geometry directly to
a flow field, it predicts a latent representation of the flow from a
latent representation of the geometry and operating conditions, and
reconstructs the field through a continuous implicit decoder when
needed (Giral et al., arXiv:2605.05586).

What this PR delivers:

Model at physicsnemo.experimental.models.aerojepa.
AeroJEPA composes a context encoder, a target encoder, a query-token
field decoder (collectively AeroJEPATrunk), and a JEPA predictor
head (PrototypeTokenJEPAHead) into a single
physicsnemo.core.module.Module. The training path takes context
positions/features, independent target encoder surface/volume inputs,
and operating conditions; the predictor predicts target tokens, and
the decoder evaluates the field at user-supplied query points.
predict is a no-grad inference wrapper; decode_field_chunked
supports memory-bounded evaluation over very large query sets.
Concrete encoders (ContextTransformer, TargetTransformer,
PointTransformer), the QueryTokenDecoder, and the encoder ABCs
are all exposed as composable components.

Building blocks at
physicsnemo.experimental.models.aerojepa.layers. TokenSet and
EncoderOutput token dataclasses, a deterministic
FourierPositionalEncoding, ResidualMLP, the
LocalPointTransformerBlock / LocalTokenCrossAttentionBlock
attention blocks (with optional AdaLN / AdaLN-Zero conditioning), the
PointCloudTokenizer (seven center-selection strategies with k-NN
cluster pooling), token batching / mask / k-NN helpers, and prototype
anchor build / load utilities. TokenSet and EncoderOutput are
re-exported from the model package for convenience.

Losses at physicsnemo.experimental.models.aerojepa.losses.
SIGReg and TokenLatentSIGReg (a sketched isotropic-Gaussian
regularizer for latent-token distributions, with a padding-aware
wrapper), the flatten_valid_token_features /
reshape_token_features_for_sigreg masking helpers, and the
reconstruction loss family (MSELoss / RelativeL2Loss /
RelativeMSELoss / RelativeL2MSELoss, each with functional and
nn.Module forms, optional per-channel weights stored as a
persistent buffer, optional per-point weights, and an optional
validity mask).

Tutorial recipe at
examples/cfd/external_aerodynamics/aerojepa. End-to-end Hydra-driven
workflow on the public SuperWing dataset (Yang et al.,
arXiv:2512.14397): dataset download via the Hugging Face Hub
(yunplus/SuperWing), automatic split-by-geometry manifest and
per-channel normalization stats, JEPA training (reconstruction +
latent + SIGReg with linear warmups; AdamW +
warmup-cosine; optional EMA), checkpointed inference with chunked
decoding, three-panel GT | Pred | |Error| field plots for the three
surface channels (Cp, Cf_tau, Cf_z), per-channel relative-L2 /
RMSE / MAE metrics on the test split, and a pressure-only CL/CD
post-processor that integrates the surface field and emits a per-case
CSV plus a parity scatter.
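
For reviewers who want to sanity-check the pressure-only CL/CD step, here is a minimal NumPy sketch of the underlying surface integral. It is not the recipe's post-processor: the function name, the per-face inputs (Cp, outward unit normals, face areas), the reference area s_ref, and the axis convention (x streamwise, z up, freestream in the x-z plane) are all assumptions made for illustration.

```python
import numpy as np


def pressure_force_coefficients(cp, normals, areas, s_ref, alpha_deg):
    """Hypothetical helper: pressure-only lift and drag coefficients.

    cp:      (N,)   pressure coefficient per surface face
    normals: (N, 3) outward unit normals (pointing from the body into the fluid)
    areas:   (N,)   face areas
    """
    # Pressure pushes against the outward normal:
    #   C_F = -(1 / S_ref) * sum_i Cp_i * n_i * A_i
    c_force = -(cp[:, None] * normals * areas[:, None]).sum(axis=0) / s_ref

    # Assumed convention: freestream lies in the x-z plane at angle of attack alpha.
    alpha = np.deg2rad(alpha_deg)
    drag_dir = np.array([np.cos(alpha), 0.0, np.sin(alpha)])
    lift_dir = np.array([-np.sin(alpha), 0.0, np.cos(alpha)])
    return float(c_force @ lift_dir), float(c_force @ drag_dir)
```

A two-face plate with suction on top (Cp = -1) and overpressure below (Cp = +1) should give pure lift at zero incidence, which is a quick check that the sign convention is the intended one.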

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Tests

193 unit tests under test/experimental/models/aerojepa/
(constructor + attribute checks, non-regression shape checks on the
encoders, decoder, predictor, trunk, top-level model, layers, and
losses). pytest test/experimental/models/aerojepa/ -q passes
locally on CPU (~20 s).
Full SuperWing end-to-end smoke-tested on a single GPU:
train.py -> inference.py -> superwing_metrics -> superwing_forces.
Training losses decrease monotonically; inference produces field
plots, per-case field-error metrics, and a force-coefficient parity
scatter.

Dependencies

No new core dependencies. The example recipe adds optional
example-side dependencies in
examples/cfd/external_aerodynamics/aerojepa/requirements.txt
(Hugging Face Hub for the dataset download, plotting and
post-processing utilities). Pre-commit hooks, ruff, interrogate,
markdownlint, and the SPDX license check pass on every file in the
PR.

Create an empty subpackage for the AeroJEPA reusable building blocks (attention blocks, geometry tokenizer, context/target encoders, decoder, predictor) that land in subsequent commits. Establishes the SPDX license header, module docstring, and ``__all__`` placeholder so that follow-up commits only need to register new public symbols. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

TokenSet bundles token features with their geometric coordinates and optional mask, global token, and auxiliary side data; EncoderOutput is a thin wrapper used by context and target encoders to surface a global summary alongside the per-token output. Includes raw-string docstrings with Parameters/Examples sections (three executable doctests), modern union syntax, and the SPDX header. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
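
A rough sketch of the container described above, for readers skimming the commit list. The class name TokenSetSketch and the exact field names are assumptions, and NumPy arrays stand in for torch tensors; only the overall shape of the contract (features plus coordinates, optional mask / global token / aux data, is_batched, token_dim, with_updates) comes from the commit message.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class TokenSetSketch:
    """Illustrative stand-in for a token container (not the PR's class)."""

    features: np.ndarray                      # (N, D) or (B, N, D)
    coords: np.ndarray                        # (N, 3) or (B, N, 3)
    mask: Optional[np.ndarray] = None         # True marks valid tokens
    global_token: Optional[np.ndarray] = None
    aux: dict = field(default_factory=dict)   # fresh dict per instance

    @property
    def is_batched(self) -> bool:
        return self.features.ndim == 3

    @property
    def token_dim(self) -> int:
        return self.features.shape[-1]

    def with_updates(self, **changes) -> "TokenSetSketch":
        # Returns a new instance; the original is left untouched.
        return replace(self, **changes)
```

Using default_factory for aux is what keeps the default dict independent across instances, which is the property the later unit tests check.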

A deterministic log-frequency sinusoidal positional encoding used to lift continuous query coordinates into a high-dimensional feature space before the implicit decoder consumes them. Distinct from physicsnemo.nn.FourierEmbedding (random Gaussian frequencies on scalar timesteps); this variant uses fixed log-powers of pi on multi-dim coordinates with the standard sin/cos band layout. Includes an out_dim property, jaxtyping on forward, and an executable doctest. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
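
To make the band layout concrete, a small NumPy sketch of a deterministic sinusoidal encoding follows. The frequency ladder (2^k times pi) is one reading of "log-powers of pi" and is an assumption, as are the class name and the choice not to append the raw coordinates; the PR's module may differ on any of these.

```python
import numpy as np


class FourierFeaturesSketch:
    """Illustrative deterministic sin/cos encoding of multi-dim coordinates."""

    def __init__(self, in_dim: int, num_bands: int):
        self.in_dim = in_dim
        self.num_bands = num_bands
        # Assumed frequency ladder: pi, 2*pi, 4*pi, ... (fixed, nothing random).
        self.freqs = np.pi * 2.0 ** np.arange(num_bands)

    @property
    def out_dim(self) -> int:
        # One sin and one cos feature per (coordinate, band) pair.
        return 2 * self.in_dim * self.num_bands

    def __call__(self, coords: np.ndarray) -> np.ndarray:
        angles = coords[..., None] * self.freqs            # (..., in_dim, bands)
        bands = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
        return bands.reshape(*coords.shape[:-1], self.out_dim)
```

Because the frequencies are fixed at construction, two instances with the same arguments produce identical features, unlike an embedding with randomly drawn Gaussian frequencies.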

A private _gpu_knn module bundling chunked torch.cdist plus topk for building homogeneous (gpu_knn_self) and bipartite (gpu_knn_bipartite) k-NN graphs and inverse-distance interpolation (gpu_knn_interpolate). Pure PyTorch, no warp or custom CUDA, so it works on CPU too, just slower. The leading underscore on the filename makes the module package-private; callers live inside the aerojepa subpackage only. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
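
The chunking idea can be sketched in a few lines. This is a NumPy illustration, not the module's code: the function name and signature are hypothetical, a broadcasted squared distance replaces torch.cdist, and a full argsort replaces topk for brevity.

```python
import numpy as np


def knn_bipartite_chunked(queries, points, k, chunk_size=1024):
    """Indices of the k nearest `points` for every row of `queries`.

    Only a (chunk_size, num_points) distance block is alive at any time,
    which bounds peak memory for large query sets.
    """
    k = min(k, points.shape[0])
    out = np.empty((queries.shape[0], k), dtype=np.int64)
    for start in range(0, queries.shape[0], chunk_size):
        chunk = queries[start:start + chunk_size]
        # Squared Euclidean distances for this chunk only.
        d2 = ((chunk[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        out[start:start + chunk_size] = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return out
```

Calling it with queries equal to points gives the homogeneous (self) graph, where each point's first neighbor is itself.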

A token_utils module with the helpers used by the AeroJEPA tokenizer, encoders and attention blocks: gather_rows, counts_to_mask, flatten_padded_batch / unflatten_to_padded, compute_batch_offset_step, flatten_batched_coords, chunked_knn_indices (CPU/GPU dispatcher with the AE_KNN_BACKEND env override), masked_mean, trim_batched_tokens, and pad_token_sets. Behavior preserved; types modernized and the TokenSet import is package-relative. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
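
Two of these helpers are easy to picture with a short example. The sketch below uses NumPy and hypothetical signatures (the _sketch suffix marks them as illustrations, not the PR's functions): one turns per-sample token counts into a padding mask, the other averages only the valid tokens.

```python
import numpy as np


def counts_to_mask_sketch(counts, max_len=None):
    """(B,) token counts -> (B, max_len) boolean mask, True for valid slots."""
    counts = np.asarray(counts)
    max_len = int(counts.max()) if max_len is None else max_len
    return np.arange(max_len)[None, :] < counts[:, None]


def masked_mean_sketch(features, mask):
    """Mean over the token axis of (B, N, D) features, ignoring padded slots."""
    w = mask[..., None].astype(features.dtype)
    # Clamp the denominator so samples with zero valid tokens return zeros.
    denom = np.maximum(w.sum(axis=1), 1.0)
    return (features * w).sum(axis=1) / denom
```

The clamp in the denominator is a design choice of this sketch; it avoids a division by zero for fully padded samples.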

A reusable trio of attention building blocks: ResidualMLP (pre-norm residual MLP with optional AdaLN / AdaLN-Zero conditioning), LocalPointTransformerBlock (local self-attention over a per-point k-NN graph with learned relative-position bias), and LocalTokenCrossAttentionBlock (cross-attention from queries to a per-query k-NN of context tokens, with a 5-way conditioning MLP that modulates query and key/value sides independently). Behavior preserved: zero-init conditioning MLPs give an identity transform at construction time, and the N<=1 / empty-input fallbacks short-circuit the same way they do upstream. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
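
The identity-at-construction property is worth seeing in isolation. Below is a NumPy sketch of a pre-norm residual MLP with AdaLN-Zero style conditioning; the class name, the single linear conditioning projection, and the ReLU MLP are assumptions chosen to keep the example short, not a description of the PR's ResidualMLP.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class AdaLNZeroResidualMLPSketch:
    """Illustrative block: x + gate * MLP(LN(x) * (1 + scale) + shift)."""

    def __init__(self, dim, cond_dim, hidden, rng):
        self.w1 = rng.normal(scale=dim ** -0.5, size=(dim, hidden))
        self.w2 = rng.normal(scale=hidden ** -0.5, size=(hidden, dim))
        # Zero-initialized conditioning projection: shift, scale and gate
        # are all exactly zero until training moves these weights.
        self.w_cond = np.zeros((cond_dim, 3 * dim))
        self.b_cond = np.zeros(3 * dim)

    def __call__(self, x, cond):
        shift, scale, gate = np.split(cond @ self.w_cond + self.b_cond, 3, axis=-1)
        h = layer_norm(x) * (1.0 + scale) + shift
        h = np.maximum(h @ self.w1, 0.0) @ self.w2
        # With gate == 0 the residual branch vanishes and the block is the identity.
        return x + gate * h
```

At construction the output equals the input for any conditioning vector, so stacking such blocks does not perturb the signal before training starts.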

A tokenizer module that reduces a raw point set to a bounded token budget before attention. Seven strategies: identity, random, FPS, random/FPS/voxel-FPS cluster pooling, and prototype-anchored clustering. The cluster strategies return the kNN indices that link each token center back to the source points, allowing a downstream encoder to replace the default feature mean with a learned pooling (e.g. the message-passing PointClusterGraphPool that lands with the encoders in PR NVIDIA#3). Behavior preserved including the non-persistent prototype_coords buffer and the per-sample loop used by the prototype strategy. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
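
As an illustration of one cluster strategy, the sketch below pairs farthest point sampling (FPS) with k-NN mean pooling in NumPy. Function names and signatures are hypothetical, the code is unbatched, and it is meant only to show why returning the k-NN indices is useful: a caller can swap the mean for a learned pooling over the same index table.

```python
import numpy as np


def farthest_point_sampling(points, num_samples, start=0):
    """Greedy FPS: repeatedly pick the point farthest from all picks so far."""
    num_samples = min(num_samples, points.shape[0])
    selected = np.empty(num_samples, dtype=np.int64)
    selected[0] = start
    min_d2 = ((points - points[start]) ** 2).sum(axis=1)
    for i in range(1, num_samples):
        nxt = int(np.argmax(min_d2))
        selected[i] = nxt
        min_d2 = np.minimum(min_d2, ((points - points[nxt]) ** 2).sum(axis=1))
    return selected


def cluster_pool(points, features, center_idx, k):
    """Mean-pool the features of each center's k nearest source points."""
    d2 = ((points[center_idx][:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    knn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # knn links every token center back to its source points.
    return features[knn].mean(axis=1), knn
```

The second return value of cluster_pool plays the role of the index table mentioned in the commit message.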

Build the fixed k-means anchor set used by the data_prototype_cluster tokenizer strategy. The build pass walks a training dataset, tokenizes each sample to obtain candidate token coordinates, optionally subsamples, runs chunked Lloyd-iteration k-means with empty-cluster FPS refill, sorts the centers lexicographically, and serializes them with a JSON metadata blob. Two load functions (target / context - identical file layout) and two ensure_* helpers (load-if-exists else build) round out the public surface. Behavior preserved; the seed argument governs k-means initialization and candidate subsampling but not the tokenizer pass, which intentionally uses random sampling. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
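
The core of the build pass can be sketched as follows. This NumPy version is a simplification under stated assumptions: the function name is hypothetical, the distance matrix is not chunked, and an empty cluster is refilled with the candidate currently farthest from its assigned center, which is one plausible reading of "FPS refill". Dataset traversal, subsampling, and serialization are omitted.

```python
import numpy as np


def kmeans_anchors(candidates, num_anchors, num_iters=10, seed=0):
    """Lloyd-iteration k-means returning lexicographically sorted centers."""
    rng = np.random.default_rng(seed)  # the seed governs initialization only
    init = rng.choice(candidates.shape[0], size=num_anchors, replace=False)
    centers = candidates[init].copy()
    for _ in range(num_iters):
        d2 = ((candidates[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        nearest = d2.min(axis=1)
        for c in range(num_anchors):
            members = candidates[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
            else:
                # Refill an empty cluster with the worst-served candidate.
                far = int(nearest.argmax())
                centers[c] = candidates[far]
                nearest[far] = 0.0
    # np.lexsort treats the LAST key as primary, so pass (z, y, x).
    order = np.lexsort(centers.T[::-1])
    return centers[order]
```

Sorting the centers makes the anchor file independent of the order in which k-means happened to label the clusters.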

Export the 23 public symbols from the six source modules at the package level: TokenSet/EncoderOutput dataclasses, FourierPositionalEncoding, ResidualMLP and the two local attention blocks, PointCloudTokenizer, the ten batching/mask/kNN helpers, and the six prototype anchor build/load functions. Module docstring tightened to reflect the actual contents (encoders / decoder / predictor land in physicsnemo.experimental.models.aerojepa in a later PR). The package-private _gpu_knn helpers remain accessible via their submodule path but are intentionally not re-exported. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Re-export the five AeroJEPA nn.Module layer classes (FourierPositionalEncoding, ResidualMLP, LocalPointTransformerBlock, LocalTokenCrossAttentionBlock, PointCloudTokenizer) at the experimental.nn parent namespace, alongside the existing FLARE and DiffusionUNet3D family. Data types (TokenSet, EncoderOutput), batching/mask helpers, and prototype-anchor builders stay scoped to the aerojepa subpackage to keep the parent namespace focused on actual layers. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Tests for TokenSet and EncoderOutput covering construction (both batched and unbatched), the is_batched / token_dim properties, the with_updates immutability + selective-replacement contract, and the independence of the default aux dict across instances. Uses the shared device fixture so the CUDA path runs when available. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

…locks Six new test files covering positional_encoding, attention_blocks, point_tokenizer, token_utils, _gpu_knn, and prototype_anchors. 85 new tests covering: constructor validation paths, forward output shapes, edge cases (N<=1 LPT fallback, empty cross-attention, empty/single-point kNN, missing voxel_size, non-persistent prototype_coords buffer), identity-at-init of AdaLN-Zero conditioning MLPs, the AE_KNN_BACKEND env override, and build/load round-trips with a tiny fake dataset. All tests use the shared device fixture so CUDA runs when available; CPU run is 18 s wall. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Documents the new physicsnemo.experimental.nn.aerojepa subpackage contributed across the preceding 12 commits on this branch: token dataclasses, Fourier positional encoding, ResidualMLP, the two local attention blocks, PointCloudTokenizer, token batching/mask/kNN helpers, and prototype anchor utilities, plus the parent-namespace re-export of the five layer classes. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Create an empty subpackage for JEPA-style losses and regularizers (SIGReg, TokenLatentSIGReg, the padding-aware masking helpers, and the reconstruction loss family) that land in subsequent commits. Establishes the SPDX license header, module docstring, and ``__all__`` placeholder so that follow-up commits only need to register new public symbols. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Two utilities used by SIGReg / TokenLatentSIGReg to flatten padded batched token features and reshape them into the (T, B, D) layout SIGReg expects. flatten_valid_token_features is a passthrough on rank-2 inputs and uses boolean masking on rank-3 inputs; reshape_token_features_for_sigreg adds the leading T=1 axis and emits a zero-element (1, 0, D) placeholder when the mask removes every row. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
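
The shape contract of the two helpers can be illustrated in NumPy. Names carry a _sketch suffix because the signatures are assumptions; the behavior shown (rank-2 passthrough, boolean masking on rank-3 input, a leading axis of size 1, and an empty placeholder when nothing survives the mask) follows the commit message.

```python
import numpy as np


def flatten_valid_tokens_sketch(features, mask=None):
    """(B, N, D) + (B, N) mask -> (M, D) valid rows; rank-2 input passes through."""
    if features.ndim == 2:
        return features
    if mask is None:
        return features.reshape(-1, features.shape[-1])
    return features[mask]


def reshape_for_sigreg_sketch(features, mask=None):
    """Add the leading T=1 axis; an all-False mask yields shape (1, 0, D)."""
    flat = flatten_valid_tokens_sketch(features, mask)
    return flat[None]
```

Keeping the empty case as a zero-element array (rather than raising) lets the caller decide how to short-circuit.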

SIGReg pushes a learned latent toward N(0, I) by comparing the empirical Fourier characteristic function of random projections against the reference Gaussian one on a uniform knot grid (the LeWorldModel construction). Three non-learnable buffers cache the knot positions, the reference window, and the trapezoidal + window-weighted integration weights. TokenLatentSIGReg is a thin wrapper that accepts (B, N, D) or (N, D) features plus an optional mask, drops padded rows via the masking helpers, and short-circuits to a zero scalar when the mask removes every row. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
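
A compact NumPy sketch of the idea follows, to show how the three cached quantities fit together. It is an approximation under assumptions: the function name, the number of projections and knots, the integration range, and the overall scaling are illustrative choices, and the PR's SIGReg may normalize or sample differently.

```python
import numpy as np


def sigreg_sketch(z, num_proj=64, num_knots=17, t_max=3.0, seed=0):
    """Characteristic-function mismatch between 1-D projections of z and N(0, 1)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(z.shape[1], num_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)   # unit directions
    proj = z @ dirs                                        # (n, num_proj)

    t = np.linspace(0.0, t_max, num_knots)                 # knot positions
    target = np.exp(-0.5 * t ** 2)                         # CF of N(0, 1), also the window
    dt = t[1] - t[0]
    trap = np.full(num_knots, dt)
    trap[0] = trap[-1] = dt / 2.0                          # trapezoid rule
    weights = trap * target                                # window-weighted integration

    angles = proj[:, :, None] * t                          # (n, num_proj, num_knots)
    re = np.cos(angles).mean(axis=0)                       # empirical CF, real part
    im = np.sin(angles).mean(axis=0)                       # empirical CF, imaginary part
    err = (re - target) ** 2 + im ** 2
    return float((err @ weights).mean())
```

A sample drawn from a standard Gaussian scores close to zero, while a collapsed (all-zero) or over-dispersed latent scores noticeably higher, which is the behavior a regularizer of this kind relies on.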

Four loss families exposed as functional and nn.Module variants: mse_loss / MSELoss (channel-weighted MSE with mask + point weights), relative_l2_loss / RelativeL2Loss (per-channel relative L2 averaged over channels), relative_mse_loss / RelativeMSELoss (relative MSE with selectable pointwise vs channel_max normalization), and the relative_l2_mse_loss / RelativeL2MSELoss hybrid that linearly combines the L2 and MSE terms. Channel weights are stored as a persistent float32 buffer on the Module variants when supplied, and as a non-persistent None buffer otherwise. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
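
For orientation, here is a NumPy sketch of a per-channel relative L2 loss with an optional validity mask and channel weights. The function name, argument names, and the eps guard are assumptions; the functional forms in the PR operate on torch tensors and may reduce differently.

```python
import numpy as np


def relative_l2_loss_sketch(pred, target, mask=None, channel_weights=None, eps=1e-8):
    """Per-channel ||pred - target|| / ||target||, averaged over channels.

    pred, target: (N, C) arrays; mask: (N,) booleans, True for valid points.
    """
    if mask is None:
        mask = np.ones(pred.shape[0], dtype=bool)
    m = mask[:, None].astype(pred.dtype)
    num = np.sqrt((((pred - target) ** 2) * m).sum(axis=0))
    den = np.sqrt(((target ** 2) * m).sum(axis=0)) + eps
    per_channel = num / den
    if channel_weights is None:
        return per_channel.mean()
    w = np.asarray(channel_weights, dtype=pred.dtype)
    return (per_channel * w).sum() / w.sum()
```

Scaling the target by 1.1 gives a loss of about 0.1 regardless of the field's magnitude, which is the scale invariance that motivates relative losses for channels such as Cp and the skin-friction components.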

Move the JEPA losses subpackage from physicsnemo.experimental.metrics.jepa to .metrics.aerojepa so it mirrors the nn.aerojepa naming. Populate the package __init__ with the 12 public re-exports from masking, sigreg, and reconstruction (flatten/reshape token helpers, SIGReg/TokenLatentSIGReg, and the four reconstruction loss families both functional and as nn.Module). Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Three test files mirroring the source modules: test_masking, test_sigreg, test_reconstruction. 37 tests covering constructor validation, forward shape, edge cases (rank-1, empty batch, all-False mask), the SIGReg buffer layout, state_dict persistence of channel_weights on the reconstruction Module variants, both modes of relative_mse_loss, and the hybrid degenerating to either of its two sub-losses when the corresponding weight is zero. CPU run is 4 s wall; device fixture picks up CUDA automatically. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Documents the new physicsnemo.experimental.metrics.aerojepa subpackage contributed across the preceding 6 commits on this branch: SIGReg / TokenLatentSIGReg regularizers, masking helpers, and the four reconstruction loss families. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Create an empty subpackage for the top-level AeroJEPA model and its model-specific subcomponents (context/target/point encoders, decoder, predictor, trunk) that land in subsequent commits. Module docstring points readers to experimental.nn.aerojepa for the reusable building blocks the model is composed from. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Abstract base classes BaseContextEncoder and BaseTargetEncoder (plus the encoders subpackage init) define the contract concrete encoders must satisfy: a required forward returning an EncoderOutput and an optional forward_batched gated by a supports_batched_forward class flag. The context encoder's forward args are named context_pos / context_feat (these bundle the boundary and any volumetric samples in whole-domain models; the SDF channel in context_feat distinguishes the two halves at inference). The target encoder keeps the surface / volume split because training-time subsamplings for the two are intentionally decoupled. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Context tokens are produced from geometry alone - operating conditions enter the model downstream at the predictor head, not at the context branch. Remove gen_params from BaseContextEncoder forward and forward_batched signatures. Class docstring spells out the intent. BaseTargetEncoder is untouched. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

PointTransformer (point.py) is a point-cloud encoder building block: tokenizes the input via PointCloudTokenizer, embeds tokens with a Fourier positional encoding plus per-feature linear projection, optionally adds a conditioning vector, runs a stack of LocalPointTransformerBlock layers with configurable dilation, and emits an EncoderOutput. Two entry points - encode_single for unbatched inputs and forward_batched for padded batches with per-batch coordinate offsetting so the inner k-NN does not mix tokens across batch items. The same file carries the build_geometry_features helper (assembles per-point features from positions and optional SDF / normals / n-dot channels) and the message-passing PointClusterGraphPool used when tokenizer_cluster_pooling='graph'. ContextTransformer (context.py) is the concrete BaseContextEncoder. Takes context_pos and context_feat - no gen_params, since operating conditions enter the model downstream at the predictor head. Internally wraps PointTransformer with conditioning disabled. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
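
The per-batch coordinate offsetting trick is sketched below in NumPy. The function names and the step rule (twice the bounding-box diagonal plus a margin) are assumptions made so the guarantee is easy to verify; the PR's compute_batch_offset_step may use a different rule.

```python
import numpy as np


def offset_batched_coords(coords, mask, margin=1.0):
    """Shift each batch item along x so a flat k-NN never crosses items.

    coords: (B, N, 3) float array, mask: (B, N) booleans for valid tokens.
    Returns the flattened valid coordinates and their batch ids.
    """
    batch_size = coords.shape[0]
    valid = coords[mask]
    diag = float(np.linalg.norm(valid.max(axis=0) - valid.min(axis=0)))
    # Any within-item distance is <= diag; any cross-item distance after the
    # shift is >= step - diag = diag + margin, so neighbors stay in-item
    # as long as every item has at least k valid tokens.
    step = 2.0 * diag + margin
    shifted = coords.copy()
    shifted[..., 0] += np.arange(batch_size)[:, None] * step
    batch_ids = np.broadcast_to(np.arange(batch_size)[:, None], mask.shape)
    return shifted[mask], batch_ids[mask]


def knn_self(points, k):
    """Brute-force k-NN indices, used here only to demonstrate the effect."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1, kind="stable")[:, :k]
```

With this shift, one k-NN call over the flattened batch behaves like independent per-item searches, which avoids a Python loop over batch items.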

Mirror the context-side change: target encoders take their inputs straight, with no gen_params threaded through. Operating conditions enter the model only at the predictor head. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

JEPA target encoders are self-attention only. Remove context_tokens from forward and forward_batched. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Concrete BaseTargetEncoder that wraps an inner PointTransformer. Forward concatenates surface and volume into one bundled point set; forward_batched weaves variable-length surface and volume halves per batch via counts_to_mask. Self-attention only. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Implicit field decoder driven by cross-attention to target tokens. Per-query embedding is a Fourier positional encoding plus optional SDF channel and optional cond vector; cross-attention to the target token set refines it, a trunk MLP and head produce the output. Several optional behaviors wire in: wall-velocity gate, pressure split head (MLP or SIREN), final SIREN refinement, extra SDF features. Both forward (single) and forward_batched (padded) process queries in chunks of query_chunk_size and return (pred, query_embeddings). SineLayer and SirenHead are the small SIREN building blocks used by the optional heads. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

The JEPA predictor head. Maps a target-token coordinate set to predicted target-token features, given context tokens and a conditioning vector. Operating conditions enter the model here (via the cond argument), projected once and threaded into every self- and cross-attention block. Accepts both unbatched (rank-2 context features) and padded batched (rank-3) inputs; target_positions and cond are broadcast across the batch when their leading dim is 1. The forward signature uses target_positions as the parameter name (not target_coords) for consistency with the rest of the model API. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Owns context encoder, target encoder, and decoder, and wires them together. encode_context runs both encoders and emits a dict with context tokens, target tokens, and the decoder-side cond_global. decode_queries decodes a target token set at supplied query positions, optionally producing a per-query mask logit when the mask head is enabled. forward_single and forward_batch are convenience wrappers chaining the two phases for unbatched and padded batched inputs respectively. Public args use context_pos / context_feat naming; gen_params is used to build cond_global but is not threaded into the encoders. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Composes AeroJEPATrunk and PrototypeTokenJEPAHead into a single physicsnemo.Module with full MOD-001 / MOD-006 / MOD-010 compliance: @dataclass AeroJEPAMetaData inheriting ModelMetaData, jaxtyping on all public methods, validation guarded by torch.compiler.is_compiling, constructor taking typed components (no cfg dict, no kwargs). The forward entry takes context_pos / context_feat / gen_params / query_pos / query_sdf and derives target-token positions internally via build_target_token_coords (the target encoder's tokenizer with a placeholder feature tensor); callers no longer supply target_coords. predict is a no-grad wrapper around forward. encode_geometry, encode_geometry_and_flow, predict_field_tokens, decode_field, and build_target_token_coords are exposed for training-loop callers and for latent-optimization workflows that want to cache target coordinates across many predictor evaluations. decode_field_chunked wraps the decoder in chunked + autocast + CPU-offload for memory-bounded inference on very large query sets. The class docstring ships an executable doctest that wires the whole chain and asserts a forward-pass shape. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>
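
The memory-bounding part of chunked decoding reduces to a simple loop, sketched here in NumPy. The helper name and the decode_fn callback are hypothetical, and the autocast and CPU-offload aspects mentioned above are not modeled; the sketch only shows that a pointwise decoder can be evaluated chunk by chunk without changing the result.

```python
import numpy as np


def decode_in_chunks(decode_fn, query_pos, chunk_size):
    """Evaluate decode_fn on slices of query_pos and stitch the outputs.

    decode_fn maps an (n, 3) array of query points to an (n, C) array.
    Assumes query_pos is non-empty and decode_fn treats queries independently.
    """
    outputs = []
    for start in range(0, query_pos.shape[0], chunk_size):
        outputs.append(decode_fn(query_pos[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)
```

Peak activation memory then scales with chunk_size rather than with the total number of query points, at the cost of more decoder calls.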

Re-export AeroJEPA + AeroJEPAMetaData (top-level model), AeroJEPATrunk and PrototypeTokenJEPAHead (composable components), QueryTokenDecoder, BaseContextEncoder / BaseTargetEncoder (ABCs for custom encoders), and the three concrete encoders ContextTransformer / PointTransformer / TargetTransformer. Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Ten new test files mirroring the source modules: encoders/test_base, encoders/test_context, encoders/test_point, encoders/test_target, test_decoder, test_predictor, test_trunk, and test_aerojepa. 63 tests covering constructor validation, signature checks (drops of target_coords / gen_params / context_tokens per the API redesign), forward / forward_batched shapes, every optional decoder feature (pressure split head, SIREN refinement, wall gate, extra SDF features), predictor broadcasting paths, trunk wiring (mask head on/off), and the top-level model contract (physicsnemo Module subclass, plain-tensor forward, no-grad predict, single-arg build_target_token_coords, chunked CPU-offload decode). Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>

Documents the new physicsnemo.experimental.models.aerojepa subpackage: the AeroJEPA top-level model and its composable subcomponents (context/target encoders, decoder, predictor, trunk). Signed-off-by: fgiral000 <fa.giral@alumnos.upm.es>