[1843][DRAFT] Visualize Denoised Latents in Diffusion Engine by moritzhauschulz · Pull Request #1845 · ecmwf/WeatherGenerator

moritzhauschulz · 2026-02-17T08:24:54Z

Description

This is a DRAFT PR to understand the changes made in an attempt to visualize denoised images with the existing inference and evaluation pipelines. The changes currently still cause errors.

Issue Number

Relates to #1843

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

Implemented Identity class TODO: implement EMATeacher
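As a rough illustration of what an abstract target/aux interface with an identity implementation could look like, here is a minimal sketch. The class names `TargetAuxCalculator` and `IdentityTargetAux`, the method name `compute`, and the return shape are assumptions made for this example and are not taken from the repository.

```python
from abc import ABC, abstractmethod
from typing import Any


class TargetAuxCalculator(ABC):
    """Hypothetical base class: turns a batch into (targets, aux_outputs)."""

    @abstractmethod
    def compute(self, batch: Any) -> tuple[Any, dict]:
        """Return the training targets and any auxiliary outputs."""


class IdentityTargetAux(TargetAuxCalculator):
    """Hypothetical identity variant: the batch itself is the target."""

    def compute(self, batch: Any) -> tuple[Any, dict]:
        # No teacher model involved, so there are no auxiliary outputs.
        return batch, {}
```

An EMA-teacher variant would subclass the same base and run a teacher forward pass inside `compute`; that part is left out here because the PR marks it as a TODO.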

The big question on the EMA teacher side to me is how to allow for flexible teacher and student architectures that can differ. We updated some APIs of the abstract base class to allow the ema_model forward pass; this is subject to change given the loss calculator, which is imho the second big question mark.
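One possible way to handle differing teacher and student architectures is to apply the exponential moving average only to parameters that both models share. The sketch below uses plain dictionaries of float lists instead of real model state, and the function name `ema_update`, the `decay` default, and the skip-on-mismatch policy are all assumptions for illustration, not the approach taken in this PR.

```python
def ema_update(
    teacher: dict[str, list[float]],
    student: dict[str, list[float]],
    decay: float = 0.99,
) -> dict[str, list[float]]:
    """Return new teacher weights after one EMA step toward the student.

    Parameters missing from the student, or with a different length,
    are copied unchanged (assumed policy for mismatched architectures).
    """
    updated: dict[str, list[float]] = {}
    for name, t_vals in teacher.items():
        s_vals = student.get(name)
        if s_vals is None or len(s_vals) != len(t_vals):
            # Teacher-only or shape-mismatched parameter: leave as is.
            updated[name] = list(t_vals)
            continue
        # teacher <- decay * teacher + (1 - decay) * student
        updated[name] = [
            decay * t + (1.0 - decay) * s for t, s in zip(t_vals, s_vals)
        ]
    return updated
```

Student-only parameters are ignored in this sketch, so the teacher keeps its own set of parameter names across updates.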

Easier to read and as batchsize gets more complicated in SSL this will be a useful abstraction

It runs so far. Next steps: - Route all the config options - Start writing the loss functions to understand the state requirements


TODO: - Forecast still needs to be adapted - Some more cleanup of variable naming, return values etc

* initial commit [draft]
* adapt noise conditioner to make it closer to DiT
* adapt dimensionalities – code runs with default config
* lint
* fix: add conditional prediction mode handling

  This commit resolves architectural incompatibilities when integrating diffusion-based forecast engines:

  1. FSDP Sharding: DiffusionForecastEngine wraps ForecastingEngine as `self.net`, but trainer code assumed direct `fe_blocks` access. Fixed by:
     - Adding fe_diffusion_model conditional check in init_model_and_shard()
     - Routing to model.forecast_engine.net.fe_blocks for diffusion mode
  2. Model Initialization: Reordered ForecastingEngine creation to handle both standard and diffusion-wrapped variants with proper fallback.
  3. Target Format Handling: Autoencoder mode uses different target structure than diffusion mode. Added conditional formatting:
     - Diffusion: targets = {"targets": [targets], "aux_outputs": aux}
     - Autoencoder: targets = {"physical": batch[0]}
  4. Config Updates: added file config/diffusion_config.yml for diffusion model config
* added forecast engine argument
* removed unnecessary logging
* reverting back to the previous config
* replaced getattr by get
* modification of forecasting engine initialization

---------

Co-authored-by: moritzhauschulz <moritz.hauschulz@gmail.com>
Co-authored-by: Matthias Karlbauer <matthias.karlbauer@ecmwf.int>
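Items 1 and 3 of the commit message above can be sketched as two small helpers. This is an illustrative reading of the message, not the code from the PR: the helper names `get_fe_blocks` and `format_targets` are hypothetical, and the non-diffusion path `model.forecast_engine.fe_blocks` is an assumption inferred from "trainer code assumed direct `fe_blocks` access".

```python
from types import SimpleNamespace
from typing import Any


def get_fe_blocks(model: Any, fe_diffusion_model: bool) -> Any:
    """Pick the forecast-engine blocks to shard (hypothetical helper)."""
    engine = model.forecast_engine
    if fe_diffusion_model:
        # Diffusion mode: the engine wraps a ForecastingEngine as `net`.
        return engine.net.fe_blocks
    # Standard mode (assumed): blocks live directly on the engine.
    return engine.fe_blocks


def format_targets(
    batch: Any, targets: Any, aux: Any, fe_diffusion_model: bool
) -> dict:
    """Build the target dict in the format each mode expects."""
    if fe_diffusion_model:
        return {"targets": [targets], "aux_outputs": aux}
    return {"physical": batch[0]}


# Stand-in objects for demonstration only; real models are torch modules.
plain = SimpleNamespace(forecast_engine=SimpleNamespace(fe_blocks=["b0"]))
wrapped = SimpleNamespace(
    forecast_engine=SimpleNamespace(net=SimpleNamespace(fe_blocks=["b0"]))
)
```

Keeping the mode check in one place like this would avoid repeating the `fe_diffusion_model` branch wherever the trainer touches the forecast engine.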

sophie-xhonneux and others added 30 commits October 30, 2025 17:27

Abstract class for target/aux computation

3f1bb7d

Implemented Identity class TODO: implement EMATeacher

adding loss calculator base class

28d9b22

Option for constructing teacher model flexibly

192beb6

Extract get batch size util function

aac7e29

Easier to read and as batchsize gets more complicated in SSL this will be a useful abstraction

Fix mismatched dtypes in the target computation

145d18a

It runs so far. Next steps: - Route all the config options - Start writing the loss functions to understand the state requirements

abstract loss calc structure

f1e7132

add abstract method to loss calculator base class

e822e12

add latent loss class

d24ef48

update loss calc config and rename files

c259c20

restructure loss modules

a19ee16

add ModelOutput dataclass

bf3e128

First draft of diffusion model

711f29b

NOT WORKING: initial draft for index-based masking. Implemented for random and healpix masking. Open issues with _coords_local, centroids and probably other things.

81bd6eb

Minor modifications

f367bb4

Linter

1cc168c

Copyright attribution to EDM

48934c2

NOT WORKING: Finished src, target still to be done.

51f437f

Adapt diffusion model to expected data structure

6046694

Corrected data retrieval to only access model_samples and not target_samples

f66c9fa

Minor correction

7e48c39

Masking target is working in principle but errors when feeding data to the model.

e4a9cc0

Working version for ERA5, NPP-ATMS. Problems with SYNOP with empty cell handling

a581405

Minor cleanup

9229e48

Fixed linting

db6f285

Restructuring and correcting forward pass during inference

7866ff7

Fixed remaining problems that occurred for NPP-ATMS and SYNOP.

ec38123

TODO: - Forecast still needs to be adapted - Some more cleanup of variable naming, return values etc

Enabled support for forecast. Cleaned up some bits and pieces.

0634105

merge develop

0fa60db

mv streams_data declaration under if condition

cab9fbe

Jubeku and others added 16 commits January 14, 2026 16:09

debug target_aux, loss_module, engines, etc

458e652

debug, diffusion_rn and batch.sample

61dce39

Corrected latent token retrieval in loss calculation

ea4d76c

working training loop on single sample

b875734

update config to fit forecast checkpoint

c91d5c9

Merge branch 'develop' into mk/develop/1300_assemble_diffusion_model

3a8fead

Merge branch 'develop' into mk/develop/1300_assemble_diffusion_model

d58032d

reset default config

91d633b

modify default config for diffusion

bbdb3a1

adding encoder loading to model interface

43b21c4

setting checkpoint to null temporarily

52b6bb1

rm activation checkpoint around diff forecast engine

0f7d4e5

Correct forecast engine initialization

47566be

Merge branch 'develop' into 1300_assemble_diffusion_model_w_develop

82a78f9

code runs...

3ce80f0

github-project-automation bot added this to WeatherGen-dev Feb 17, 2026

moritzhauschulz marked this pull request as draft February 17, 2026 08:25

moritzhauschulz changed the base branch from develop to mk/develop/1300_assemble_diffusion_model February 17, 2026 08:34

moritzhauschulz added 2 commits February 18, 2026 09:21

remove some debugging code

a144867

Merge branch 'develop' into mh/develop/1843_viz_denoised_image

e5cccbe

moritzhauschulz changed the base branch from mk/develop/1300_assemble_diffusion_model to develop February 18, 2026 10:35

moritzhauschulz added 3 commits February 18, 2026 12:42

adjusted diffusion config

63b3f78

fixed inference

83bb4c9

actually fix inference (via config)

bb3bbe5

github-actions bot added the model Related to model training or definition (not generic infra) label Feb 19, 2026

moritzhauschulz changed the title ~~[1843][DRAFT] Visualize Denoised Images~~ [1843][DRAFT] Visualize Denoised Latents in Diffusion Engine Feb 22, 2026

moritzhauschulz mentioned this pull request Feb 23, 2026

[1843][DRAFT] Diffusion Single Sample #1909

Draft

4 tasks

moritzhauschulz closed this Feb 23, 2026

github-project-automation bot moved this to Done in WeatherGen-dev Feb 23, 2026

moritzhauschulz wants to merge 244 commits into ecmwf:develop from moritzhauschulz:mh/develop/1843_viz_denoised_image


9 participants
