Diffusion development branch #1899

Open
clessig wants to merge 258 commits into develop from mk/mh/1843_viz_denoised_image

clessig (Collaborator) commented Feb 21, 2026

Description

Diffusion development branch

Issue Number

Closes #944

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the Mattermost channels and/or a design doc
    • for changes of dependencies: the Mattermost software development channel

Jubeku and others added 30 commits November 6, 2025 17:21
…andom and healpix masking. Open issues with _coords_local, centroids and probably other things.
TODO:
- Forecast still needs to be adapted
- Some more cleanup of variable naming, return values etc
Jubeku and others added 26 commits January 19, 2026 18:12
* initial commit [draft]

* adapt noise conditioner to make it closer to DiT

* adapt dimensionalities – code runs with default config

* lint

* fix: add conditional prediction mode handling

This commit resolves architectural incompatibilities when integrating
diffusion-based forecast engines:

1. FSDP Sharding: DiffusionForecastEngine wraps ForecastingEngine
   as `self.net`, but trainer code assumed direct `fe_blocks` access. Fixed by:
   - Adding fe_diffusion_model conditional check in init_model_and_shard()
   - Routing to model.forecast_engine.net.fe_blocks for diffusion mode

2. Model Initialization: Reordered ForecastingEngine creation to handle both
   standard and diffusion-wrapped variants with proper fallback.

3. Target Format Handling: Autoencoder mode uses different target
   structure than diffusion mode. Added conditional formatting:
   - Diffusion: targets = {"targets": [targets], "aux_outputs": aux}
   - Autoencoder: targets = {"physical": batch[0]}

4. Config Updates: added file config/diffusion_config.yml for diffusion
   model config

* added forecast engine argument

* removed unnecessary logging

* reverting back to the previous config

* replaced getattr by get

* modification of forecasting engine initialization
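
The routing and target formatting described in the "fix: add conditional prediction mode handling" commit above can be illustrated with a minimal sketch. This is not the code from the PR: the helper names `get_fe_blocks` and `format_targets` are hypothetical, the `SimpleNamespace` objects are stand-ins for the real model, and the non-diffusion path `model.forecast_engine.fe_blocks` is an assumption based on the commit's mention of "direct `fe_blocks` access". Only the `fe_diffusion_model` flag, the `forecast_engine.net.fe_blocks` path, and the two target dictionaries are taken from the commit message.

```python
from types import SimpleNamespace


def get_fe_blocks(model, fe_diffusion_model: bool):
    """Return the forecast-engine blocks to shard (hypothetical helper)."""
    engine = model.forecast_engine
    if fe_diffusion_model:
        # Diffusion mode: the wrapper keeps the wrapped engine as `net`,
        # so the blocks sit one level deeper.
        return engine.net.fe_blocks
    # Standard mode (assumed layout): blocks are a direct attribute.
    return engine.fe_blocks


def format_targets(batch, targets, aux, fe_diffusion_model: bool):
    """Build the target dict for the active mode (hypothetical helper)."""
    if fe_diffusion_model:
        return {"targets": [targets], "aux_outputs": aux}
    # Autoencoder mode uses the first batch element as the physical target.
    return {"physical": batch[0]}


# Stand-in models; the real classes are not reproduced here.
blocks = ["block_0", "block_1"]
standard = SimpleNamespace(forecast_engine=SimpleNamespace(fe_blocks=blocks))
diffusion = SimpleNamespace(
    forecast_engine=SimpleNamespace(net=SimpleNamespace(fe_blocks=blocks))
)

# Both modes resolve to the same list of blocks.
assert get_fe_blocks(standard, False) is get_fe_blocks(diffusion, True)
```

Keeping the mode check in one place means the trainer does not need to know whether the forecast engine is wrapped.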

---------

Co-authored-by: moritzhauschulz <moritz.hauschulz@gmail.com>
Co-authored-by: Matthias Karlbauer <matthias.karlbauer@ecmwf.int>

github-actions bot added the "model" label (Related to model training or definition, not generic infra) on Feb 23, 2026

8 participants