Enable pretrained EMA teacher by clessig · Pull Request #1838 · ecmwf/WeatherGenerator

clessig · 2026-02-15T14:55:47Z

Description

Also rebased and linted.

Issue Number

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

* implemented
* remove eval in interface
* lint
* incorporate requested changes
* fix imports
* Fix corner case in inference where data window is empty
* Fix missing handling of missing load_chkpt argument in config
---------
Co-authored-by: moritzhauschulz <moritz.hauschulz@gmail.com>

…but also as entrypoints. (#1778)

* Clean up docstrings, separate cli parsing from running.
* remove unused argument stream_dir
* separate parser instantiation from adding args
* add unified parser with subparsers
* implement main function in run_train using subparsers.
* update integration tests
* remove redundant methods *_from_args (previously used by integration tests)
* Move entrypoints to the top of run_train.py
* fix typo in small_multi_stream_test.infer_multi_stream
* fix formatting
* Organize strings into enum.
* fix parser

* Implement best effort backward compatibility.
* use new `data_pathes` option to look up training data.
* fix integration tests
* linting
* correct spelling in config.py
* correct spelling in multi_stream_data_sampler.py
* fix typo "data_path_anmoi" -> "data_path_anemoi" in config.py
* Update test_config.py
* Add suggested comment.

* Fixed most parts of plot_train. Currently missing: handling of stage_configs when these are derived from an earlier stage.
* Removed outdated or unsupported options
* Fixed final problems with consolidated training/validation config. Required to move Stage to a more appropriate place
* Removed old, unused code

* Add Q-Q analysis metric for extreme value evaluation
* Add Q-Q analysis metric for extreme value evaluation
* Address PR review: refactor qq_analysis for clarity and extensibility
  - Rename methods, add helper functions, fix naming conventions
  - Make percentile thresholds configurable
  - Create QuantilePlots class, implement generic coordinate handling
  - All reviewer comments addressed
* Fix: remove duplicate inline import in plot_utils.py
* Remove JSON serialization bottleneck in Q-Q analysis, use xarray attributes
* Add coords='minimal' and combine_attrs='drop_conflicts' to plot_metrics concat for score maps
---------
Co-authored-by: iluise <72020169+iluise@users.noreply.github.com>

* accept JSON files in MergeReader
* restore original config
* merge scores
* add reader_type jsonmerge
* merge underlying array
* set run_id key
* add comments
* update docstring
* add config example
* remove opt arguments
* same var names scores
* remove debug message

* Replace cf.rank==0 with utils.distributed.is_root
* add 2d rope to develop
* simplify assimilate global, forecast mode config
* add 2d rope to forecast engine only once
* only keep global & forecast engine add 2d rope
* simplify the code
* fix lint
* small fix
* fix annotation
* fix lint
* add annotation
* default config
* fix default use_reentrant
* use_2d_rope false as default
* Add copyright notice for RoPE functions and update naming
  - Add copyright attribution for rotate_half() and apply_rotary_pos_emb() functions
  - Rename apply_rotary_pos_emb_2d() to rotary_pos_emb_2d() for consistency
  - Rename config parameter use_2D_rope to rope_2D for better extensibility when supporting different RoPE variants in the future
* fix lint
* fix lint
* add 2d rope to all forecast steps
* more confs
* add missing enumerate
* def forecast config
* aux_info=None in Forecast Eng forward
* lint
* add rope to global engine, which was moved to encoder
* 1) init attention module with_2d_rope and rope_learnable_freq; 2) add code for learnable frequency
* solve some reviews
* fix lint
* fix 2 bugs: remove rope in QueryAggregation, and change bs in model.py
* temporarily remove learnable rope
* add rope for register and class tokens; fix lint
* rename aux_info in queryaggregation
* remove position_ids and change raise ValueError to assert
* batch size get from get_batch_size_from_config()
* revert to default config without batchsize
* use self.batch_size
* fix lint
---------
Co-authored-by: wang85 <wang85@jwlogin22.juwels>
Co-authored-by: wang85 <wang85@jwlogin24.juwels>
Co-authored-by: Jifeng Wang <wjifeng@santis-ln002.cscs.ch>
Co-authored-by: Jifeng Wang <wjifeng@santis-ln001.cscs.ch>
Co-authored-by: Kerem Tezcan <kctezcan@gmail.com>
Co-authored-by: wang85 <wang85@jwlogin21.juwels>
Co-authored-by: wang85 <wang85@jwlogin23.juwels>
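The commit notes above name rotate_half(), apply_rotary_pos_emb() and rotary_pos_emb_2d() but do not show them. As a rough illustration only, the following pure-Python sketch follows the formulation commonly used for rotary position embeddings; the signatures, the `base` parameter, and the way the 2D variant splits the feature vector between a row and a column position are assumptions and may not match the repository's implementation.

```python
import math


def rotate_half(x):
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + list(x[:half])


def apply_rotary_pos_emb(x, position, base=10000.0):
    """Rotate an even-length feature vector by position-dependent angles.

    The signature and the default base are assumptions made for this sketch.
    """
    if len(x) % 2 != 0:
        raise ValueError("feature dimension must be even")
    half = len(x) // 2
    # One frequency per pair (i, i + half); the angle list is repeated for both halves.
    freqs = [base ** (-i / half) for i in range(half)]
    angles = [position * f for f in freqs] * 2
    rotated = rotate_half(x)
    return [v * math.cos(a) + r * math.sin(a) for v, r, a in zip(x, rotated, angles)]


def rotary_pos_emb_2d(x, row, col):
    """Assumed 2D variant: first half encodes the row, second half the column."""
    if len(x) % 4 != 0:
        raise ValueError("feature dimension must be divisible by 4")
    half = len(x) // 2
    return apply_rotary_pos_emb(x[:half], row) + apply_rotary_pos_emb(x[half:], col)
```

Because each feature pair is rotated rather than scaled, the vector norm is unchanged, and position 0 leaves the input as it is.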

clessig and others added 10 commits February 6, 2026 11:27

fix latent_loss check in mode handling (#1784)

720473c

Fixed for SSL pretraining val loss

15527e4

Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into sophiex/dev/pretrained-frozen-teacher

1cde0d0

Linting

dac50b1

Added support for EMATeacher from pre-trained model.

1ffcfc9

Config for JEPA from forecast model

0ba03ec
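The commits above add support for an EMA teacher that starts from a pre-trained model. As a minimal sketch of the general idea, assuming the teacher is a copy of the pretrained weights that is then updated as an exponential moving average of the student, the update can be written as follows. The function name `ema_update`, the plain-dict parameter representation, and the `decay` value are hypothetical and are not taken from the PR.

```python
def ema_update(teacher, student, decay):
    """In-place EMA step: teacher <- decay * teacher + (1 - decay) * student.

    `teacher` and `student` are hypothetical name -> value mappings standing in
    for model parameters; this is not the repository's EMATeacher API.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


# Hypothetical pretrained weights stand in for a loaded checkpoint.
pretrained = {"w": 1.0, "b": 0.0}
teacher = dict(pretrained)  # the teacher starts from the pretrained model
student = {"w": 3.0, "b": 2.0}
ema_update(teacher, student, decay=0.5)
```

A decay close to 1 keeps the teacher near its pretrained starting point for longer, while a decay of exactly 1 corresponds to a frozen teacher.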

github-project-automation bot added this to WeatherGen-dev Feb 15, 2026

clessig and others added 15 commits February 16, 2026 10:48

Version of logging for fine-tuning

5175703

nse_metric (#1833)

f74bd4f

* nse_metric
* length
---------
Co-authored-by: Jesica Pinyon Rodriguez <jpinyonr@login07.leonardo.local>

Add the possibility for the ensemble members to be given as str (#1853)

4ad5919

Adding Europe region for plotting (#1856)

afefb91

[1861][model] Update forecast config (#1862)

dd5fa18

* Taking config from forecast_release_v0 branch
* Updated config to develop

Fix corner case for lr scheduler when num_epochs=0

45a9892

Fix inconsistencies related to logger info, docstrings (#1863)

fb5fa0e

* Fix inconsistencies related to logger info, docstrings. Linting
* Remove comment in utils.py. Removed comment about special handling for Q-Q analysis metric.

Remove nano-seconds from map plots (#1869)

2c5f013

* Fix inconsistencies related to logger info, docstrings. Linting
* Improve the scatter plot title when it comes to time

[Hot Fix] 2D Rope config (#1872)

6b33518

* Replace cf.rank==0 with utils.distributed.is_root
* hot fix for rope config
---------
Co-authored-by: wang85 <wang85@jwlogin22.juwels>
Co-authored-by: wang85 <wang85@jwlogin24.juwels>
Co-authored-by: Jifeng Wang <wjifeng@santis-ln001.cscs.ch>

[1861][model] Update forecast config v2 (#1875)

37a00eb

* Taking config from forecast_release_v0 branch
* Updated config to develop
* Rename default forecast config
* Increase fe_blocks from 8 to 16
* Added fine-tuning config

change parameter for mergereader (#1874)

e5d4e52

Adding fine-tuning config

97acb35

github-actions bot added the data (Anything related to the datasets used in the project), infra (Issues related to infrastructure), and model (Related to model training or definition, not generic infra) labels Feb 19, 2026

Updated configs for warm-start

3e0d4c1

clessig and others added 9 commits February 19, 2026 13:50

Fixed handling of configs

ea7c37e

Remove nan geoinfos in addition to coords (#1887)

d4b1d0a

* remove nans from geoinfos too
* lint

Fixed logging for pre-training (#1883)

7ce1371

* Fixed logging for pre-training.
* Commit right files

Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into clessig/sophiex/dev/pretrained-frozen-teacher

8594274

Merge branch 'develop' of github.com:ecmwf/WeatherGenerator into clessig/sophiex/dev/pretrained-frozen-teacher

b36b6f6

Adding missing support for num_samples for collapse metrics

bbbfa2e

Fixed some logging details

2c0a5a5

Introduced channel parameter; improved robustness

8d12643

Spoofing also in validation

ea6e159

github-actions bot added the eval (anything related to the model evaluation pipeline) label Feb 23, 2026

clessig wants to merge 35 commits into sophiex/dev/pretrained-frozen-teacher from clessig/sophiex/dev/pretrained-frozen-teacher

10 participants
