Enable pretrained EMA teacher #1838

Open
clessig wants to merge 35 commits into sophiex/dev/pretrained-frozen-teacher from clessig/sophiex/dev/pretrained-frozen-teacher
clessig (Collaborator) commented Feb 15, 2026

Description

Also rebased and linted.

Issue Number

Closes #1815

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

clessig and others added 10 commits February 6, 2026 11:27
* implemented

* remove eval in interface

* lint

* incorporate requested changes

* fix imports

* Fix corner case in inference where data window is empty

* Fix missing handling of missing load_chkpt argument in config

---------

Co-authored-by: moritzhauschulz <moritz.hauschulz@gmail.com>
…but also as entrypoints. (#1778)

* Clean up docstrings, separate cli parsing from running.

* remove unused argument stream_dir

* separate parser instantiation from adding args

* add unified parser with subparsers

* implement main function in run_train using subparsers.

* update integration tests

* remove redundant methods *_from_args (previously used by integration tests)

* Move entrypoints to the top of run_train.py

* fix typo in small_multi_stream_test.infer_multi_stream

* fix formatting

* Organize strings into enum.

* fix parser
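
The commit messages above mention a unified parser with subparsers and keeping parser instantiation separate from adding arguments. A minimal sketch of that pattern with the standard-library `argparse` module could look like the following. The stage names, the `--config` option, and the helper names are hypothetical and are not taken from the PR; only the file name `run_train` appears in the commit messages.

```python
import argparse


def add_common_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical shared option; the PR does not list the real option names.
    parser.add_argument("--config", default=None, help="path to a config file")


def build_parser() -> argparse.ArgumentParser:
    # Instantiating the parser is kept separate from adding arguments to it.
    parser = argparse.ArgumentParser(prog="run_train")
    subparsers = parser.add_subparsers(dest="stage", required=True)
    for name in ("train", "continue", "inference"):  # hypothetical stage names
        sub = subparsers.add_parser(name)
        add_common_args(sub)
    return parser


def main(argv=None) -> str:
    # Parsing is separated from running: tests can pass an explicit argv list.
    args = build_parser().parse_args(argv)
    # Real code would dispatch to the matching entrypoint here.
    return f"{args.stage}:{args.config}"
```

Because `main` accepts an explicit argument list, integration tests can call it directly instead of going through separate `*_from_args` helpers.
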
* Implement best effort backward compatibility.

* use new `data_pathes` option to look up training data.

* fix integration tests

* linting

* correct spelling in config.py

* correct spelling in multi_stream_data_sampler.py

* fix typo "data_path_anmoi" -> "data_path_anemoi" in config.py

* Update test_config.py

* Add suggested comment.
* Fixed most parts of plot_train. Currently missing: handling of stage_configs when these are derived from an earlier stage.

* Removed outdated or unsupported options

* Fixed final problems with consolidated training/validation config. Required to move Stage to a more appropriate place

* Removed old, unused code
clessig and others added 15 commits February 16, 2026 10:48
* nse_metric

* length

---------

Co-authored-by: Jesica Pinyon Rodriguez <jpinyonr@login07.leonardo.local>
* Add Q-Q analysis metric for extreme value evaluation

* Add Q-Q analysis metric for extreme value evaluation

* Address PR review: refactor qq_analysis for clarity and extensibility

- Rename methods, add helper functions, fix naming conventions
- Make percentile thresholds configurable
- Create QuantilePlots class, implement generic coordinate handling
- All reviewer comments addressed

* Fix: remove duplicate inline import in plot_utils.py

* Remove JSON serialization bottleneck in Q-Q analysis, use xarray attributes

* Add coords='minimal' and combine_attrs='drop_conflicts' to plot_metrics concat for score maps

---------

Co-authored-by: iluise <72020169+iluise@users.noreply.github.com>
* accept JSON files in MergeReader

* restore original config

* merge scores

* add reader_type jsonmerge

* merge underlying array

* set run_id key

* add comments

* update docstring

* add config example

* remove opt arguments

* same var names scores

* remove debug message
* Taking config from forecast_release_v0 branch

* Updated config to develop
* Replace cf.rank==0 with utils.distributed.is_root

* add 2d rope to develop

* simplify assimilate global, forecast mode config

* add 2d rope to forecast engine only once

* only keep global & forecast engine add 2d rope

* simplify the code

* fix lint

* small fix

* fix annotation

* fix lint

* add annotation

* default config

* fix default use_reentrant

* use_2d_rope false as default

* Add copyright notice for RoPE functions and update naming

- Add copyright attribution for rotate_half() and apply_rotary_pos_emb()
  functions

- Rename apply_rotary_pos_emb_2d() to rotary_pos_emb_2d() for consistency

- Rename config parameter use_2D_rope to rope_2D for better extensibility
  when supporting different RoPE variants in the future

* fix lint

* fix lint

* add 2d rope to all forecast steps

* more confs

* add missing enumerate

* def forecast config

* aux_info=None in Forecast Eng forward

* lint

* add rope to global engine, which was moved to encoder

* 1) init attention module with_2d_rope and rope_learnable_freq
2) add code for learnable frequency

* solve some reviews

* fix lint

* fix 2 bugs: remove rope in QueryAggregation, and change bs in model.py

* temporarily remove learnable rope

* add rope for register and class tokens; fix lint

* rename aux_info in queryaggregation

* remove position_ids and change raise valueError to assert

* batch size get from get_batch_size_from_config()

* revert to default config without batchsize

* use self.batch_size

* fix lint

---------

Co-authored-by: wang85 <wang85@jwlogin22.juwels>
Co-authored-by: wang85 <wang85@jwlogin24.juwels>
Co-authored-by: Jifeng Wang <wjifeng@santis-ln002.cscs.ch>
Co-authored-by: Jifeng Wang <wjifeng@santis-ln001.cscs.ch>
Co-authored-by: Kerem Tezcan <kctezcan@gmail.com>
Co-authored-by: wang85 <wang85@jwlogin21.juwels>
Co-authored-by: wang85 <wang85@jwlogin23.juwels>
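
For readers unfamiliar with the `rotate_half()` and `apply_rotary_pos_emb()` functions named in the commit above, the following is a minimal pure-Python sketch of the common one-dimensional rotary position embedding (RoPE) formulation. It is not the PR's code: the signatures, the `base` parameter, and the use of plain lists instead of tensors are assumptions made for illustration, and the 2D variant (`rotary_pos_emb_2d()`) is not reproduced here.

```python
import math


def rotate_half(x):
    """Map [x1, x2] to [-x2, x1], where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + list(x[:half])


def apply_rotary_pos_emb(x, position, base=10000.0):
    """Rotate an even-length vector by position-dependent angles (1D RoPE).

    Assumed signature for illustration; `base` is the usual frequency base.
    """
    half = len(x) // 2
    # One frequency per pair (i, i + half); the angle grows linearly with position.
    angles = [position * base ** (-i / half) for i in range(half)]
    cos = [math.cos(a) for a in angles] * 2
    sin = [math.sin(a) for a in angles] * 2
    rot = rotate_half(x)
    return [xi * c + ri * s for xi, c, ri, s in zip(x, cos, rot, sin)]
```

Each pair of components is rotated in its own plane, so the vector norm is unchanged and position 0 leaves the vector as it is.
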
* Fix inconsistencies related to logger info, docstrings. Linting

* Remove comment in utils.py

Removed comment about special handling for Q-Q analysis metric.
* Fix inconsistencies related to logger info, docstrings. Linting

* Improve the scatter plot title when it comes to time
* Replace cf.rank==0 with utils.distributed.is_root

* hot fix for rope config

---------

Co-authored-by: wang85 <wang85@jwlogin22.juwels>
Co-authored-by: wang85 <wang85@jwlogin24.juwels>
Co-authored-by: Jifeng Wang <wjifeng@santis-ln001.cscs.ch>
* Taking config from forecast_release_v0 branch

* Updated config to develop

* Rename default forecast config

* Increase fe_blocks from 8 to 16

* Added fine-tuning config
github-actions bot added the data, infra, and model labels Feb 19, 2026
github-actions bot added the eval label Feb 23, 2026
Labels

  • data: Anything related to the datasets used in the project
  • eval: anything related to the model evaluation pipeline
  • infra: Issues related to infrastructure
  • model: Related to model training or definition (not generic infra)

10 participants