fixed scheduler checkpoint loading, typo #63
Closed
AnnihilatorChess wants to merge 51 commits into PolymathicAI:master from
Conversation
- Add doc - Center badges
Add badges to README
Updated `shear_flow` results
* Add arXiv badge * Update link to arXiv paper
Califronia -> California
…eadme Fix shear_flow README.md
* List all the Well datasets in utils.py * Use the dataset list in download script * Order MHD datasets by dimension
Data: - Add Rayleigh Benard uniform dataset - Edit information about Shear Flow data Statistics and Metrics: - Add RMS statistics - Add Pearson correlation metrics Code Refactoring: - Refine video generation control - Refactor sample load from HDF5 - Add transformation and augmentation based on resizing and rotation - Allow specifying the dataset split - Format with ruff
* Update citation after NeurIPS release * Update citation in docs too
* Add the Well dataset collection mention to HF card * Ignore streamlit local runs * Make uploaded dataset public by default * Add option to skip repacking HDF5 file * Increase CPU resources in the uploading script
* Factorize models with a BaseModel * Improve AFNO typing * Add tests for the different models * Do not pass dataset metadata to model * Remove unnecessary arguments in super Co-authored-by: François Rozet <francois.rozet@outlook.com> --------- Co-authored-by: François Rozet <francois.rozet@outlook.com>
* Make models inherit from PytorchModelHubMixin * Rename upload -> upload_dataset * Add script draft for uploading models * Add config template to upload model * Add ReadMes for the 4 models to upload to HF * Specify data n_inputs in model upload config * Update FNO README * Complete model uploading script * Fix path issues in model uploading script * Improve model path and name retrieval * Change model path retrieval strategy * Change dataset -> model in upload folder method * Update README.md * Factorize models with a BaseModel * Add tests for the different models * Do not pass dataset metadata to model * Improve AFNO typing * Edit model path retrieval * Update links in FNO readme * Update FNO Readme * Add header to model READMEs * Add tables to model READMEs * Add code sample to load models to READMEs * Fix model instantiation * Simplify uploading script * Simplify uploading logic * Fix typo in spatial * Convert Omegaconf containers to be jsonable * Improve type checking enforcement Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com> * Simplify model path Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com> * Update datasetname variable in README code snippet * Apply suggested pathlib edits * Factorize model card generation * Remove duplicated header from model READMEs * Fix model card template name * Factorize further model README files * Fix dataset name in model card * Make model name variable in model card * Fix missing model name update * Fix typo in spatial resolution of UNetConvNext * Edit links in README with appropriate model names * Edit links in model README files --------- Co-authored-by: Ruben Ohana <50375255+rubenohana@users.noreply.github.com> Co-authored-by: Miles Cranmer <miles.cranmer@gmail.com>
* Change HF link to point to the Well collection * Document retrieval of checkpoints through HF
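As a companion to the model-uploading commits above, here is a minimal, generic sketch of the `PyTorchModelHubMixin` pattern they describe. The class `TinyModel`, its `width` parameter, and the `"my-org/tiny-model"` repository id are placeholders for illustration, not names from the Well codebase:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class TinyModel(nn.Module, PyTorchModelHubMixin):
    """Toy stand-in for one of the benchmark models (FNO, AFNO, ...)."""

    def __init__(self, width: int = 32):
        super().__init__()
        self.proj = nn.Linear(width, width)

    def forward(self, x):
        return self.proj(x)


# Pushing uploads the weights plus a config.json built from the
# JSON-serializable __init__ arguments (recent huggingface_hub versions):
# TinyModel(width=64).push_to_hub("my-org/tiny-model")

# Retrieving the checkpoint later reconstructs the model from that config:
# model = TinyModel.from_pretrained("my-org/tiny-model")
```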
- Refactor DeltaWellDataset for time step differences - Refactor normalization - Fix AFNO and AViT models Co-authored-by: Payel Mukhopadhyay <payelmukhopadhyay180@gmail.com> Co-authored-by: Mike McCabe <mike.mccabe210@gmail.com>
* Increment version from 1.0.1 to 1.1.0 * Add list of maintainers * Add 3.13 to supported Python versions * Test max and min supported Python versions
* Add missing statistics * Remove try-except block causing silent failure * Add DeltaWellDataset to the list of data imports * Add dataset tests to check delta statistics * Round statistics to 4 decimal places * Fix argument in round function * Make compute statistics script parallel * Write stats with 4 decimal scientific notation * Edit yaml dumping for scientific notation * Factorize dataset download tests with fixtures * Reorganize dataset tests * Add comments to pytest fixtures * Simplify step selection * Raise error when stride and normalization are set
Co-authored-by: Payel Mukhopadhyay <payelmukhopadhyay180@gmail.com>
* Rewrite normalization tests Now only test the normalization class instead of the actual dataset stats. --------- Co-authored-by: Lucas Meyer <lucas.thibaut.meyer@gmail.com>
* added max rollout steps to dataset docstring * Update the_well/data/datasets.py Co-authored-by: Lucas Meyer <LTMeyer@users.noreply.github.com> --------- Co-authored-by: Lucas Meyer <LTMeyer@users.noreply.github.com>
* Add template for bug reports * Update already existing issue message * Add version and environment to issue template * Add code snippet to obtain version and environment * Fix typo in code snippet
…g_page Add missing symbolic link to rayleigh_benard_uniform
fix: stop overwriting `best.pt` every validation
fix: denominator calculation for short validation
Contributor
Thanks for the contribution @AnnihilatorChess. The change looks good at a quick glance, but I think it'll be a few days before someone can do a more detailed check. For now I'll trigger the workflow and make sure it doesn't break any of the tests.
Collaborator
@AnnihilatorChess This was accidentally closed during a restructuring of the repo. We would love to have your contribution, so once we are done with the restructuring and release, we will ping you for submitting a PR again.

Hello,
I found a bug where resuming a run from a checkpoint incorrectly restarts the LR scheduler's warmup and cosine decay. This is because the Trainer in training.py saves and loads the optimizer state but not the lr_scheduler state.
This PR updates the save_model and load_checkpoint methods so that they also include lr_scheduler.state_dict() in the checkpoint, ensuring that training resumes with the correct learning rate.
(I also fixed a small typo: optimizer_state_dit -> optimizer_state_dict.)
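For reference, here is a minimal sketch of the checkpointing pattern described above, assuming a plain PyTorch trainer with save_model and load_checkpoint methods; the dictionary keys and function signatures are illustrative and may not match training.py exactly:

```python
import torch


def save_model(path, model, optimizer, lr_scheduler, epoch):
    # Persist the scheduler state alongside the model and optimizer states.
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),  # typo fix: was "optimizer_state_dit"
            "lr_scheduler_state_dict": lr_scheduler.state_dict(),  # new: scheduler progress
        },
        path,
    )


def load_checkpoint(path, model, optimizer, lr_scheduler):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state_dict"])
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
    # Without restoring this, the warmup and cosine decay restart from step 0 on resume.
    lr_scheduler.load_state_dict(checkpoint["lr_scheduler_state_dict"])
    return checkpoint["epoch"]
```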