
Add new category containing interstitial benchmarks #337

Open

zhonganr wants to merge 7 commits into ddmms:main from zhonganr:benchmark_interstitial


zhonganr commented Feb 4, 2026

Add a new category, interstitial, to assess how well models predict the properties of interstitial defects. Two benchmarks are included, FE1SIA and Relastab:

  • FE1SIA evaluates the formation energy of a single self-interstitial atom (SIA) in a host lattice for distinct configurations.
  • Relastab evaluates whether models correctly rank the relative stability of different interstitial configurations.

Related to Interstitial benchmarks #339
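For readers new to these two quantities, here is a minimal sketch of the kind of comparison each benchmark makes. It assumes an elemental host (e.g. bcc Fe), where the SIA formation energy is commonly written as E_f = E_{N+1} - (N+1)/N · E_N, with E_N the energy of the perfect N-atom supercell; non-elemental hosts need chemical potentials instead. The function names, and the pairwise-ordering score used as a stand-in for "correctly ranking stability", are hypothetical illustrations, not the code in this PR.

```python
from itertools import combinations


def sia_formation_energy(e_defect: float, e_bulk: float, n_bulk: int) -> float:
    """Formation energy (eV) of one self-interstitial in an elemental host.

    e_defect: total energy of the relaxed supercell with N + 1 atoms.
    e_bulk:   total energy of the perfect N-atom supercell.
    Assumes an elemental host; compound hosts need chemical potentials.
    """
    return e_defect - (n_bulk + 1) / n_bulk * e_bulk


def pairwise_ranking_accuracy(ref: dict[str, float], pred: dict[str, float]) -> float:
    """Fraction of configuration pairs whose energy ordering matches the reference.

    Hypothetical ranking score (Kendall-style pair agreement), not necessarily
    the Relastab metric. Pairs that are tied in the reference are skipped.
    """
    pairs = [(a, b) for a, b in combinations(sorted(ref), 2) if ref[a] != ref[b]]
    if not pairs:
        raise ValueError("need at least two configurations with distinct energies")
    agree = sum((ref[a] < ref[b]) == (pred[a] < pred[b]) for a, b in pairs)
    return agree / len(pairs)


# Made-up numbers, only to show the call signatures.
print(sia_formation_energy(e_defect=-1028.0, e_bulk=-1024.0, n_bulk=128))  # 4.0
ref = {"<110>": 0.0, "<111>": 0.7, "octa": 1.5}
pred = {"<110>": 0.1, "<111>": 1.6, "octa": 1.2}
print(pairwise_ranking_accuracy(ref, pred))  # 0.666..., one of three pairs misordered
```

Counting pairs rather than comparing only the lowest-energy configuration rewards a model for getting the full ordering right, which is one reasonable reading of "rank the stability".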

ElliottKasoar added the new benchmark label (Proposals and suggestions for new benchmarks) Feb 6, 2026
Comment thread ml_peg/calcs/interstitial/FE1SIA/calc_FE1SIA.py Outdated
Comment thread ml_peg/calcs/interstitial/FE1SIA/calc_FE1SIA.py Outdated
Comment thread ml_peg/calcs/interstitial/FE1SIA/calc_FE1SIA.py Outdated
Comment thread ml_peg/calcs/interstitial/Relastab/calc_Relastab.py Outdated
Comment thread ml_peg/calcs/interstitial/Relastab/calc_Relastab.py Outdated
Comment thread ml_peg/calcs/interstitial/Relastab/calc_Relastab.py Outdated
Comment thread .gitignore Outdated
ElliottKasoar (Collaborator) commented:

Thanks for adding this and sharing the data! It's looking great so far!

joehart2001 mentioned this pull request Mar 24, 2026
zhonganr force-pushed the benchmark_interstitial branch from e526ac1 to fbf8301 on April 5, 2026, 15:30
ElliottKasoar (Collaborator) left a comment:

Thanks for addressing so much, @zhonganr, it's looking great!

I've left a few more minor questions/suggestions.

Also note that there are a few conflicts to be resolved with the main branch. Can you try rebasing? We are happy to help with this if you need it.

Comment thread ml_peg/calcs/defect/Defectstab/calc_Defectstab.py
Comment thread ml_peg/calcs/defect/Relastab/calc_Relastab.py
Comment thread .gitignore Outdated
Comment thread uv.lock
Comment thread ml_peg/analysis/defect/Defectstab/analyse_Defectstab.py Outdated
Comment thread ml_peg/analysis/defect/Defectstab/analyse_Defectstab.py Outdated
Comment thread ml_peg/analysis/defect/Relastab/analyse_Relastab.py Outdated
Comment thread ml_peg/analysis/defect/Relastab/analyse_Relastab.py Outdated
Comment thread ml_peg/analysis/defect/Relastab/analyse_Relastab.py Outdated
Comment thread ml_peg/analysis/defect/Defectstab/analyse_Defectstab.py
- Rename FE1SIA to Defectstab across all files and directories
- Rename category from interstitial to defect
- Split shared DB.zip into separate Defectstab.zip and Relastab.zip
- Rename data folders to match benchmark names (Defectstab/, Relastab/)
- Add default charge and spin for Orb model compatibility
- Replace silent fallbacks with ValueError for unparseable reference energies
- Update documentation references (relative_stability -> Relastab)
- Remove debug print statements from analysis files
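The "replace silent fallbacks with ValueError" item can be illustrated with a minimal sketch of the fail-loudly pattern. It assumes the reference energy arrives as a string or number stored alongside each structure; `parse_reference_energy` is a hypothetical helper written for illustration, not the function in this PR.

```python
import math


def parse_reference_energy(raw: object, label: str) -> float:
    """Return a reference energy (eV), raising instead of silently defaulting.

    `raw` is whatever was stored for the structure (string or number);
    `label` identifies the structure in the error message.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError) as err:
        # A silent fallback here (e.g. returning 0.0) would hide the bad entry
        # and quietly distort every error computed against it.
        raise ValueError(
            f"Cannot parse reference energy for {label!r}: {raw!r}"
        ) from err
    if not math.isfinite(value):
        # float("nan") and float("inf") parse, but are not usable references.
        raise ValueError(f"Non-finite reference energy for {label!r}: {raw!r}")
    return value


print(parse_reference_energy("-12.5", "example_structure"))  # -12.5
```

Raising at parse time means a malformed entry stops the analysis with the structure's name in the message, rather than surfacing later as an unexplained outlier in the metrics.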
zhonganr force-pushed the benchmark_interstitial branch from fbf8301 to 8e63dd9 on April 13, 2026, 13:53
ElliottKasoar (Collaborator) left a comment:

Thanks for responding to all the comments! It's looking really good!

Just a few more minor suggestions/questions and I think we'll be good to merge.

Comment thread ml_peg/analysis/defect/Defectstab/metrics.yml
Comment thread ml_peg/app/defect/Defectstab/app_Defectstab.py Outdated
Comment thread ml_peg/analysis/defect/Relastab/analyse_Relastab.py
Comment thread ml_peg/app/defect/Defectstab/app_Defectstab.py Outdated
Comment thread ml_peg/app/defect/Relastab/app_Relastab.py Outdated
Comment thread ml_peg/analysis/defect/Relastab/analyse_Relastab.py
Comment thread ml_peg/analysis/defect/Defectstab/analyse_Defectstab.py
ElliottKasoar (Collaborator) left a comment:

I think other than the discussion around the metrics/thresholds, this is pretty much good to go!

One other note: I think your documentation was deleted in one of your commits. If you could add it back in (and update it following whatever decision we make regarding metrics), that would be great!

Follow GMTKN55 pattern: each subset gets its own column in the metrics
table with its own threshold in metrics.yml (weight=0). Total metric
columns keep weight=1 and drive the benchmark score.

- Remove subset dropdowns and update_table_data callbacks from both apps
- Remove inner @build_table loops saving per-subset JSON files
- metrics fixture now returns {total_col: ..., subset_col: ...} directly
- Add per-subset entries to both metrics.yml files with appropriate
  good/bad thresholds and weight=0
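The commit message above describes the GMTKN55-style layout: per-subset columns with weight 0 sit next to total columns with weight 1. A minimal sketch of how such weights could feed a single benchmark score follows. It assumes a linear normalisation between the good and bad thresholds and a weighted mean over columns; both choices, and the `total` column name and its thresholds, are assumptions for illustration, not ml-peg's actual scoring code. The `fe_sia` thresholds are the ones from metrics.yml.

```python
def normalise(value: float, good: float, bad: float) -> float:
    """Map a metric onto [0, 1]: 1 at `good`, 0 at or beyond `bad` (assumed linear)."""
    frac = (value - good) / (bad - good)
    return min(max(1.0 - frac, 0.0), 1.0)


def benchmark_score(values: dict[str, float], config: dict[str, dict]) -> float:
    """Weighted mean of normalised metrics; weight-0 columns are display-only."""
    total_weight = sum(cfg["weight"] for cfg in config.values())
    if total_weight == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    return sum(
        cfg["weight"] * normalise(values[name], cfg["good"], cfg["bad"])
        for name, cfg in config.items()
    ) / total_weight


# Hypothetical config: a weighted total column plus one weight-0 subset column.
config = {
    "total": {"good": 0.0, "bad": 2.0, "weight": 1},
    "fe_sia": {"good": 0.0, "bad": 1.32, "weight": 0},
}
print(benchmark_score({"total": 0.5, "fe_sia": 5.0}, config))  # 0.75
print(benchmark_score({"total": 0.5, "fe_sia": 0.0}, config))  # still 0.75
```

Because the subset columns carry weight 0, they are coloured against their own thresholds in the table but cannot move the benchmark score, which is the separation the commit describes.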
ElliottKasoar (Collaborator) left a comment:

Thanks for addressing the suggestions so quickly, @zhonganr!

I just spotted a few final minor things we may want to tidy, but otherwise I think we'll probably be good to merge!

Comment thread docs/source/user_guide/benchmarks/defect.rst
Comment thread ml_peg/analysis/defect/Defectstab/metrics.yml Outdated
Comment on lines +7 to +35
Collaborator:

Suggested change
```diff
-    level_of_theory: DFT
+    level_of_theory: PBE
     weight: 1
   fe_sia:
     good: 0.0
     bad: 1.32
     unit: eV
     tooltip: "RMSD of formation energies for Fe SIA subset"
-    level_of_theory: DFT
+    level_of_theory: PBE
     weight: 0
   boroncarbide_stoichiometry:
     good: 0.0
     bad: 11.09
     unit: eV
     tooltip: "RMSD of formation energies for boron carbide stoichiometry subset"
-    level_of_theory: DFT
+    level_of_theory: PBE
     weight: 0
   boroncarbide_defects:
     good: 0.0
     bad: 0.57
     unit: eV
     tooltip: "RMSD of formation energies for boron carbide defects subset"
-    level_of_theory: DFT
+    level_of_theory: PBE
     weight: 0
   mapi_tetragonal:
     good: 0.0
     bad: 0.39
     unit: eV
     tooltip: "RMSD of formation energies for MAPI tetragonal subset"
-    level_of_theory: DFT
+    level_of_theory: PBE
```

Generally we try to give the functional, since it's useful to know whether or not a model has been trained on data computed with the same functional.

zhonganr (Author):

Thanks for the remark; indeed, it's better.
The functionals used in Defectstab are not all PBE, so I've specified the corresponding functional for each subset.

ElliottKasoar (Collaborator), Apr 22, 2026:

Ah I see, sorry I missed that, and thanks for fixing it!

Do you think it's still meaningful to combine them as an average then?
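One way to frame this question: there are at least two natural ways to collapse per-subset errors into a single number, and they weight the subsets differently. The sketch below compares a mean of per-subset RMSDs with a single RMSD over all pooled errors. The subset names and error values are made up, and neither option is claimed to be what ml-peg does; note that neither removes the fact that each subset is measured against a different reference functional.

```python
import math


def rmsd(errors: list[float]) -> float:
    """Root-mean-square deviation of model-minus-reference errors (eV)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def combine_subsets(subset_errors: dict[str, list[float]]) -> tuple[float, float]:
    """Return (mean of per-subset RMSDs, RMSD of all errors pooled together)."""
    per_subset = [rmsd(errs) for errs in subset_errors.values()]
    # Each subset counts equally, however many structures it contains.
    mean_of_rmsds = sum(per_subset) / len(per_subset)
    # Each structure counts equally, so large subsets dominate.
    pooled = rmsd([e for errs in subset_errors.values() for e in errs])
    return mean_of_rmsds, pooled


# Made-up errors, only to show that the two aggregates can differ.
example = {"subset_a": [0.1, 0.1, 0.1, 0.1], "subset_b": [2.0]}
print(combine_subsets(example))  # mean of RMSDs about 1.05 eV, pooled about 0.90 eV
```

Keeping each subset as its own weight-0 column, as the metrics tables already do, sidesteps the choice for display purposes; the question only matters for whatever drives the weighted score.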

Comment thread ml_peg/app/defect/Defectstab/app_Defectstab.py Outdated
Comment thread ml_peg/analysis/defect/Relastab/metrics.yml Outdated
Comment thread ml_peg/analysis/defect/Relastab/metrics.yml
Comment thread ml_peg/app/defect/Relastab/app_Relastab.py Outdated

```rst
Reference data:

* Computed from the DFT total energies provided with the input structures.
```
Collaborator:

Could you add a note here, either restating the functionals or noting that the specific functionals are given with the metrics above?


```rst
Reference data:

* Computed from the DFT total energies provided with the input structures.
```
Collaborator:

Could you add a note here, either restating the different functionals or noting that the specific functionals are given with the metrics above?


Labels: new benchmark (Proposals and suggestions for new benchmarks)


3 participants