Add new category containing interstitial benchmarks by zhonganr · Pull Request #337 · ddmms/ml-peg

zhonganr · 2026-02-04T10:33:58Z

Add a new category, interstitial, to assess the models' predictive performance for interstitial defect properties. Two benchmarks, FE1SIA and Relastab, are included:

- FE1SIA evaluates the formation energy of a single self-interstitial atom (SIA) in a host lattice for distinct configurations.
- Relastab evaluates the ability of models to correctly rank the stability of different interstitial configurations.
Related to Interstitial benchmarks #339
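
For readers unfamiliar with the two quantities, here is a minimal sketch of what each benchmark measures. It assumes an elemental host, where the defect cell contains one more atom of the same species than the pristine cell; the function names and the energies are hypothetical and are not taken from the PR or its data.

```python
def sia_formation_energy(e_defect: float, e_bulk: float, n_bulk: int) -> float:
    """Formation energy (eV) of one self-interstitial in an elemental host.

    Assumes the defect cell holds n_bulk + 1 atoms of the same species as
    the n_bulk-atom pristine cell, so the bulk energy is rescaled per atom.
    """
    return e_defect - (n_bulk + 1) / n_bulk * e_bulk


def rank_configurations(energies: dict[str, float]) -> list[str]:
    """Return configuration labels ordered from most to least stable."""
    return sorted(energies, key=energies.get)


# Made-up formation energies in eV, purely illustrative.
reference = {"config_a": 4.0, "config_b": 4.7, "config_c": 5.1}
predicted = {"config_a": 4.2, "config_b": 5.3, "config_c": 5.0}

# A model can have small energy errors and still swap two configurations.
same_order = rank_configurations(reference) == rank_configurations(predicted)
```

In this toy case the predicted energies are each within about 0.6 eV of the reference, yet `config_b` and `config_c` are ranked in the wrong order, which is the kind of failure a ranking benchmark is meant to expose.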

ElliottKasoar · 2026-02-20T16:12:59Z

Thanks for adding this and sharing the data! It's looking great so far!

ElliottKasoar

Thanks for addressing so much, @zhonganr, it's looking great!

I've left a few more minor questions/suggestions.

Also note that there are a few conflicts to be resolved with the main branch. Can you try rebasing? We are happy to help with this if you need it.

- Rename FE1SIA to Defectstab across all files and directories
- Rename category from interstitial to defect
- Split shared DB.zip into separate Defectstab.zip and Relastab.zip
- Rename data folders to match benchmark names (Defectstab/, Relastab/)
- Add default charge and spin for Orb model compatibility
- Replace silent fallbacks with ValueError for unparseable reference energies
- Update documentation references (relative_stability -> Relastab)
- Remove debug print statements from analysis files
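
Two of the items above (raising `ValueError` instead of falling back silently, and filling in a default charge and spin) can be illustrated with a short sketch. The helper names, the plain-dict stand-in for the structure metadata, and the default values `charge=0` and `spin=1` are assumptions for illustration; the PR text does not state the actual values or function names.

```python
def parse_reference_energy(raw: str) -> float:
    """Parse a reference energy in eV, raising instead of silently falling back."""
    try:
        return float(raw.strip())
    except ValueError as err:
        # Surface the offending value so a bad data file is caught early.
        raise ValueError(f"Unparseable reference energy: {raw!r}") from err


def with_default_charge_spin(info: dict, charge: int = 0, spin: int = 1) -> dict:
    """Return a copy of info with charge and spin filled in only when absent."""
    out = dict(info)
    out.setdefault("charge", charge)  # assumed default: neutral cell
    out.setdefault("spin", spin)      # assumed default: spin multiplicity 1
    return out
```

Using `setdefault` keeps any charge or spin already stored with a structure, so only structures without that metadata receive the defaults.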

ElliottKasoar

Thanks for responding to all the comments! It's looking really good!

Just a few more minor suggestions/questions and I think we'll be good to merge.

ElliottKasoar

I think other than the discussion around the metrics/thresholds, this is pretty much good to go!

One other note: I think your documentation was deleted during one of your commits. If you could add this back in (and update it following whatever decision we make regarding metrics) that would be great!

Follow GMTKN55 pattern: each subset gets its own column in the metrics table with its own threshold in metrics.yml (weight=0). Total metric columns keep weight=1 and drive the benchmark score.

- Remove subset dropdowns and update_table_data callbacks from both apps
- Remove inner @build_table loops saving per-subset JSON files
- metrics fixture now returns {total_col: ..., subset_col: ...} directly
- Add per-subset entries to both metrics.yml files with appropriate good/bad thresholds and weight=0
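
To make the effect of `weight=0` concrete, here is a rough sketch of how per-subset columns can be displayed without influencing the overall score. The linear good/bad normalisation, the function names, the `total_rmsd` entry with its threshold, and the metric values are all assumptions for illustration; only the `fe_sia` and `mapi_tetragonal` thresholds are taken from the metrics.yml diff in this PR, and the scoring code in ml-peg itself may differ.

```python
def normalised_score(value: float, good: float, bad: float) -> float:
    """Map a metric to [0, 1]: 1 at `good`, 0 at or beyond `bad` (assumed linear)."""
    score = (bad - value) / (bad - good)
    return min(1.0, max(0.0, score))


def benchmark_score(values: dict, thresholds: dict) -> float:
    """Weighted mean of normalised scores; weight=0 columns do not contribute."""
    weighted = [
        (normalised_score(values[name], t["good"], t["bad"]), t["weight"])
        for name, t in thresholds.items()
    ]
    total_weight = sum(w for _, w in weighted)
    return sum(s * w for s, w in weighted) / total_weight


thresholds = {
    # Hypothetical total column and threshold, not from the PR.
    "total_rmsd": {"good": 0.0, "bad": 2.0, "weight": 1},
    # Subset thresholds as they appear in the metrics.yml diff.
    "fe_sia": {"good": 0.0, "bad": 1.32, "weight": 0},
    "mapi_tetragonal": {"good": 0.0, "bad": 0.39, "weight": 0},
}

# Made-up RMSD values in eV.
values = {"total_rmsd": 0.5, "fe_sia": 0.33, "mapi_tetragonal": 0.5}

score = benchmark_score(values, thresholds)
```

Here the `mapi_tetragonal` value lies beyond its `bad` threshold, so its own normalised score is 0, but because its weight is 0 the benchmark score is determined by `total_rmsd` alone.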


ElliottKasoar

Thanks for addressing the suggestions so quickly, @zhonganr!

I just spotted a few final minor things we may want to tidy, but otherwise I think we'll probably be good to merge!

ElliottKasoar · 2026-04-22T10:05:32Z

+        level_of_theory: DFT
+        weight: 1
+    fe_sia:
+        good: 0.0
+        bad: 1.32
+        unit: eV
+        tooltip: "RMSD of formation energies for Fe SIA subset"
+        level_of_theory: DFT
+        weight: 0
+    boroncarbide_stoichiometry:
+        good: 0.0
+        bad: 11.09
+        unit: eV
+        tooltip: "RMSD of formation energies for boron carbide stoichiometry subset"
+        level_of_theory: DFT
+        weight: 0
+    boroncarbide_defects:
+        good: 0.0
+        bad: 0.57
+        unit: eV
+        tooltip: "RMSD of formation energies for boron carbide defects subset"
+        level_of_theory: DFT
+        weight: 0
+    mapi_tetragonal:
+        good: 0.0
+        bad: 0.39
+        unit: eV
+        tooltip: "RMSD of formation energies for MAPI tetragonal subset"
+        level_of_theory: DFT


Suggested change

-        level_of_theory: DFT
-        weight: 1
-    fe_sia:
-        good: 0.0
-        bad: 1.32
-        unit: eV
-        tooltip: "RMSD of formation energies for Fe SIA subset"
-        level_of_theory: DFT
-        weight: 0
-    boroncarbide_stoichiometry:
-        good: 0.0
-        bad: 11.09
-        unit: eV
-        tooltip: "RMSD of formation energies for boron carbide stoichiometry subset"
-        level_of_theory: DFT
-        weight: 0
-    boroncarbide_defects:
-        good: 0.0
-        bad: 0.57
-        unit: eV
-        tooltip: "RMSD of formation energies for boron carbide defects subset"
-        level_of_theory: DFT
-        weight: 0
-    mapi_tetragonal:
-        good: 0.0
-        bad: 0.39
-        unit: eV
-        tooltip: "RMSD of formation energies for MAPI tetragonal subset"
-        level_of_theory: DFT
+        level_of_theory: PBE
+        weight: 1
+    fe_sia:
+        good: 0.0
+        bad: 1.32
+        unit: eV
+        tooltip: "RMSD of formation energies for Fe SIA subset"
+        level_of_theory: PBE
+        weight: 0
+    boroncarbide_stoichiometry:
+        good: 0.0
+        bad: 11.09
+        unit: eV
+        tooltip: "RMSD of formation energies for boron carbide stoichiometry subset"
+        level_of_theory: PBE
+        weight: 0
+    boroncarbide_defects:
+        good: 0.0
+        bad: 0.57
+        unit: eV
+        tooltip: "RMSD of formation energies for boron carbide defects subset"
+        level_of_theory: PBE
+        weight: 0
+    mapi_tetragonal:
+        good: 0.0
+        bad: 0.39
+        unit: eV
+        tooltip: "RMSD of formation energies for MAPI tetragonal subset"
+        level_of_theory: PBE

Generally we try to give the functional, since it's useful to know whether a model has been trained on data using the same functional.

Thanks for the remark, indeed it's better.
The functionals used in Defectstab are not all PBE, but I've specified the corresponding functional for each subset.

Ah I see, sorry I missed that, and thanks for fixing it!

Do you think it's still meaningful to combine them as an average then?

ElliottKasoar · 2026-04-22T15:08:28Z

+
+Reference data:
+
+* Computed from the DFT total energies provided with the input structures.


Could you add a note here either restating the functionals again or noting that the specific functionals are specified with the metrics above?

ElliottKasoar · 2026-04-22T15:08:45Z

+
+Reference data:
+
+* Computed from the DFT total energies provided with the input structures.


Could you add a note here either restating the different functionals again or noting that the specific functionals are specified with the metrics above?

ElliottKasoar added the "new benchmark" label (Proposals and suggestions for new benchmarks) Feb 6, 2026

ElliottKasoar reviewed Feb 20, 2026


joehart2001 mentioned this pull request Mar 24, 2026

Split vacancies benchmark #437


zhonganr force-pushed the benchmark_interstitial branch from e526ac1 to fbf8301 April 5, 2026 15:30

ElliottKasoar reviewed Apr 7, 2026


zhonganr added 3 commits April 11, 2026 11:35

Add new category containing two interstitial benchmarks

065cf3b

Address PR review

8e63dd9

zhonganr force-pushed the benchmark_interstitial branch from fbf8301 to 8e63dd9 April 13, 2026 13:53

ElliottKasoar reviewed Apr 15, 2026


Address PR review

701fb43

ElliottKasoar reviewed Apr 20, 2026


zhonganr added 2 commits April 21, 2026 15:25

Add per-subset bad thresholds and app-table metric columns for Defect…

07baf71

…stab

ElliottKasoar reviewed Apr 22, 2026


Address PR review

7b7619a

ElliottKasoar reviewed Apr 22, 2026



