SOF-7914 Update: use multiple MLFFs in Relax NB#327
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
📝 WalkthroughWalkthroughThis PR introduces an end-to-end MLFF-based structure relaxation workflow for Pyodide. It adds a unified MLFF factory ( ChangesMLFF Structure Relaxation for Pyodide
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (2)
src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py (2)
1118-1126: 💤 Low valueGuard check never matches —
_original_scriptattribute is never set.The check
hasattr(torch.jit, "_original_script")is intended to prevent double-patching, but nothing ever setstorch.jit._original_script. The_orig_scriptlocal variable is unused. If bothpatch_mattersim_deps()andpatch_nequip_deps()are called,torch.jit.scriptgets wrapped twice.♻️ Proposed fix to make guard effective
if not hasattr(torch.jit, "_original_script"): - _orig_script = torch.jit.script + torch.jit._original_script = torch.jit.script def _noop_script(obj=None, *a, **k): if obj is not None: return obj return lambda fn: fn torch.jit.script = _noop_script🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py` around lines 1118 - 1126, The guard currently checks hasattr(torch.jit, "_original_script") but never sets that attribute, so torch.jit.script can be wrapped multiple times (see _orig_script, torch.jit.script and the patch_mattersim_deps/patch_nequip_deps callers); fix by storing the original function onto torch.jit (e.g. assign torch.jit._original_script = _orig_script) the first time you patch and then replace torch.jit.script with the noop only if torch.jit._original_script is not present, and when unpatching or restoring use torch.jit._original_script to restore the original implementation.
1334-1451: ⚖️ Poor tradeoffSignificant code duplication with
patch_fairchem_deps().The
_hydra_instantiatefunction (lines 1347-1369), omegaconf stubs (lines 1386-1405), and tqdm stubs (lines 1443-1451) are nearly identical to implementations inpatch_fairchem_deps(). Consider extracting shared helpers to reduce maintenance burden.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py` around lines 1334 - 1451, Duplicate logic (hydra instantiate, omegaconf stubs, tqdm stubs) appears here and in patch_fairchem_deps(); extract shared helpers to centralize behavior: create reusable functions (e.g., make_hydra_instantiate(), make_omegaconf_stub(), make_tqdm_stub() or a generic make_stub_module wrapper that returns configured modules) and call them from both locations instead of reimplementing _hydra_instantiate, omegaconf_mod creation, and _tqdm_passthrough; reference existing symbols _hydra_instantiate, omegaconf_mod, tqdm_mod, _tqdm_passthrough, _make_stub_module and the function patch_fairchem_deps to locate where to replace duplicated blocks. Ensure the new helpers preserve current behavior (recursive instantiation, OmegaConf.create/to_container/register_new_resolver, tqdm passthrough) and update imports/assignments so sys.modules entries remain identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@other/experiments/jupyterlite/relax_structure_with_mlff.ipynb`:
- Line 294: The current calculation uses area =
ase_original_interface.get_volume() / ase_original_interface.cell[2, 2], which
assumes an orthogonal cell; replace it by computing the in-plane area from the
lattice vectors (use ase_original_interface.cell and take the cross product of
the first two cell vectors) so the area works for tilted/non-orthogonal cells
(i.e., compute norm(cross(cell[0], cell[1])) and assign that to area).
- Around line 281-301: Before computing energies and dividing by atom counts,
validate that filter_atoms_by_tag returned non-empty Atom objects for
SUBSTRATE_TAG and FILM_TAG: check substrate_relaxed and layer_relaxed (and
optionally substrate_original/layer_original) to ensure n_substrate and n_layer
(from get_global_number_of_atoms()) are > 0 and raise a clear exception if not;
update the block around filter_atoms_by_tag, calculate_energy, and the
effective_delta_relaxed calculation to perform these checks and fail fast with a
descriptive error mentioning the missing tag (SUBSTRATE_TAG or FILM_TAG) and the
interface variables ase_original_interface / ase_final_interface so callers can
locate the problem.
- Line 314: The displayed conversion uses the wrong factor (dividing by 0.16);
update the f-string that prints effective_delta_relaxed (the string containing
"Effective relaxed Delta per area") to convert eV/Ų to J/m² using the correct
factor 16.021766 (i.e., multiply effective_delta_relaxed by 16.021766 when
showing J/m²) so the message shows accurate units.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/mattersim.py`:
- Around line 20-22: resolved_model_path may be None when both model_path and
checkpoint are unset, causing
MatterSimCalculator.from_checkpoint(load_path=str(resolved_model_path), ...) to
receive the string "None"; add a guard after computing resolved_model_path
(checking model_path and checkpoint) and before calling
MatterSimCalculator.from_checkpoint: if resolved_model_path is falsy, raise a
clear ValueError (or provide a valid default) explaining that a model_path or
checkpoint is required (mentioning model_path/checkpoint and
MatterSimCalculator.from_checkpoint in the message) so the failure is explicit
and debuggable.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/nequip.py`:
- Around line 38-41: The code currently sets resolved_model_path = model_path or
checkpoint and then calls load_nequip_model(str(resolved_model_path)), which
ends up passing "None" to torch.load if both model_path and checkpoint are
unset; update the non-Pyodide path to validate inputs and raise a ValueError
when both model_path and checkpoint are falsy before calling load_nequip_model;
i.e., check resolved_model_path (or check model_path and checkpoint) and raise a
clear ValueError mentioning that a model path or checkpoint is required, then
proceed to call load_nequip_model and create_nequip_calculator_from_model as
before.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py`:
- Around line 584-585: The current numba_mod.njit implementation returns the
same wrapper in both branches, losing proper decorator semantics; change
numba_mod.njit so that if called directly as `@njit` (i.e., first positional arg
exists and is callable) it returns that function unchanged, otherwise it returns
a decorator that accepts a function and returns it; keep numba_mod.jit aliased
to numba_mod.njit. Use the symbols numba_mod.njit and numba_mod.jit to locate
and update the logic.
- Around line 1190-1208: The _scatter function currently calls index.max() when
dim_size is None which raises on empty tensors; update the dim_size
determination in _scatter to check for an empty index (e.g., index.numel() == 0)
and handle that case before calling index.max(): if index is empty set dim_size
to 0 (or another appropriate minimal size for your use-case) so subsequent
tensor allocations like out = _torch.zeros(...) and count logic don't call max
on an empty tensor; adjust the mean branch to handle dim_size == 0 (early return
of empty out) so functions _scatter, variable index, and count handling remain
safe for empty inputs.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/uma.py`:
- Around line 23-26: create_uma_calculator currently sets resolved_model_path =
model_path or checkpoint and immediately calls
load_predict_unit(str(resolved_model_path), ...), which will pass the string
"None" if both are missing; add an explicit guard in create_uma_calculator that,
when NOT running under Pyodide, checks that either model_path or checkpoint is
provided (i.e., resolved_model_path is not None) and raises a clear error (or
returns early) before calling load_predict_unit; reference the
resolved_model_path variable and the load_predict_unit and FAIRChemCalculator
creation so the check happens right before the load_predict_unit invocation.
---
Nitpick comments:
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py`:
- Around line 1118-1126: The guard currently checks hasattr(torch.jit,
"_original_script") but never sets that attribute, so torch.jit.script can be
wrapped multiple times (see _orig_script, torch.jit.script and the
patch_mattersim_deps/patch_nequip_deps callers); fix by storing the original
function onto torch.jit (e.g. assign torch.jit._original_script = _orig_script)
the first time you patch and then replace torch.jit.script with the noop only if
torch.jit._original_script is not present, and when unpatching or restoring use
torch.jit._original_script to restore the original implementation.
- Around line 1334-1451: Duplicate logic (hydra instantiate, omegaconf stubs,
tqdm stubs) appears here and in patch_fairchem_deps(); extract shared helpers to
centralize behavior: create reusable functions (e.g., make_hydra_instantiate(),
make_omegaconf_stub(), make_tqdm_stub() or a generic make_stub_module wrapper
that returns configured modules) and call them from both locations instead of
reimplementing _hydra_instantiate, omegaconf_mod creation, and
_tqdm_passthrough; reference existing symbols _hydra_instantiate, omegaconf_mod,
tqdm_mod, _tqdm_passthrough, _make_stub_module and the function
patch_fairchem_deps to locate where to replace duplicated blocks. Ensure the new
helpers preserve current behavior (recursive instantiation,
OmegaConf.create/to_container/register_new_resolver, tqdm passthrough) and
update imports/assignments so sys.modules entries remain identical.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c418c69d-d06c-401b-8022-3e0ede00086f
📒 Files selected for processing (7)
other/experiments/jupyterlite/relax_structure_with_mlff.ipynbsrc/py/mat3ra/notebooks_utils/mlff.pysrc/py/mat3ra/notebooks_utils/pyodide/packages/mace.pysrc/py/mat3ra/notebooks_utils/pyodide/packages/mattersim.pysrc/py/mat3ra/notebooks_utils/pyodide/packages/nequip.pysrc/py/mat3ra/notebooks_utils/pyodide/packages/torch.pysrc/py/mat3ra/notebooks_utils/pyodide/packages/uma.py
| "substrate_original = filter_atoms_by_tag(ase_original_interface, SUBSTRATE_TAG)\n", | ||
| "layer_original = filter_atoms_by_tag(ase_original_interface, FILM_TAG)\n", | ||
| "substrate_relaxed = filter_atoms_by_tag(ase_final_interface, SUBSTRATE_TAG)\n", | ||
| "layer_relaxed = filter_atoms_by_tag(ase_final_interface, FILM_TAG)\n", | ||
| "\n", | ||
| "original_substrate_energy = calculate_energy(substrate_original, calculator)\n", | ||
| "original_layer_energy = calculate_energy(layer_original, calculator)\n", | ||
| "relaxed_substrate_energy = calculate_energy(substrate_relaxed, calculator)\n", | ||
| "relaxed_layer_energy = calculate_energy(layer_relaxed, calculator)\n", | ||
| "\n", | ||
| "delta_original = calculate_delta_energy(original_energy, original_substrate_energy, original_layer_energy)\n", | ||
| "delta_relaxed = calculate_delta_energy(relaxed_energy, relaxed_substrate_energy, relaxed_layer_energy)\n", | ||
| "\n", | ||
| "area = ase_original_interface.get_volume() / ase_original_interface.cell[2, 2]\n", | ||
| "n_interface = ase_final_interface.get_global_number_of_atoms()\n", | ||
| "n_substrate = substrate_relaxed.get_global_number_of_atoms()\n", | ||
| "n_layer = layer_relaxed.get_global_number_of_atoms()\n", | ||
| "effective_delta_relaxed = (\n", | ||
| " relaxed_energy / n_interface\n", | ||
| " - (relaxed_substrate_energy / n_substrate + relaxed_layer_energy / n_layer)\n", | ||
| " ) / (2 * area)\n", |
There was a problem hiding this comment.
Add explicit tag validation before energy decomposition.
If the interface lacks one of the expected tags, n_substrate or n_layer can be zero, causing failures in Line 300 and misleading energy outputs. Fail fast with a clear error.
💡 Proposed fix
substrate_original = filter_atoms_by_tag(ase_original_interface, SUBSTRATE_TAG)
layer_original = filter_atoms_by_tag(ase_original_interface, FILM_TAG)
substrate_relaxed = filter_atoms_by_tag(ase_final_interface, SUBSTRATE_TAG)
layer_relaxed = filter_atoms_by_tag(ase_final_interface, FILM_TAG)
+
+if len(substrate_original) == 0 or len(layer_original) == 0:
+ raise ValueError(
+ "Missing substrate/film tags in the original interface. "
+ "Expected non-empty groups for SUBSTRATE_TAG and FILM_TAG."
+ )
+if len(substrate_relaxed) == 0 or len(layer_relaxed) == 0:
+ raise ValueError(
+ "Missing substrate/film tags in the relaxed interface. "
+ "Expected non-empty groups for SUBSTRATE_TAG and FILM_TAG."
+ )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/experiments/jupyterlite/relax_structure_with_mlff.ipynb` around lines
281 - 301, Before computing energies and dividing by atom counts, validate that
filter_atoms_by_tag returned non-empty Atom objects for SUBSTRATE_TAG and
FILM_TAG: check substrate_relaxed and layer_relaxed (and optionally
substrate_original/layer_original) to ensure n_substrate and n_layer (from
get_global_number_of_atoms()) are > 0 and raise a clear exception if not; update
the block around filter_atoms_by_tag, calculate_energy, and the
effective_delta_relaxed calculation to perform these checks and fail fast with a
descriptive error mentioning the missing tag (SUBSTRATE_TAG or FILM_TAG) and the
interface variables ase_original_interface / ase_final_interface so callers can
locate the problem.
| "delta_original = calculate_delta_energy(original_energy, original_substrate_energy, original_layer_energy)\n", | ||
| "delta_relaxed = calculate_delta_energy(relaxed_energy, relaxed_substrate_energy, relaxed_layer_energy)\n", | ||
| "\n", | ||
| "area = ase_original_interface.get_volume() / ase_original_interface.cell[2, 2]\n", |
There was a problem hiding this comment.
Use a general in-plane area formula, not volume / c_z.
Line 294 assumes an orthogonal cell and c-axis aligned with z. For tilted/non-orthogonal cells this gives the wrong interface area and corrupts all per-area metrics.
💡 Proposed fix
+import numpy as np
- area = ase_original_interface.get_volume() / ase_original_interface.cell[2, 2]
+cell = ase_original_interface.cell
+area = np.linalg.norm(np.cross(cell[0], cell[1]))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/experiments/jupyterlite/relax_structure_with_mlff.ipynb` at line 294,
The current calculation uses area = ase_original_interface.get_volume() /
ase_original_interface.cell[2, 2], which assumes an orthogonal cell; replace it
by computing the in-plane area from the lattice vectors (use
ase_original_interface.cell and take the cross product of the first two cell
vectors) so the area works for tilted/non-orthogonal cells (i.e., compute
norm(cross(cell[0], cell[1])) and assign that to area).
| "print(f\"Relaxed Delta per area: {delta_relaxed / area:.4f} eV/Ang^2\")\n", | ||
| "print(f\"Relaxed interface energy: {relaxed_energy:.4f} eV\")\n", | ||
| "print(\n", | ||
| " f\"Effective relaxed Delta per area: {effective_delta_relaxed:.4f} eV/Ang^2 ({effective_delta_relaxed / 0.16:.4f} J/m^2)\")" |
There was a problem hiding this comment.
Fix the eV/Ų → J/m² conversion factor.
Line 314 divides by 0.16, which is not the correct conversion.
1 eV/Ų = 16.021766 J/m².
💡 Proposed fix
+EV_PER_ANG2_TO_J_PER_M2 = 16.021766
print(
- f"Effective relaxed Delta per area: {effective_delta_relaxed:.4f} eV/Ang^2 ({effective_delta_relaxed / 0.16:.4f} J/m^2)")
+ f"Effective relaxed Delta per area: {effective_delta_relaxed:.4f} eV/Ang^2 "
+ f"({effective_delta_relaxed * EV_PER_ANG2_TO_J_PER_M2:.4f} J/m^2)"
+)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/experiments/jupyterlite/relax_structure_with_mlff.ipynb` at line 314,
The displayed conversion uses the wrong factor (dividing by 0.16); update the
f-string that prints effective_delta_relaxed (the string containing "Effective
relaxed Delta per area") to convert eV/Ų to J/m² using the correct factor
16.021766 (i.e., multiply effective_delta_relaxed by 16.021766 when showing
J/m²) so the message shows accurate units.
| resolved_model_path = model_path or checkpoint | ||
|
|
||
| return MatterSimCalculator.from_checkpoint(load_path=str(resolved_model_path), device=device, **kwargs) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Show the exact create_mattersim_calculator implementation around the reported lines
FILE="src/py/mat3ra/notebooks_utils/pyodide/packages/mattersim.py"
echo "=== $FILE ==="
nl -ba "$FILE" | sed -n '1,120p'
# 2) Find all usages of create_mattersim_calculator in the repo
echo
echo "=== rg usages of create_mattersim_calculator ==="
rg -n "create_mattersim_calculator\s*\(" -S .
# 3) Inspect the jupyterlite notebook settings for mattersim MLFF_SETTINGS (look for "mattersim")
NB="other/experiments/jupyterlite/relax_structure_with_mlff.ipynb"
if [ -f "$NB" ]; then
echo
echo "=== grep mattersim config in $NB ==="
# Show only small relevant sections to avoid huge output
rg -n "\"mattersim\"|mattersim" "$NB" || true
# Extract a likely JSON path containing MLFF_SETTINGS["mattersim"]
echo
echo "=== extract nearby JSON around \"mattersim\" occurrences ==="
# Print ~80 lines around first occurrences
python3 - <<'PY'
import json, re
from pathlib import Path
nb_path = Path("other/experiments/jupyterlite/relax_structure_with_mlff.ipynb")
data = json.loads(nb_path.read_text(encoding="utf-8"))
matches = []
for cell in data.get("cells", []):
src = "".join(cell.get("source", []))
if "MLFF_SETTINGS" in src or "mattersim" in src or "mattersim" in src.lower():
if "mattersim" in src:
matches.append(src)
# Fallback: if not found in cells, scan full text for a segment
if not matches:
text = nb_path.read_text(encoding="utf-8")
idx = text.lower().find("mattersim")
if idx != -1:
start = max(0, idx-500)
end = min(len(text), idx+2000)
print(text[start:end])
else:
# Print only short excerpts
for i, m in enumerate(matches[:6]):
m2 = m
if len(m2) > 2000: m2 = m2[:2000] + "\n... (truncated)"
print(f"\n--- cell excerpt {i+1} ---\n{m2}\n")
PY
else
echo "Notebook not found at $NB"
fiRepository: Exabyte-io/api-examples
Length of output: 196
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 0) Locate the notebook file (in case the path differs)
echo "=== fd relax_structure_with_mlff.ipynb ==="
fd "relax_structure_with_mlff\.ipynb$" . || true
# 1) Show create_mattersim_calculator implementation around the reported lines
FILE="src/py/mat3ra/notebooks_utils/pyodide/packages/mattersim.py"
echo
echo "=== $FILE (around lines 1-80) ==="
sed -n '1,80p' "$FILE"
echo
echo "=== $FILE (around lines 15-35) ==="
sed -n '15,35p' "$FILE"
# 2) Find all usages of create_mattersim_calculator in the repo
echo
echo "=== rg usages of create_mattersim_calculator ==="
rg -n "create_mattersim_calculator\s*\(" -S .
# 3) Find possible non-pyodide call paths for mattersim (where non-pyodide branch might be hit)
echo
echo "=== rg for get_mattersim_model_pyodide / MatterSimCalculator ==="
rg -n "get_mattersim_model_pyodide|MatterSimCalculator" -S .
# 4) Inspect MLFF_SETTINGS["mattersim"] inside the Jupyterlite notebook (if present)
NB_CAND="other/experiments/jupyterlite/relax_structure_with_mlff.ipynb"
if [ -f "$NB_CAND" ]; then
NB="$NB_CAND"
else
# try any matching notebook found via fd
NB="$(fd "relax_structure_with_mlff\.ipynb$" . -1 | head -n 1 || true)"
fi
echo
echo "=== notebook MLFF_SETTINGS extraction (if found) ==="
if [ -n "${NB:-}" ] && [ -f "$NB" ]; then
echo "Using notebook: $NB"
python3 - <<'PY'
import json, re
from pathlib import Path
nb_path = Path("other/experiments/jupyterlite/relax_structure_with_mlff.ipynb")
# if the above is not correct, the script will rewrite below:
# (we pass via env)
PY
else
echo "Notebook not found; skipping MLFF_SETTINGS extraction."
fiRepository: Exabyte-io/api-examples
Length of output: 4073
🏁 Script executed:
#!/bin/bash
set -euo pipefail
NB="other/experiments/jupyterlite/relax_structure_with_mlff.ipynb"
FILE="src/py/mat3ra/notebooks_utils/primitive/environment.py"
echo "=== inspect is_pyodide_environment() ==="
sed -n '1,200p' "$FILE" | nl -ba | sed -n '1,160p' || sed -n '1,160p' "$FILE"
echo
echo "=== rg for create_mattersim_calculator in the notebook ==="
rg -n "create_mattersim_calculator|MLFF_SETTINGS|mattersim" -S "$NB" || true
echo
echo "=== extract MLFF_SETTINGS[\"mattersim\"] and related bits ==="
python3 - <<'PY'
import json, re
from pathlib import Path
nb_path = Path("other/experiments/jupyterlite/relax_structure_with_mlff.ipynb")
nb = json.loads(nb_path.read_text(encoding="utf-8"))
def join_source(cell):
src = cell.get("source", [])
if isinstance(src, list):
return "".join(src)
return str(src)
hits = []
for i, cell in enumerate(nb.get("cells", [])):
if cell.get("cell_type") != "code":
continue
src = join_source(cell)
if "MLFF_SETTINGS" in src or "mattersim" in src.lower():
hits.append((i+1, src))
print(f"code cells with MLFF_SETTINGS or mattersim: {len(hits)}")
for (cell_no, src) in hits[:20]:
print("\n--- code cell", cell_no, "---")
# keep output short
if len(src) > 4000:
print(src[:4000] + "\n...<truncated>...")
else:
print(src)
# Try to locate a literal JSON/Python dict for MLFF_SETTINGS in the raw text
text = nb_path.read_text(encoding="utf-8")
m = re.search(r"MLFF_SETTINGS\s*=\s*({.*?})\s*[\r\n]", text, flags=re.S)
if m:
raw = m.group(1)
print("\n=== Found MLFF_SETTINGS assignment (best-effort excerpt) ===")
print(raw[:5000] + ("\n...<truncated>..." if len(raw) > 5000 else ""))
else:
print("\n=== Could not find a simple MLFF_SETTINGS = {...} literal; falling back to targeted scan ===")
# Target scan for mattersim block near "mattersim"
idx = text.lower().find('"mattersim"')
if idx == -1:
idx = text.lower().find("mattersim")
if idx != -1:
start = max(0, idx-1200)
end = min(len(text), idx+2400)
excerpt = text[start:end]
print("\n=== Text excerpt around mattersim ===")
print(excerpt[:8000] + ("\n...<truncated>..." if len(excerpt) > 8000 else ""))
else:
print("No mattersim string found in notebook text (unexpected).")
PYRepository: Exabyte-io/api-examples
Length of output: 11950
Guard missing checkpoint/path before loading MatterSim model (non-Pyodide).
When both model_path and checkpoint are unset, resolved_model_path becomes None, and load_path=str(resolved_model_path) passes the literal "None" into MatterSimCalculator.from_checkpoint(...), leading to an opaque downstream failure. This can occur for non-Pyodide runs because the Jupyterlite notebook config for mattersim provides only {"model": "1m", "device": "cpu"} (no model_path/checkpoint).
Suggested fix
def create_mattersim_calculator(model="1m", device="cpu", model_path=None, checkpoint=None, **kwargs):
if is_pyodide_environment():
return get_mattersim_model_pyodide(model=model, device=device, **kwargs)
resolved_model_path = model_path or checkpoint
+ if resolved_model_path is None:
+ raise ValueError("For non-Pyodide runs, provide `model_path` or `checkpoint`.")
return MatterSimCalculator.from_checkpoint(load_path=str(resolved_model_path), device=device, **kwargs)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resolved_model_path = model_path or checkpoint | |
| return MatterSimCalculator.from_checkpoint(load_path=str(resolved_model_path), device=device, **kwargs) | |
| resolved_model_path = model_path or checkpoint | |
| if resolved_model_path is None: | |
| raise ValueError("For non-Pyodide runs, provide `model_path` or `checkpoint`.") | |
| return MatterSimCalculator.from_checkpoint(load_path=str(resolved_model_path), device=device, **kwargs) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/mattersim.py` around lines 20
- 22, resolved_model_path may be None when both model_path and checkpoint are
unset, causing
MatterSimCalculator.from_checkpoint(load_path=str(resolved_model_path), ...) to
receive the string "None"; add a guard after computing resolved_model_path
(checking model_path and checkpoint) and before calling
MatterSimCalculator.from_checkpoint: if resolved_model_path is falsy, raise a
clear ValueError (or provide a valid default) explaining that a model_path or
checkpoint is required (mentioning model_path/checkpoint and
MatterSimCalculator.from_checkpoint in the message) so the failure is explicit
and debuggable.
| resolved_model_path = model_path or checkpoint | ||
|
|
||
| nequip_model = load_nequip_model(str(resolved_model_path)) | ||
| return create_nequip_calculator_from_model(nequip_model, device=device) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the target file
ls -la src/py/mat3ra/notebooks_utils/pyodide/packages/nequip.py
# Show the relevant function and surrounding context (small ranges)
python3 - <<'PY'
from pathlib import Path
p = Path("src/py/mat3ra/notebooks_utils/pyodide/packages/nequip.py")
text = p.read_text(encoding="utf-8").splitlines()
for i in range(1, len(text)+1):
if "def create_nequip_calculator" in text[i-1]:
start = max(1, i-20)
end = min(len(text), i+80)
for j in range(start, end+1):
print(f"{j}:{text[j-1]}")
break
else:
print("create_nequip_calculator not found")
PY
# Find and show load_nequip_model implementation/usages
rg -n "def load_nequip_model|load_nequip_model\(" src/py/mat3ra/notebooks_utils/pyodide/packages/nequip.py src/py/mat3ra/notebooks_utils/pyodide/packages -S
# If load_nequip_model is defined elsewhere, locate it
rg -n "def load_nequip_model" src/py/mat3ra/notebooks_utils/pyodide/packages -S || true
rg -n "load_nequip_model" src/py/mat3ra/notebooks_utils/pyodide/packages -S
# Check call sites that might pass None explicitly
rg -n "create_nequip_calculator\(" src/py/mat3ra -SRepository: Exabyte-io/api-examples
Length of output: 3026
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show load_nequip_model implementation (small, targeted range)
python3 - <<'PY'
from pathlib import Path
p = Path("src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py")
lines = p.read_text(encoding="utf-8").splitlines()
# find def load_nequip_model
for idx, line in enumerate(lines, start=1):
if line.startswith("def load_nequip_model"):
start = max(1, idx-20)
end = min(len(lines), idx+120)
for j in range(start, end+1):
print(f"{j}:{lines[j-1]}")
break
else:
print("load_nequip_model not found")
PY
# Quick look for how checkpoint_path is used in that function (context)
rg -n "load_nequip_model|checkpoint_path" src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py -nRepository: Exabyte-io/api-examples
Length of output: 3341
Add fail-fast validation for missing model_path/checkpoint in non-Pyodide path.
In src/py/mat3ra/notebooks_utils/pyodide/packages/nequip.py (lines 38-41), resolved_model_path = model_path or checkpoint can become None; the code then calls load_nequip_model(str(resolved_model_path)), which passes "None" to torch.load(...) and fails with an unclear error (attempting to load a file literally named "None"). Raise a ValueError when both are unset.
Suggested fix
def create_nequip_calculator(model="oam_s", device="cpu", model_path=None, checkpoint=None):
if is_pyodide_environment():
return get_nequip_model_pyodide(model=model, device=device)
resolved_model_path = model_path or checkpoint
+ if resolved_model_path is None:
+ raise ValueError("For non-Pyodide runs, provide `model_path` or `checkpoint`.")
nequip_model = load_nequip_model(str(resolved_model_path))
return create_nequip_calculator_from_model(nequip_model, device=device)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resolved_model_path = model_path or checkpoint | |
| nequip_model = load_nequip_model(str(resolved_model_path)) | |
| return create_nequip_calculator_from_model(nequip_model, device=device) | |
| resolved_model_path = model_path or checkpoint | |
| if resolved_model_path is None: | |
| raise ValueError("For non-Pyodide runs, provide `model_path` or `checkpoint`.") | |
| nequip_model = load_nequip_model(str(resolved_model_path)) | |
| return create_nequip_calculator_from_model(nequip_model, device=device) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/nequip.py` around lines 38 -
41, The code currently sets resolved_model_path = model_path or checkpoint and
then calls load_nequip_model(str(resolved_model_path)), which ends up passing
"None" to torch.load if both model_path and checkpoint are unset; update the
non-Pyodide path to validate inputs and raise a ValueError when both model_path
and checkpoint are falsy before calling load_nequip_model; i.e., check
resolved_model_path (or check model_path and checkpoint) and raise a clear
ValueError mentioning that a model path or checkpoint is required, then proceed
to call load_nequip_model and create_nequip_calculator_from_model as before.
| numba_mod.njit = lambda *a, **k: (lambda f: f) if not a or callable(a[0]) else lambda f: f | ||
| numba_mod.jit = numba_mod.njit |
There was a problem hiding this comment.
Both branches of ternary expression are identical — decorator semantics incorrect.
The condition not a or callable(a[0]) returns lambda f: f regardless of outcome. When @njit is used without parentheses (i.e., a[0] is the decorated function), it should return that function directly, not another wrapper.
🐛 Proposed fix
- numba_mod.njit = lambda *a, **k: (lambda f: f) if not a or callable(a[0]) else lambda f: f
+ numba_mod.njit = lambda *a, **k: a[0] if a and callable(a[0]) else lambda f: f🧰 Tools
🪛 Ruff (0.15.13)
[warning] 584-584: Useless if-else condition
(RUF034)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py` around lines 584 -
585, The current numba_mod.njit implementation returns the same wrapper in both
branches, losing proper decorator semantics; change numba_mod.njit so that if
called directly as `@njit` (i.e., first positional arg exists and is callable) it
returns that function unchanged, otherwise it returns a decorator that accepts a
function and returns it; keep numba_mod.jit aliased to numba_mod.njit. Use the
symbols numba_mod.njit and numba_mod.jit to locate and update the logic.
| def _scatter(src, index, dim_size=None, dim=0, reduce="sum"): | ||
| if dim_size is None: | ||
| dim_size = int(index.max()) + 1 | ||
| out = _torch.zeros(dim_size, *src.shape[1:], dtype=src.dtype, device=src.device) | ||
| if src.dim() == 1: | ||
| idx = index | ||
| else: | ||
| idx = index.unsqueeze(-1).expand_as(src) | ||
| if reduce == "sum" or reduce == "add": | ||
| out.scatter_add_(0, idx, src) | ||
| elif reduce == "mean": | ||
| out.scatter_add_(0, idx, src) | ||
| count = _torch.zeros(dim_size, dtype=src.dtype, device=src.device) | ||
| count.scatter_add_(0, index, _torch.ones(index.shape[0], dtype=src.dtype, device=src.device)) | ||
| count = count.clamp(min=1) | ||
| if src.dim() > 1: | ||
| count = count.unsqueeze(-1) | ||
| out = out / count | ||
| return out |
There was a problem hiding this comment.
index.max() will raise RuntimeError on empty tensor.
If _scatter is called with an empty index tensor and dim_size=None, index.max() fails with "reduce over no elements". Consider adding an empty-tensor guard.
🛡️ Proposed fix
def _scatter(src, index, dim_size=None, dim=0, reduce="sum"):
if dim_size is None:
- dim_size = int(index.max()) + 1
+ dim_size = int(index.max()) + 1 if index.numel() > 0 else 0
out = _torch.zeros(dim_size, *src.shape[1:], dtype=src.dtype, device=src.device)
+ if src.numel() == 0:
+ return out🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/torch.py` around lines 1190 -
1208, The _scatter function currently calls index.max() when dim_size is None
which raises on empty tensors; update the dim_size determination in _scatter to
check for an empty index (e.g., index.numel() == 0) and handle that case before
calling index.max(): if index is empty set dim_size to 0 (or another appropriate
minimal size for your use-case) so subsequent tensor allocations like out =
_torch.zeros(...) and count logic don't call max on an empty tensor; adjust the
mean branch to handle dim_size == 0 (early return of empty out) so functions
_scatter, variable index, and count handling remain safe for empty inputs.
| resolved_model_path = model_path or checkpoint | ||
|
|
||
| predictor = load_predict_unit(str(resolved_model_path), device=device, **kwargs) | ||
| return FAIRChemCalculator(predictor, task_name=task_name) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and display the relevant function in the target file
FILE="src/py/mat3ra/notebooks_utils/pyodide/packages/uma.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE" >&2
exit 1
fi
echo "=== File: $FILE (lines around 1-120) ==="
nl -ba "$FILE" | sed -n '1,140p'
echo
echo "=== Search for load_predict_unit definition/usages ==="
# Search in repo for load_predict_unit
rg -n "load_predict_unit" -S . || true
echo
echo "=== If load_predict_unit is imported, show import block ==="
rg -n "from .* import .*load_predict_unit|import .*load_predict_unit" -S "$FILE" || trueRepository: Exabyte-io/api-examples
Length of output: 218
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="src/py/mat3ra/notebooks_utils/pyodide/packages/uma.py"
if [ ! -f "$FILE" ]; then
echo "Missing file: $FILE" >&2
exit 1
fi
echo "=== File: $FILE (first ~200 lines with line numbers) ==="
cat -n "$FILE" | sed -n '1,200p'
echo
echo "=== Search for create_uma_calculator ==="
rg -n "def create_uma_calculator" -S "$FILE"
echo
echo "=== Show create_uma_calculator function body (approx lines around match) ==="
# Print a window around the definition
start=$(rg -n "def create_uma_calculator" "$FILE" | head -n1 | cut -d: -f1)
end=$((start+80))
cat -n "$FILE" | sed -n "${start},${end}p"
echo
echo "=== Search for load_predict_unit usage/definition ==="
rg -n "load_predict_unit" -S . || true
echo
echo "=== Search imports related to load_predict_unit in target file ==="
rg -n "load_predict_unit" -S "$FILE" || trueRepository: Exabyte-io/api-examples
Length of output: 2976
Validate non-Pyodide model_path/checkpoint before calling load_predict_unit.
create_uma_calculator sets resolved_model_path = model_path or checkpoint; if both are absent this becomes None, and load_predict_unit(str(resolved_model_path), ...) passes "None", leading to a confusing runtime failure. Add an explicit None guard (non-Pyodide only).
Suggested fix
def create_uma_calculator(model="f16", task_name="omat", device="cpu", model_path=None, checkpoint=None, **kwargs):
if is_pyodide_environment():
return get_uma_model_pyodide(model=model, task_name=task_name, device=device, **kwargs)
resolved_model_path = model_path or checkpoint
+ if resolved_model_path is None:
+ raise ValueError("For non-Pyodide runs, provide `model_path` or `checkpoint`.")
predictor = load_predict_unit(str(resolved_model_path), device=device, **kwargs)
return FAIRChemCalculator(predictor, task_name=task_name)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resolved_model_path = model_path or checkpoint | |
| predictor = load_predict_unit(str(resolved_model_path), device=device, **kwargs) | |
| return FAIRChemCalculator(predictor, task_name=task_name) | |
| resolved_model_path = model_path or checkpoint | |
| if resolved_model_path is None: | |
| raise ValueError("For non-Pyodide runs, provide `model_path` or `checkpoint`.") | |
| predictor = load_predict_unit(str(resolved_model_path), device=device, **kwargs) | |
| return FAIRChemCalculator(predictor, task_name=task_name) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/py/mat3ra/notebooks_utils/pyodide/packages/uma.py` around lines 23 - 26,
create_uma_calculator currently sets resolved_model_path = model_path or
checkpoint and immediately calls load_predict_unit(str(resolved_model_path),
...), which will pass the string "None" if both are missing; add an explicit
guard in create_uma_calculator that, when NOT running under Pyodide, checks that
either model_path or checkpoint is provided (i.e., resolved_model_path is not
None) and raises a clear error (or returns early) before calling
load_predict_unit; reference the resolved_model_path variable and the
load_predict_unit and FAIRChemCalculator creation so the check happens right
before the load_predict_unit invocation.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@other/experiments/jupyterlite/relax_structure_with_mlff.ipynb`:
- Around line 85-91: The notebook imports symbols that don't exist in
mat3ra.notebooks_utils.mlff (ensure_mlff_switchable and mark_active_mlff),
causing an ImportError; either update the import line to use the actual exported
names from src/py/mat3ra/notebooks_utils/mlff.py (replace ensure_mlff_switchable
and/or mark_active_mlff with the correct function names), or add/rename wrapper
functions in mlff.py to export ensure_mlff_switchable and mark_active_mlff, and
keep get_mlff_install_profiles as-is; ensure the imported symbols referenced in
the notebook (ensure_mlff_switchable, get_mlff_install_profiles,
mark_active_mlff) match the actual function names in mlff.py so the subsequent
calls (ensure_mlff_switchable(MLFF_NAME), get_mlff_install_profiles(MLFF_NAME),
mark_active_mlff(MLFF_NAME)) run without ImportError.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f94b63c-b0b9-428d-ad2f-c2a787047344
📒 Files selected for processing (11)
.gitattributesconfig.ymlother/experiments/jupyterlite/relax_structure_with_mlff.ipynbpackages/antlr4_python3_runtime-4.9.3-py3-none-any.whlpackages/mattersim-1.1.2-py3-none-any.whlpackages/models/mattersim-v1.0.0-1M.pthpackages/models/nequip-oam-s-config-sd.pthpackages/models/uma-s-1p1-f16.ptpackages/models/uma-s-1p1-int8.ptpackages/nequip-0.15.0-py3-none-any.whlsrc/py/mat3ra/notebooks_utils/mlff.py
✅ Files skipped from review due to trivial changes (5)
- .gitattributes
- packages/nequip-0.15.0-py3-none-any.whl
- packages/mattersim-1.1.2-py3-none-any.whl
- packages/models/uma-s-1p1-f16.pt
- packages/models/nequip-oam-s-config-sd.pth
| "from mat3ra.notebooks_utils.mlff import ensure_mlff_switchable, get_mlff_install_profiles, mark_active_mlff\n", | ||
| "\n", | ||
| "ensure_mlff_switchable(MLFF_NAME)\n", | ||
| "\n", | ||
| "profiles = get_mlff_install_profiles(MLFF_NAME)\n", | ||
| "await install_packages(profiles)\n", | ||
| "mark_active_mlff(MLFF_NAME)\n", |
There was a problem hiding this comment.
Fix MLFF utility import mismatch (runtime ImportError).
Line 85 imports ensure_mlff_switchable and mark_active_mlff, but src/py/mat3ra/notebooks_utils/mlff.py does not define them. This breaks notebook execution before package install starts.
Suggested minimal fix
-from mat3ra.notebooks_utils.mlff import ensure_mlff_switchable, get_mlff_install_profiles, mark_active_mlff
+from mat3ra.notebooks_utils.mlff import get_mlff_install_profiles
-
-ensure_mlff_switchable(MLFF_NAME)
profiles = get_mlff_install_profiles(MLFF_NAME)
await install_packages(profiles)
-mark_active_mlff(MLFF_NAME)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@other/experiments/jupyterlite/relax_structure_with_mlff.ipynb` around lines
85 - 91, The notebook imports symbols that don't exist in
mat3ra.notebooks_utils.mlff (ensure_mlff_switchable and mark_active_mlff),
causing an ImportError; either update the import line to use the actual exported
names from src/py/mat3ra/notebooks_utils/mlff.py (replace ensure_mlff_switchable
and/or mark_active_mlff with the correct function names), or add/rename wrapper
functions in mlff.py to export ensure_mlff_switchable and mark_active_mlff, and
keep get_mlff_install_profiles as-is; ensure the imported symbols referenced in
the notebook (ensure_mlff_switchable, get_mlff_install_profiles,
mark_active_mlff) match the actual function names in mlff.py so the subsequent
calls (ensure_mlff_switchable(MLFF_NAME), get_mlff_install_profiles(MLFF_NAME),
mark_active_mlff(MLFF_NAME)) run without ImportError.
Summary by CodeRabbit
New Features
Chores