Skip to content

wwinp files: Fix MemoryError in WeightWindowsList.export_to_hdf5 and speed up from_wwinp#3942

Open
yrrepy wants to merge 5 commits into
openmc-dev:developfrom
yrrepy:wwinp_2GB_faster_B
Open

wwinp files: Fix MemoryError in WeightWindowsList.export_to_hdf5 and speed up from_wwinp#3942
yrrepy wants to merge 5 commits into
openmc-dev:developfrom
yrrepy:wwinp_2GB_faster_B

Conversation

@yrrepy
Copy link
Copy Markdown
Contributor

@yrrepy yrrepy commented May 22, 2026

Description

This PR:

  1. Fixes: openmc.WeightWindowsList.from_wwinp('wwinp') fails when processing wwinp files that are larger than ~2GB
  2. Speeds-up WeightWindowsList processing (which gets to be slow with multi-GB wwinp).

  1. WeightWindowsList.export_to_hdf5 raised MemoryError on multi-GB wwinp inputs because the XML path built multi-GB ASCII strings inside lxml. Now writes HDF5 directly via h5py, matching the C++ writer.

  2. WeightWindowsList.from_wwinp was dominated by per-element isinstance checks in check_iterable_type (~90% of total time on multi-million- element bound arrays). Added a fast path for numpy float/complex ndarrays; ~11× speedup on a 172M-element wwinp (397 s → 35 s).

This enables support for many-GB wwinp files and faster processing of them.

An alternative re-factoring style is available here:
https://github.com/yrrepy/openmc/tree/wwinp_2GB_faster

Checklist

  • I have performed a self-review of my own code
  • I have run clang-format (version 18) on any C++ source files (if applicable)
  • I have followed the style guidelines for Python source files (if applicable)
  • I have made corresponding changes to the documentation (if applicable)
  • I have added tests that prove my fix is effective or that my feature works (if applicable)

Float/complex ndarrays are dtype-validated, so the per-element
isinstance() scan is redundant. Also construct upper_ww_bounds in
WeightWindows.__init__ as an ndarray multiplication (not a list
comprehension) so the upper-bounds setter benefits too. ~11x speedup
on 172M-element wwinp inputs (397 s -> 35 s).
@yrrepy yrrepy requested a review from pshriwise as a code owner May 22, 2026 21:33
The XML serialization raised MemoryError on bound arrays >~200M
elements -- lxml's intermediate ASCII allocation fails before the
text node can be built. Write HDF5 directly via h5py, mirroring
the C++ WeightWindows::to_hdf5 writer.

Critical details for C++ compatibility:
- Bounds are 2D (ne, n_voxels) on disk (4D would segfault the
  C++ tensor::Tensor<double> reader).
- max_lower_bound_ratio is written unconditionally (default 1.0).
- Root attrs filetype and version are required by
  openmc_weight_windows_import.

Mesh writing is handled by a private _write_mesh_group helper in
weight_windows.py that dispatches by mesh type, matching the
reference implementation. UnstructuredMesh raises NotImplementedError
(wwinp cannot produce one).
@yrrepy yrrepy force-pushed the wwinp_2GB_faster_B branch from 09ed5dc to d3124ee Compare May 23, 2026 00:15
Comment thread openmc/weight_windows.py Outdated
Comment on lines +143 to +146
self.upper_ww_bounds = [
lb * upper_bound_ratio for lb in self.lower_ww_bounds
]
self.upper_ww_bounds = self.lower_ww_bounds * upper_bound_ratio
Copy link
Copy Markdown
Contributor

@GuySten GuySten May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous lines supported the case where self.lower_ww_bounds is a list.
The new lines do not support that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code:
wrapped in np.asarray() so the line reads correctly even when
self.lower_ww_bounds is list-like. In practice the lower_ww_bounds setter
(L247-259) normalizes to ndarray via np.asarray, but the explicit wrap here
is clearer and costs nothing (no-op when already ndarray).

Perry:
OK, is defensive to have the explicit wrap

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've missed the fact that the setter guarantees that self.lower_ww_bounds is an array. So there is no problem.
I think your original code is better.

Wrap with np.asarray() so the derivation reads correctly even when
self.lower_ww_bounds is list-like. No-op when already ndarray.
@GuySten
Copy link
Copy Markdown
Contributor

GuySten commented May 29, 2026

IMO the alternative refactoring style is better. It will be easier to support new mesh types when each mesh manage for itself the serialization to hdf5.

yrrepy added 2 commits May 29, 2026 17:35
The dtype-trust fast path returned for any float/complex ndarray of
matching depth, even when expected_type was int or another class --
the docstring promised element-type validation but the fast path
skipped it. Gate the fast path on expected_type in (Real, float,
complex) so it only fires when dtype.kind in 'fc' actually satisfies
the contract.
The direct-h5py writer cannot serialize an UnstructuredMesh from pure
Python: vertex and connectivity data live in the external .exo/.h5m
file and only exist in memory after LibMesh/MOAB loads them via
openmc.lib.init. Dispatch on mesh type up front: structured meshes
take the new fast path; UnstructuredMesh falls back to the previous
TemporarySession + openmc.lib.export_weight_windows route, which also
restores honoring of init_kwargs on that path.

Removes the dead NotImplementedError branch from _write_mesh_group.
@yrrepy
Copy link
Copy Markdown
Contributor Author

yrrepy commented May 30, 2026

Further checking the code, I had unwittingly disabled a Unstructured Mesh weight window workflow

  • d5251a8 tightens the check_iterable_type fast path so it only fires when expected_type is Real/float/complex (previously any float ndarray would silently pass even when an integer type was requested).

  • 5f5ba87 restores the UnstructuredMesh path that the direct-h5py writer couldn't reach (pure Python can't materialize the external mesh data — falls back to the previous TemporarySession route, which also restores init_kwargs forwarding on that path)

Thoughts?

@yrrepy
Copy link
Copy Markdown
Contributor Author

yrrepy commented May 30, 2026

IMO the alternative refactoring style is better. It will be easier to support new mesh types when each mesh manage for itself the serialization to hdf5.

OK, I am agnostic, I just want my 2GB+ wwinps!

Do we need another opinion or should I just scrap this PR and start a new PR with the alternative approach?
Make sure it isn't disabling UM ww, etc.

@GuySten
Copy link
Copy Markdown
Contributor

GuySten commented May 30, 2026

I think you can just open a new PR and close this one.
You can also ask for someone else opinion if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants