wwinp files: Fix MemoryError in WeightWindowsList.export_to_hdf5 and speed up from_wwinp#3942
wwinp files: Fix MemoryError in WeightWindowsList.export_to_hdf5 and speed up from_wwinp#3942yrrepy wants to merge 5 commits into
Conversation
Float/complex ndarrays are dtype-validated, so the per-element isinstance() scan is redundant. Also construct upper_ww_bounds in WeightWindows.__init__ as an ndarray multiplication (not a list comprehension) so the upper-bounds setter benefits too. ~11x speedup on 172M-element wwinp inputs (397 s -> 35 s).
The XML serialization raised MemoryError on bound arrays >~200M elements -- lxml's intermediate ASCII allocation fails before the text node can be built. Write HDF5 directly via h5py, mirroring the C++ WeightWindows::to_hdf5 writer. Critical details for C++ compatibility: - Bounds are 2D (ne, n_voxels) on disk (4D would segfault the C++ tensor::Tensor<double> reader). - max_lower_bound_ratio is written unconditionally (default 1.0). - Root attrs filetype and version are required by openmc_weight_windows_import. Mesh writing is handled by a private _write_mesh_group helper in weight_windows.py that dispatches by mesh type, matching the reference implementation. UnstructuredMesh raises NotImplementedError (wwinp cannot produce one).
09ed5dc to
d3124ee
Compare
| self.upper_ww_bounds = [ | ||
| lb * upper_bound_ratio for lb in self.lower_ww_bounds | ||
| ] | ||
| self.upper_ww_bounds = self.lower_ww_bounds * upper_bound_ratio |
There was a problem hiding this comment.
The previous lines supported the case where self.lower_ww_bounds is a list.
The new lines do not support that.
There was a problem hiding this comment.
Claude Code:
wrapped in np.asarray() so the line reads correctly even when
self.lower_ww_bounds is list-like. In practice the lower_ww_bounds setter
(L247-259) normalizes to ndarray via np.asarray, but the explicit wrap here
is clearer and costs nothing (no-op when already ndarray).
Perry:
OK, is defensive to have the explicit wrap
There was a problem hiding this comment.
I've missed the fact that the setter guarantees that self.lower_ww_bounds is an array. So there is no problem.
I think your original code is better.
Wrap with np.asarray() so the derivation reads correctly even when self.lower_ww_bounds is list-like. No-op when already ndarray.
|
IMO the alternative refactoring style is better. It will be easier to support new mesh types when each mesh manage for itself the serialization to hdf5. |
The dtype-trust fast path returned for any float/complex ndarray of matching depth, even when expected_type was int or another class -- the docstring promised element-type validation but the fast path skipped it. Gate the fast path on expected_type in (Real, float, complex) so it only fires when dtype.kind in 'fc' actually satisfies the contract.
The direct-h5py writer cannot serialize an UnstructuredMesh from pure Python: vertex and connectivity data live in the external .exo/.h5m file and only exist in memory after LibMesh/MOAB loads them via openmc.lib.init. Dispatch on mesh type up front: structured meshes take the new fast path; UnstructuredMesh falls back to the previous TemporarySession + openmc.lib.export_weight_windows route, which also restores honoring of init_kwargs on that path. Removes the dead NotImplementedError branch from _write_mesh_group.
|
Further checking the code, I had unwittingly disabled a Unstructured Mesh weight window workflow
Thoughts? |
OK, I am agnostic, I just want my 2GB+ wwinps! Do we need another opinion or should I just scrap this PR and start a new PR with the alternative approach? |
|
I think you can just open a new PR and close this one. |
Description
This PR:
WeightWindowsList.export_to_hdf5raisedMemoryErroron multi-GB wwinp inputs because the XML path built multi-GB ASCII strings inside lxml. Now writes HDF5 directly via h5py, matching the C++ writer.WeightWindowsList.from_wwinpwas dominated by per-elementisinstancechecks incheck_iterable_type(~90% of total time on multi-million- element bound arrays). Added a fast path for numpy float/complex ndarrays; ~11× speedup on a 172M-element wwinp (397 s → 35 s).This enables support for many-GB wwinp files and faster processing of them.
An alternative re-factoring style is available here:
https://github.com/yrrepy/openmc/tree/wwinp_2GB_faster
Checklist