Move depletion chain from XML to HDF (?) #3882
Description
The depletion chain files hold lots of information, more than just the transmutation chains. Fission yield distributions and decay sources are part of the chain file. For good reasons!
However, these files can get rather large. Downloading the latest chain file from openmc.org/data produces a 27 MB, 31K-line XML file. The fission yield and decay source data are specific cases where we have numeric data encoded as strings that need to be specially handled during the read/write step.
Additionally, to build part of the chain, you still have to traverse over the entire chain file.
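To make the two pain points concrete, here is a minimal sketch using only the standard library. The XML fragment is a simplified, made-up stand-in for the chain file (element names are my assumption of the general shape, and the yield values are placeholders, not real data). The helper `read_yields` is hypothetical. It shows that finding one nuclide means walking every `<nuclide>` element, and that the numbers come back as text that has to be split and converted by hand.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up fragment in the spirit of the chain file.
# Element names are assumed; yield values are placeholders, not real data.
CHAIN_XML = """
<depletion_chain>
  <nuclide name="H1" reactions="1">
    <reaction type="(n,gamma)" Q="2224648.0" target="H2"/>
  </nuclide>
  <nuclide name="U235" reactions="0">
    <neutron_fission_yields>
      <energies>0.0253</energies>
      <fission_yields energy="0.0253">
        <products>Xe135 I135</products>
        <data>0.002 0.06</data>
      </fission_yields>
    </neutron_fission_yields>
  </nuclide>
</depletion_chain>
"""


def read_yields(root, nuclide):
    """Return {energy: {product: yield}} for one nuclide (hypothetical helper)."""
    # Every <nuclide> element is visited just to find one by name.
    for elem in root.iter("nuclide"):
        if elem.get("name") != nuclide:
            continue
        yields = {}
        for fy in elem.iter("fission_yields"):
            # Numeric data arrives as whitespace-separated text
            # and must be converted explicitly.
            products = fy.find("products").text.split()
            values = [float(x) for x in fy.find("data").text.split()]
            yields[float(fy.get("energy"))] = dict(zip(products, values))
        return yields
    raise KeyError(nuclide)


root = ET.fromstring(CHAIN_XML)
u235_yields = read_yields(root, "U235")
```

The same split-and-convert step has to be mirrored on export, which is the read/write handling mentioned above.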
I've been thinking about translating the chain file to HDF.
Pros
- Native handling of arrays (e.g., fission yields)
- Compression
- Able to load parts of the file contents into memory without loading the entire file
Cons
- Can't open the native file up in a text editor for simple manipulations
- Still lots of non-numeric data (e.g., reaction types, targets) that, if left as strings in the data file, don't make for the most friendly HDF experience
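A rough sketch of what the pros above could look like with h5py and NumPy (both assumed to be installed). The group layout `U235/fission_yields` and the dataset names are hypothetical, not a proposed format, and the yield values are placeholders. The file lives in memory only (`driver="core"` with `backing_store=False`) so nothing is written to disk.

```python
import h5py
import numpy as np

# Hypothetical layout: /<nuclide>/fission_yields/{energies, products, data}
# Values are placeholders, not real yields.
with h5py.File("chain_sketch.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("U235/fission_yields")
    grp.create_dataset("energies", data=np.array([0.0253, 5.0e5]))
    # Strings are still possible, but need an explicit variable-length dtype.
    grp.create_dataset(
        "products", data=["Xe135", "I135"], dtype=h5py.string_dtype()
    )
    # Native 2D array (energy x product), stored with gzip compression.
    grp.create_dataset(
        "data",
        data=np.array([[0.002, 0.06], [0.003, 0.05]]),
        compression="gzip",
    )

    # Partial read: only the first energy row is pulled into memory.
    thermal = grp["data"][0, :]
    # Decode the stored bytes back into Python str objects.
    products = list(grp["products"].asstr()[:])
```

The `string_dtype` and `asstr` steps hint at the con about non-numeric data: strings work, but they take more ceremony than arrays do.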
Alternatives
A middle ground could be XDMF, which is a combination of HDF and XML. That does mean we have two files to pass around: a primary "light" XML file and a secondary "heavy" HDF file. I don't hate that solution, as it would let us offload the array-like things to HDF.
Compatibility
We presently have Chain.from_xml, which could be supported going forward. We could add Chain.from_hdf or Chain.from_xdmf to handle the new file. Same for export_to_xml / export_to_hdf / export_to_xdmf.
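One way the readers could coexist is a small dispatch on file suffix. This is only a sketch: `from_hdf` and `from_xdmf` do not exist yet, and the `LOADERS` table, the suffix choices, and `loader_name` are all hypothetical names for illustration.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to the name of a Chain classmethod.
# Only from_xml exists today; the other two are the proposed additions.
LOADERS = {
    ".xml": "from_xml",
    ".h5": "from_hdf",
    ".hdf5": "from_hdf",
    ".xdmf": "from_xdmf",
}


def loader_name(path):
    """Pick the reader for a chain file based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized chain file suffix: {suffix!r}") from None
```

A caller could then do something like `getattr(Chain, loader_name(path))(path)`, so existing XML workflows keep working unchanged.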
Other items
Lazy load
One option is to load the file data lazily, deferring reads until we need them. This might be a bigger lift for not much gain. Example: if you're doing a decay problem, you may not need to load fission yield data (neglecting spontaneous fission). Same for photon sources. Maybe a longer-term thing.
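A minimal sketch of the lazy-load idea using `functools.cached_property` from the standard library. `LazyNuclide` and `fake_reader` are hypothetical stand-ins: in practice the reader would pull the yields for one nuclide out of the file, and here it just records that it was called.

```python
from functools import cached_property


class LazyNuclide:
    """Hypothetical stand-in for a chain nuclide that defers reading yields."""

    def __init__(self, name, yield_reader):
        self.name = name
        # Callable that would read this nuclide's yields from the file.
        self._yield_reader = yield_reader

    @cached_property
    def yield_data(self):
        # Runs on first access only; the result is cached on the instance.
        return self._yield_reader(self.name)


reads = []  # records every call that would have touched the file


def fake_reader(name):
    reads.append(name)
    return {0.0253: {"Xe135": 0.002}}  # placeholder values


u235 = LazyNuclide("U235", fake_reader)
```

A pure decay calculation that never touches `yield_data` would never trigger the read, which is the gain described above.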
Structure
I'm not sure on the best data layout. Is it nicer to be able to access all the depletion/decay data for a given nuclide as one group, like the current layout where everything is first under a <nuclide> tag?
Or to have all the fission yield data grouped together so we could have larger 2D arrays of products -> targets for a given energy group? And to write non-fission yield reaction data in a tabular format, like
| Nuclide | Reaction | Target | Q |
|---|---|---|---|
| "H1" | "(n,gamma)" | "H2" | 2224648.0 |
and have each column be a vector, since the columns have different data types.
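The column-per-vector idea can be sketched in plain Python. The first row is the one from the table above. The second row is a placeholder I added to show more than one entry, and its Q value is not from any chain file. The `reactions_for` helper is hypothetical.

```python
# Row-oriented view, as in the table above.
rows = [
    ("H1", "(n,gamma)", "H2", 2224648.0),
    ("H2", "(n,2n)", "H1", 0.0),  # placeholder row; Q value is not real data
]

names = ("nuclide", "reaction", "target", "Q")

# Column-oriented view: one homogeneous vector per column, so each could
# be stored as its own dataset (three string columns, one float column).
columns = {name: list(col) for name, col in zip(names, zip(*rows))}


def reactions_for(columns, nuclide):
    """Rebuild the (reaction, target, Q) rows for one nuclide."""
    zipped = zip(*(columns[name] for name in names))
    return [(r, t, q) for n, r, t, q in zipped if n == nuclide]
```

The trade-off is visible in `reactions_for`: the columnar layout stores cleanly typed vectors, but collecting everything for one nuclide means scanning the nuclide column rather than opening a single group.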