Interface edep-sim h5 output#7
Interface edep-sim h5 output#7drinkingkazu wants to merge 4 commits intoDeepLearnPhysics:developfrom
Conversation
…. Voxelization of edep-sim output is added in the form of pre-processor.
…o handle very busy event topologies (which we plan to use for FM data generation). Algorithms (LEScatter merging ones) optimized for speed for similarly complex dataset (180s=>2s process time per event)
There was a problem hiding this comment.
Pull request overview
This PR implements interface support for reading EDepSim HDF5 simulation output files directly into pysupera, as requested in Issue #1. It adds an EDepSimHDF5Reader class and supporting infrastructure, renames ancestor_id to root_id throughout the codebase, introduces a VoxelizeProcessor preprocessing stage, and adds a Pipeline class for end-to-end event processing.
Changes:
- New
pysupera/readers/package withEventReaderBase,EDepSimHDF5Reader, and Geant4 process-type classification logic for EDepSim HDF5 files - Rename of
ancestor_id→root_ideverywhere (data model, IO, partitioner, conditions, tests) - New
VoxelizeProcessorpreprocessor,Pipelineclass inconfig.py, and performance improvements (bbox cache, parallel CC, KDTree-based LE scatter candidate building)
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
pysupera/readers/format_edepsim_h5.py |
New EDepSim HDF5 reader with G4 process classification |
pysupera/readers/example_edepsim_h5.py |
Draft/prototype script with critical executable module-level code |
pysupera/readers/base.py |
New abstract base class for all file-format readers |
pysupera/readers/__init__.py |
Package init exposing EDepSimHDF5Reader |
pysupera/data.py |
Renames ancestor_id→root_id; adds optional attributes and factory methods |
pysupera/utils.py |
Removes +1 from InteractionType conversion, breaking native file reads |
pysupera/io.py |
Renames ancestor_id→root_id in HDF5 schema |
pysupera/preproc.py |
Adds VoxelizeProcessor; parallel CC; last_stats |
pysupera/config.py |
Adds Pipeline, build_pipeline, build_voxelizer, build_reader |
pysupera/app.py |
Adds EDepSim format selector, draw-mode toggle, cache; duplicates _SEM_COLORS |
pysupera/partitioner.py |
Bbox cache, per-condition timing, batch_check_multi_cloud_proximity |
pysupera/partitioner2.py |
Renames ancestor_id→root_id references |
pysupera/proxck.py |
Adds batch_check_multi_cloud_proximity to multiple backends |
pysupera/conditions/touching_em_shower.py |
Renames ancestor_id→root_id |
pysupera/conditions/combine_le_scatters.py |
KDTree-based candidate building |
pysupera/conditions/absorb_le_scatter.py |
KDTree-based candidate building |
pysupera/diagnostics.py |
Renames ancestor_id→root_id in strings/comments |
pysupera/_run.py |
Adds voxelizer support, switches to partition_combined |
pysupera/conf/config.yaml |
Adds voxelize config block and EDepSim reader default |
pysupera/conf/reader/edepsim_h5.yaml |
New reader config group for EDepSim HDF5 |
tests/conftest.py |
Renames ancestor_id→root_id in make_particle helper |
tests/test_utils.py, test_preproc.py, tests/test_partitioner.py, tests/test_conditions.py |
Update tests to use root_id; add new tests for VoxelizeProcessor and last_stats |
README.md |
Documents new optional attributes, root_id, EDepSim format, draw modes |
Comments suppressed due to low confidence (1)
pysupera/io.py:143
- The comment at line 141-142 says "process_type is stored as raw int (0-based, as passed to SetSemanticType before the +1 inside that function)" but the
+1has been removed fromSetSemanticType. This stale comment is now incorrect and misleading: it should be updated to reflect the new convention (whether the stored value is 0-based or 1-based). Moreover, if native HDF5 files still store 0-based values andSetSemanticTypenow expects 1-based, reading those files will fail.
# process_type is stored as raw int (0-based, as passed to
# SetSemanticType before the +1 inside that function).
p_proc_type[j] = int(p._process_type)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| import time | ||
| t0=time.time() | ||
| parts = read() | ||
| print(time.time()-t0,Particle.time) No newline at end of file |
There was a problem hiding this comment.
The example_edepsim_h5.py file contains top-level executable code at module scope (lines 248-251) that runs read() on a hardcoded filename out_0100.h5 and prints timing. This will execute immediately upon import, which would fail with a FileNotFoundError (or NameError since h5 is not imported — only h5py is), crashing any process that imports from the pysupera.readers package. This file appears to be a development notebook/scratch script that was accidentally included in the package.
Additionally, the read function on line 215 uses h5.File(...) but h5 is never imported — only h5py is available. The np module used in search_parents (line 39) is also not imported at the top of the file (it is imported much later on line 207, after the functions are defined, but search_parents is called by get_parent_pdg and get_interaction_type which are defined before the import).
This file should either be removed, moved to an examples/notebooks directory outside the package, or at minimum guarded with if __name__ == "__main__": for the executable lines.
| """ | ||
|
|
||
| process_type = InteractionType(process_type+1) | ||
| process_type = InteractionType(process_type) |
There was a problem hiding this comment.
The change from InteractionType(process_type + 1) to InteractionType(process_type) in SetSemanticType breaks reading from native pysupera HDF5 files.
The native HDF5 format stores 0-based process_type integers (the comment at io.py line 141 confirms this: "process_type is stored as raw int (0-based, as passed to SetSemanticType before the +1 inside that function)"). The test conftest also confirms this: PT_TRACK = 0, PT_NEUTRON = 1, etc. (0-based mapping).
Since InteractionType uses auto() (1-based, starting at 1), calling InteractionType(0) — which occurs when reading any native particle with process_type=0 (kTrack) — will raise a ValueError.
If the intent is that _get_interaction_type in the EDepSim reader now returns 1-based .value integers (matching InteractionType's auto-assigned values), then the SetSemanticType function should keep process_type + 1 for the native format path, or the native format should be updated to store 1-based values, or the entire codebase needs to agree on a consistent convention.
The docstrings in both InteractionType (lines 9 and 12) and SetSemanticType (line 280) still say InteractionType(process_type + 1) but the code now does InteractionType(process_type) — this inconsistency confirms the change may be unintentional or incomplete.
| process_type = InteractionType(process_type) | |
| process_type = InteractionType(process_type + 1) |
| ``InteractionType`` is encountered. | ||
| """ | ||
|
|
||
| process_type = InteractionType(process_type+1) | ||
| process_type = InteractionType(process_type) | ||
| if process_type == InteractionType.kInvalidProcess: | ||
| return SemanticType.kUnknown | ||
| raise Exception("'kInvalidProcess' particle process encountered\n") |
There was a problem hiding this comment.
The docstring for SetSemanticType (at line 280 outside the diff, and the InteractionType class docstring) still states that the process_type parameter is "0-based" and is converted via InteractionType(process_type + 1), but the implementation on line 309 has been changed to InteractionType(process_type). These docstrings need to be updated to reflect the new convention to avoid confusion.
|
|
||
| _SEM_COLORS: dict[str, str] = { | ||
| "kShower": "#e74c3c", | ||
| "kTrack": "#3498db", | ||
| "kDelta": "#2ecc71", | ||
| "kMichel": "#f39c12", | ||
| "kLEScatter": "#9b59b6", | ||
| "kUnknown": "#95a5a6", | ||
| } | ||
|
|
There was a problem hiding this comment.
_SEM_COLORS is defined twice in app.py — first at lines 82-89 and then identically at lines 126-133. The second definition silently overwrites the first. One of these duplicate definitions should be removed.
| _SEM_COLORS: dict[str, str] = { | |
| "kShower": "#e74c3c", | |
| "kTrack": "#3498db", | |
| "kDelta": "#2ecc71", | |
| "kMichel": "#f39c12", | |
| "kLEScatter": "#9b59b6", | |
| "kUnknown": "#95a5a6", | |
| } | |
| # (_SEM_COLORS is defined earlier in this module.) |
… the unit tests, a remanent of C++ version of Supera. Fixed. Also process_type use was mixed with bare integer values, which is updated.
This addresses #1