Interface edep-sim h5 output by drinkingkazu · Pull Request #7 · DeepLearnPhysics/pysupera

drinkingkazu · 2026-03-10T16:08:12Z

This addresses #1

…. Voxelization of edep-sim output is added in the form of pre-processor.

…o handle very busy event topologies (which we plan to use for FM data generation). Algorithms (LEScatter merging ones) optimized for speed for similarly complex dataset (180s=>2s process time per event)

Copilot

Pull request overview

This PR implements interface support for reading EDepSim HDF5 simulation output files directly into pysupera, as requested in Issue #1. It adds an EDepSimHDF5Reader class and supporting infrastructure, renames ancestor_id to root_id throughout the codebase, introduces a VoxelizeProcessor preprocessing stage, and adds a Pipeline class for end-to-end event processing.

Changes:

New pysupera/readers/ package with EventReaderBase, EDepSimHDF5Reader, and Geant4 process-type classification logic for EDepSim HDF5 files
Rename of ancestor_id → root_id everywhere (data model, IO, partitioner, conditions, tests)
New VoxelizeProcessor preprocessor, Pipeline class in config.py, and performance improvements (bbox cache, parallel CC, KDTree-based LE scatter candidate building)

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`pysupera/readers/format_edepsim_h5.py`	New EDepSim HDF5 reader with G4 process classification
`pysupera/readers/example_edepsim_h5.py`	Draft/prototype script with critical executable module-level code
`pysupera/readers/base.py`	New abstract base class for all file-format readers
`pysupera/readers/__init__.py`	Package init exposing `EDepSimHDF5Reader`
`pysupera/data.py`	Renames `ancestor_id`→`root_id`; adds optional attributes and factory methods
`pysupera/utils.py`	Removes `+1` from `InteractionType` conversion, breaking native file reads
`pysupera/io.py`	Renames `ancestor_id`→`root_id` in HDF5 schema
`pysupera/preproc.py`	Adds `VoxelizeProcessor`; parallel CC; `last_stats`
`pysupera/config.py`	Adds `Pipeline`, `build_pipeline`, `build_voxelizer`, `build_reader`
`pysupera/app.py`	Adds EDepSim format selector, draw-mode toggle, cache; duplicates `_SEM_COLORS`
`pysupera/partitioner.py`	Bbox cache, per-condition timing, `batch_check_multi_cloud_proximity`
`pysupera/partitioner2.py`	Renames `ancestor_id`→`root_id` references
`pysupera/proxck.py`	Adds `batch_check_multi_cloud_proximity` to multiple backends
`pysupera/conditions/touching_em_shower.py`	Renames `ancestor_id`→`root_id`
`pysupera/conditions/combine_le_scatters.py`	KDTree-based candidate building
`pysupera/conditions/absorb_le_scatter.py`	KDTree-based candidate building
`pysupera/diagnostics.py`	Renames `ancestor_id`→`root_id` in strings/comments
`pysupera/_run.py`	Adds voxelizer support, switches to `partition_combined`
`pysupera/conf/config.yaml`	Adds voxelize config block and EDepSim reader default
`pysupera/conf/reader/edepsim_h5.yaml`	New reader config group for EDepSim HDF5
`tests/conftest.py`	Renames `ancestor_id`→`root_id` in `make_particle` helper
`tests/test_utils.py`, `test_preproc.py`, `tests/test_partitioner.py`, `tests/test_conditions.py`	Update tests to use `root_id`; add new tests for `VoxelizeProcessor` and `last_stats`
`README.md`	Documents new optional attributes, `root_id`, EDepSim format, draw modes

Comments suppressed due to low confidence (1)

pysupera/io.py:143

The comment at line 141-142 says "process_type is stored as raw int (0-based, as passed to SetSemanticType before the +1 inside that function)" but the +1 has been removed from SetSemanticType. This stale comment is now incorrect and misleading: it should be updated to reflect the new convention (whether the stored value is 0-based or 1-based). Moreover, if native HDF5 files still store 0-based values and SetSemanticType now expects 1-based, reading those files will fail.

            # process_type is stored as raw int (0-based, as passed to
            # SetSemanticType before the +1 inside that function).
            p_proc_type[j]   = int(p._process_type)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-10T16:15:14Z

pysupera/readers/example_edepsim_h5.py

+import time
+t0=time.time()
+parts = read()
+print(time.time()-t0,Particle.time)


The example_edepsim_h5.py file contains top-level executable code at module scope (lines 248-251) that runs read() on a hardcoded filename out_0100.h5 and prints timing. This will execute immediately upon import, which would fail with a FileNotFoundError (or NameError since h5 is not imported — only h5py is), crashing any process that imports from the pysupera.readers package. This file appears to be a development notebook/scratch script that was accidentally included in the package.

Additionally, the read function on line 215 uses h5.File(...) but h5 is never imported — only h5py is available. The np module used in search_parents (line 39) is also not imported at the top of the file (it is imported much later on line 207, after the functions are defined, but search_parents is called by get_parent_pdg and get_interaction_type which are defined before the import).

This file should either be removed, moved to an examples/notebooks directory outside the package, or at minimum guarded with if __name__ == "__main__": for the executable lines.

Copilot · 2026-03-10T16:15:14Z

pysupera/utils.py

    """

-    process_type = InteractionType(process_type+1)
+    process_type = InteractionType(process_type)


The change from InteractionType(process_type + 1) to InteractionType(process_type) in SetSemanticType breaks reading from native pysupera HDF5 files.

The native HDF5 format stores 0-based process_type integers (the comment at io.py line 141 confirms this: "process_type is stored as raw int (0-based, as passed to SetSemanticType before the +1 inside that function)"). The test conftest also confirms this: PT_TRACK = 0, PT_NEUTRON = 1, etc. (0-based mapping).

Since InteractionType uses auto() (1-based, starting at 1), calling InteractionType(0) — which occurs when reading any native particle with process_type=0 (kTrack) — will raise a ValueError.

If the intent is that _get_interaction_type in the EDepSim reader now returns 1-based .value integers (matching InteractionType's auto-assigned values), then the SetSemanticType function should keep process_type + 1 for the native format path, or the native format should be updated to store 1-based values, or the entire codebase needs to agree on a consistent convention.

The docstrings in both InteractionType (lines 9 and 12) and SetSemanticType (line 280) still say InteractionType(process_type + 1) but the code now does InteractionType(process_type) — this inconsistency confirms the change may be unintentional or incomplete.

Suggested change

process_type = InteractionType(process_type)

process_type = InteractionType(process_type + 1)

Copilot · 2026-03-10T16:15:14Z

pysupera/utils.py

        ``InteractionType`` is encountered.
    """

-    process_type = InteractionType(process_type+1)
+    process_type = InteractionType(process_type)
    if process_type == InteractionType.kInvalidProcess:
        return SemanticType.kUnknown
        raise Exception("'kInvalidProcess' particle process encountered\n")


The docstring for SetSemanticType (at line 280 outside the diff, and the InteractionType class docstring) still states that the process_type parameter is "0-based" and is converted via InteractionType(process_type + 1), but the implementation on line 309 has been changed to InteractionType(process_type). These docstrings need to be updated to reflect the new convention to avoid confusion.

Copilot · 2026-03-10T16:15:15Z

pysupera/app.py

+
+_SEM_COLORS: dict[str, str] = {
+    "kShower":    "#e74c3c",
+    "kTrack":     "#3498db",
+    "kDelta":     "#2ecc71",
+    "kMichel":    "#f39c12",
+    "kLEScatter": "#9b59b6",
+    "kUnknown":   "#95a5a6",
+}
+


_SEM_COLORS is defined twice in app.py — first at lines 82-89 and then identically at lines 126-133. The second definition silently overwrites the first. One of these duplicate definitions should be removed.

Suggested change

_SEM_COLORS: dict[str, str] = {

"kShower": "#e74c3c",

"kTrack": "#3498db",

"kDelta": "#2ecc71",

"kMichel": "#f39c12",

"kLEScatter": "#9b59b6",

"kUnknown": "#95a5a6",

}

# (_SEM_COLORS is defined earlier in this module.)

… the unit tests, a remanent of C++ version of Supera. Fixed. Also process_type use was mixed with bare integer values, which is updated.

drinkingkazu added 3 commits March 7, 2026 10:17

edep-sim interface added. Auxiliary variables for Particle also added…

bd27f18

…. Voxelization of edep-sim output is added in the form of pre-processor.

ready to interface edep-sim H5 files. visualization options tweaked t…

f2b55b4

…o handle very busy event topologies (which we plan to use for FM data generation). Algorithms (LEScatter merging ones) optimized for speed for similarly complex dataset (180s=>2s process time per event)

minor README update

afe1c57

drinkingkazu requested a review from Copilot March 10, 2026 16:08

Copilot started reviewing on behalf of drinkingkazu March 10, 2026 16:09 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

a bug fix: process_type enum assumed to start from 0 in some parts of…

7234f95

… the unit tests, a remanent of C++ version of Supera. Fixed. Also process_type use was mixed with bare integer values, which is updated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Interface edep-sim h5 output#7

Interface edep-sim h5 output#7
drinkingkazu wants to merge 4 commits intoDeepLearnPhysics:developfrom
drinkingkazu:develop

drinkingkazu commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Copilot AI Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	process_type = InteractionType(process_type)
	process_type = InteractionType(process_type + 1)

Conversation

drinkingkazu commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants