Skip to content

Interface edep-sim h5 output#7

Open
drinkingkazu wants to merge 4 commits intoDeepLearnPhysics:developfrom
drinkingkazu:develop
Open

Interface edep-sim h5 output#7
drinkingkazu wants to merge 4 commits intoDeepLearnPhysics:developfrom
drinkingkazu:develop

Conversation

@drinkingkazu
Copy link
Copy Markdown
Contributor

This addresses #1

…. Voxelization of edep-sim output is added in the form of pre-processor.
…o handle very busy event topologies (which we plan to use for FM data generation). Algorithms (LEScatter merging ones) optimized for speed for similarly complex dataset (180s=>2s process time per event)
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements interface support for reading EDepSim HDF5 simulation output files directly into pysupera, as requested in Issue #1. It adds an EDepSimHDF5Reader class and supporting infrastructure, renames ancestor_id to root_id throughout the codebase, introduces a VoxelizeProcessor preprocessing stage, and adds a Pipeline class for end-to-end event processing.

Changes:

  • New pysupera/readers/ package with EventReaderBase, EDepSimHDF5Reader, and Geant4 process-type classification logic for EDepSim HDF5 files
  • Rename of ancestor_idroot_id everywhere (data model, IO, partitioner, conditions, tests)
  • New VoxelizeProcessor preprocessor, Pipeline class in config.py, and performance improvements (bbox cache, parallel CC, KDTree-based LE scatter candidate building)

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pysupera/readers/format_edepsim_h5.py New EDepSim HDF5 reader with G4 process classification
pysupera/readers/example_edepsim_h5.py Draft/prototype script with critical executable module-level code
pysupera/readers/base.py New abstract base class for all file-format readers
pysupera/readers/__init__.py Package init exposing EDepSimHDF5Reader
pysupera/data.py Renames ancestor_idroot_id; adds optional attributes and factory methods
pysupera/utils.py Removes +1 from InteractionType conversion, breaking native file reads
pysupera/io.py Renames ancestor_idroot_id in HDF5 schema
pysupera/preproc.py Adds VoxelizeProcessor; parallel CC; last_stats
pysupera/config.py Adds Pipeline, build_pipeline, build_voxelizer, build_reader
pysupera/app.py Adds EDepSim format selector, draw-mode toggle, cache; duplicates _SEM_COLORS
pysupera/partitioner.py Bbox cache, per-condition timing, batch_check_multi_cloud_proximity
pysupera/partitioner2.py Renames ancestor_idroot_id references
pysupera/proxck.py Adds batch_check_multi_cloud_proximity to multiple backends
pysupera/conditions/touching_em_shower.py Renames ancestor_idroot_id
pysupera/conditions/combine_le_scatters.py KDTree-based candidate building
pysupera/conditions/absorb_le_scatter.py KDTree-based candidate building
pysupera/diagnostics.py Renames ancestor_idroot_id in strings/comments
pysupera/_run.py Adds voxelizer support, switches to partition_combined
pysupera/conf/config.yaml Adds voxelize config block and EDepSim reader default
pysupera/conf/reader/edepsim_h5.yaml New reader config group for EDepSim HDF5
tests/conftest.py Renames ancestor_idroot_id in make_particle helper
tests/test_utils.py, test_preproc.py, tests/test_partitioner.py, tests/test_conditions.py Update tests to use root_id; add new tests for VoxelizeProcessor and last_stats
README.md Documents new optional attributes, root_id, EDepSim format, draw modes
Comments suppressed due to low confidence (1)

pysupera/io.py:143

  • The comment at line 141-142 says "process_type is stored as raw int (0-based, as passed to SetSemanticType before the +1 inside that function)" but the +1 has been removed from SetSemanticType. This stale comment is now incorrect and misleading: it should be updated to reflect the new convention (whether the stored value is 0-based or 1-based). Moreover, if native HDF5 files still store 0-based values and SetSemanticType now expects 1-based, reading those files will fail.
            # process_type is stored as raw int (0-based, as passed to
            # SetSemanticType before the +1 inside that function).
            p_proc_type[j]   = int(p._process_type)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +248 to +251
import time
t0=time.time()
parts = read()
print(time.time()-t0,Particle.time) No newline at end of file
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example_edepsim_h5.py file contains top-level executable code at module scope (lines 248-251) that runs read() on a hardcoded filename out_0100.h5 and prints timing. This will execute immediately upon import, which would fail with a FileNotFoundError (or NameError since h5 is not imported — only h5py is), crashing any process that imports from the pysupera.readers package. This file appears to be a development notebook/scratch script that was accidentally included in the package.

Additionally, the read function on line 215 uses h5.File(...) but h5 is never imported — only h5py is available. The np module used in search_parents (line 39) is also not imported at the top of the file (it is imported much later on line 207, after the functions are defined, but search_parents is called by get_parent_pdg and get_interaction_type which are defined before the import).

This file should either be removed, moved to an examples/notebooks directory outside the package, or at minimum guarded with if __name__ == "__main__": for the executable lines.

Copilot uses AI. Check for mistakes.
"""

process_type = InteractionType(process_type+1)
process_type = InteractionType(process_type)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change from InteractionType(process_type + 1) to InteractionType(process_type) in SetSemanticType breaks reading from native pysupera HDF5 files.

The native HDF5 format stores 0-based process_type integers (the comment at io.py line 141 confirms this: "process_type is stored as raw int (0-based, as passed to SetSemanticType before the +1 inside that function)"). The test conftest also confirms this: PT_TRACK = 0, PT_NEUTRON = 1, etc. (0-based mapping).

Since InteractionType uses auto() (1-based, starting at 1), calling InteractionType(0) — which occurs when reading any native particle with process_type=0 (kTrack) — will raise a ValueError.

If the intent is that _get_interaction_type in the EDepSim reader now returns 1-based .value integers (matching InteractionType's auto-assigned values), then the SetSemanticType function should keep process_type + 1 for the native format path, or the native format should be updated to store 1-based values, or the entire codebase needs to agree on a consistent convention.

The docstrings in both InteractionType (lines 9 and 12) and SetSemanticType (line 280) still say InteractionType(process_type + 1) but the code now does InteractionType(process_type) — this inconsistency confirms the change may be unintentional or incomplete.

Suggested change
process_type = InteractionType(process_type)
process_type = InteractionType(process_type + 1)

Copilot uses AI. Check for mistakes.
Comment on lines 306 to 312
``InteractionType`` is encountered.
"""

process_type = InteractionType(process_type+1)
process_type = InteractionType(process_type)
if process_type == InteractionType.kInvalidProcess:
return SemanticType.kUnknown
raise Exception("'kInvalidProcess' particle process encountered\n")
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring for SetSemanticType (at line 280 outside the diff, and the InteractionType class docstring) still states that the process_type parameter is "0-based" and is converted via InteractionType(process_type + 1), but the implementation on line 309 has been changed to InteractionType(process_type). These docstrings need to be updated to reflect the new convention to avoid confusion.

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +134

_SEM_COLORS: dict[str, str] = {
"kShower": "#e74c3c",
"kTrack": "#3498db",
"kDelta": "#2ecc71",
"kMichel": "#f39c12",
"kLEScatter": "#9b59b6",
"kUnknown": "#95a5a6",
}

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_SEM_COLORS is defined twice in app.py — first at lines 82-89 and then identically at lines 126-133. The second definition silently overwrites the first. One of these duplicate definitions should be removed.

Suggested change
_SEM_COLORS: dict[str, str] = {
"kShower": "#e74c3c",
"kTrack": "#3498db",
"kDelta": "#2ecc71",
"kMichel": "#f39c12",
"kLEScatter": "#9b59b6",
"kUnknown": "#95a5a6",
}
# (_SEM_COLORS is defined earlier in this module.)

Copilot uses AI. Check for mistakes.
… the unit tests, a remanent of C++ version of Supera. Fixed. Also process_type use was mixed with bare integer values, which is updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants