75 changes: 61 additions & 14 deletions README.md
@@ -149,13 +149,23 @@ Then open the URL in a browser. The sidebar lets you:
| Section | Controls |
|---|---|
| **INPUT** | HDF5 file path and event index |
| **INPUT FORMAT** | File format selector (`native` HDF5 or `edepsim_h5`); EDepSim-specific step/particle dataset keys and electron threshold |
| **PREPROCESSING** | Enable merge-duplicates and/or defragmentation; backend; semantic-type filter |
| **PARTITIONER** | Distance threshold, checker backend, `n_jobs` |
| **CONDITIONS** | Toggle each of the four conditions independently |
| **VIEW OPTIONS** | Show/hide legends; synchronise the two 3-D camera views |
| **PARTICLE FILTER** | Instantly show/hide particles by semantic type (no re-run needed) |
| **VIEW OPTIONS** | Show/hide legends; synchronise the two 3-D camera views; toggle **colour by sem type** (fast merged-trace rendering vs. per-particle instance colouring) |
| **PARTICLE FILTER** | Instantly show/hide particles by semantic type (no re-run needed); **Min points to display** hides small point clouds from the view without re-running the pipeline |

Hit **▶ Run** to execute the full pipeline and render two side-by-side 3-D point-cloud plots—original particles on the left, partitions on the right. The **PARTICLE FILTER** and **show legend** checkbox take effect immediately without re-running the pipeline.
Hit **▶ Run** to execute the full pipeline and render two side-by-side 3-D point-cloud plots—original particles on the left, partitions on the right.

**Draw modes** (toggled via *colour by sem type* in VIEW OPTIONS):

| Mode | Left plot | Right plot | Speed |
|---|---|---|---|
| **by instance** (default) | One trace per particle, unique colour per ID; hover shows particle ID, PDG, parent | One trace per partition, unique colour per partition index | Slower for large events (many traces) |
| **by sem type** | One merged trace per semantic type with fixed colours | One merged trace per semantic type across all partitions | Fast — O(n\_sem\_types) traces regardless of event size |
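
The speed gap comes from trace count: "by sem type" concatenates every particle's points into one array per semantic type before plotting, so the number of traces is bounded by the number of semantic types. A toy sketch of that grouping (illustrative data and names, not the app's actual rendering code):

```python
import numpy as np
from collections import defaultdict

# toy event: (sem_type, point_cloud) pairs -- illustrative stand-ins
particles = [
    ("kTrack",  np.zeros((100, 3))),
    ("kShower", np.zeros((50, 3))),
    ("kTrack",  np.zeros((80, 3))),
]

merged = defaultdict(list)
for sem, pc in particles:
    merged[sem].append(pc)

# one plotting trace per semantic type, regardless of particle count
traces = {sem: np.vstack(pcs) for sem, pcs in merged.items()}
```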

The **PARTICLE FILTER** and **show legend** checkbox take effect immediately without re-running the pipeline.

<p align="center">
<img src="figures/dash_display.png" alt="pysupera-app event display" width="900"/>
@@ -167,19 +177,58 @@ Hit **▶ Run** to execute the full pipeline and render two side-by-side 3-D poi

### Input: the `Particle` object

Each particle carries:
#### Required attributes

These attributes must always be supplied at construction time and are guaranteed to be set on every `Particle` instance.

| Attribute | Type | Description |
|---|---|---|
| `id` | `int` | Unique particle ID within an event |
| `parent_id` | `int` | Direct parent particle ID |
| `ancestor_id` | `int` | Root ancestor of the shower/track genealogy |
| `pdg` | `int` | PDG Monte Carlo code |
| `parent_pdg` | `int` | PDG code of the parent |
| `sem_type` | `SemanticType` | Derived semantic category (see below) |
| `id` | `int` | Unique particle ID within an event (Geant4 track ID) |
| `parent_id` | `int` | Direct parent particle ID; equals `id` for primary particles |
| `root_id` | `int` | ID of the primary ancestor at the root of the shower/track genealogy |
| `pdg` | `int` | PDG Monte Carlo particle code |
| `parent_pdg` | `int` | PDG code of the direct parent particle |
| `process_type` | `InteractionType` | Physics process that created this particle; derived from the raw int stored in `_process_type` |
| `sem_type` | `SemanticType` | High-level semantic category derived automatically at construction (see below) |
| `point_cloud` | `ndarray (N, ≥3)` | 3-D hit positions; columns 0–2 are x, y, z |
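
The required fields form a minimal record per particle. A stand-in mirroring them (the real `Particle` constructor signature may differ, and `process_type`/`sem_type` are omitted here since they are derived rather than passed as plain ints):

```python
import numpy as np
from dataclasses import dataclass

# Illustrative stand-in for the documented required attributes;
# not the actual pysupera.Particle class.
@dataclass
class ParticleSketch:
    id: int
    parent_id: int           # equals id for primary particles
    root_id: int
    pdg: int
    parent_pdg: int
    point_cloud: np.ndarray  # (N, >=3); columns 0-2 are x, y, z

mu = ParticleSketch(id=1, parent_id=1, root_id=1,
                    pdg=13, parent_pdg=13,
                    point_cloud=np.zeros((200, 3)))
```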

Semantic types (`SemanticType` enum):
#### Optional attributes

These attributes are not required at construction time. Unset float32 scalars hold `FLOAT_UNSET` (= `np.float32('nan')`); all other unset attributes hold `None`.

| Attribute | Type | Default | Description |
|---|---|---|---|
| `start` | `ndarray (3,) float32` | `None` | Trajectory start position (vertex). Also accessible as `p.vertex`. |
| `end` | `ndarray (3,) float32` | `None` | Trajectory end position. |
| `momentum_start` | `ndarray (3,) float32` | `None` | 3-momentum at the start vertex (MeV/c). |
| `momentum_end` | `ndarray (3,) float32` | `None` | 3-momentum at the trajectory end (MeV/c). |
| `kinetic_energy_start` | `float32` | `NaN` | Kinetic energy at the start vertex (MeV). |
| `kinetic_energy_end` | `float32` | `NaN` | Kinetic energy at the end of the trajectory (MeV). |
| `mass` | `float32` | `NaN` | Particle rest mass (MeV/c²). |
| `root_pdg` | `int` | `None` | PDG code of the primary (root) ancestor particle. |
| `start_process_id` | `int` | `None` | Geant4/simulation process ID for this particle's creation. |
| `start_subprocess_id` | `int` | `None` | Geant4/simulation sub-process ID for this particle's creation. |
| `start_process_name` | `str` | `None` | Human-readable name of the creation process (e.g. `"eIoni"`). |
| `end_process_id` | `int` | `None` | Geant4/simulation process ID for this particle's termination. |
| `end_subprocess_id` | `int` | `None` | Geant4/simulation sub-process ID for this particle's termination. |
| `end_process_name` | `str` | `None` | Human-readable name of the termination process. |

**Checking whether an optional attribute is set:**

```python
import numpy as np

if p.start is not None:                      # array / int / str / list check
    print(p.start)

if not np.isnan(p.kinetic_energy_start):     # float32 scalar check
    print(p.kinetic_energy_start)
```

**`vertex` property** — `p.vertex` is a read/write alias for `p.start`.
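
Because unset float32 scalars hold NaN while every other unset attribute holds `None`, a single helper can cover both cases (a convenience function of our own, not part of the library):

```python
import numpy as np

def is_set(value):
    """True if an optional Particle attribute holds real data."""
    if value is None:
        return False
    if isinstance(value, (float, np.floating)) and np.isnan(value):
        return False
    return True
```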

#### Semantic types

`sem_type` is derived automatically from `process_type`, `pdg`, `parent_pdg`, and `point_cloud` at construction time via `SetSemanticType`. Particles with fewer than `min_pc_size` points may be reclassified as `kLEScatter`.

| Value | Meaning |
|---|---|
@@ -190,8 +239,6 @@
| `kLEScatter` | Low-energy scatter product |
| `kUnknown` | Unclassified |
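
The size-based reclassification can be pictured as a final override step (a sketch only: the real `SetSemanticType` also weighs `process_type`, `pdg`, and `parent_pdg`, and the `min_pc_size` default shown here is assumed):

```python
def apply_size_override(provisional, n_points, min_pc_size=5):
    # tiny point clouds are demoted to low-energy scatter
    if n_points < min_pc_size:
        return "kLEScatter"
    return provisional
```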

`sem_type` is derived automatically from `process_type`, `pdg`, `parent_pdg`, and `point_cloud` at construction time via `SetSemanticType`. Particles with fewer than `min_pc_size` points may be reclassified as `kLEScatter`.

---

### Step 0 — Configuration
@@ -472,7 +519,7 @@ Only the touching subset proceeds to `post_filter`, and only the final `merge_pa
**Candidate filter:**
- Both particles have PDG code 11 or 22.
- Direct parent–child relationship.
- Same `ancestor_id`.
- Same `root_id`.

**Merge direction:** child → parent (directed by the parent–child tree).

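As a predicate, the three candidate-filter rules above might look like this (a sketch over objects carrying the documented attributes, not the library's actual code):

```python
def em_merge_candidate(a, b):
    # both electron (11) or photon (22), direct parent-child, same root
    is_em = lambda p: abs(p.pdg) in (11, 22)
    parent_child = a.parent_id == b.id or b.parent_id == a.id
    return is_em(a) and is_em(b) and parent_child and a.root_id == b.root_id
```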
5 changes: 4 additions & 1 deletion pysupera/__init__.py
@@ -1,5 +1,8 @@
__version__ = "0.0.1"

from .data import Particle, FLOAT_UNSET
from .io import write_events, open_writer, read_events, EventStore, EventWriter
from .config import build_checker, build_conditions, build_preprocessor, build_merge_processor, load_cfg, configure, check_particle_list
from .config import (build_checker, build_conditions, build_preprocessor,
                     build_merge_processor, build_voxelizer, build_pipeline,
                     load_cfg, configure, check_particle_list, Pipeline)
from .utils import trace_ancestry
31 changes: 19 additions & 12 deletions pysupera/_run.py
@@ -32,16 +32,20 @@ def main(cfg: DictConfig) -> None:
    from collections import defaultdict
    from pysupera import read_events, open_writer
    from pysupera.partitioner import ParticlePartitioner
    from pysupera.config import build_conditions, build_preprocessor, build_merge_processor, configure
    from pysupera.config import build_conditions, build_preprocessor, build_merge_processor, build_voxelizer, configure

    configure(cfg)  # set module-level defaults (e.g. min_pc_size) before any Particle is created

    merge_processor = build_merge_processor(cfg)  # None when merge_duplicates: false
    voxelizer = build_voxelizer(cfg)              # None when voxelize.enabled: false
    preprocessor = build_preprocessor(cfg)        # None when defragment: false

    _vox_info = (f"enabled (voxel_size={cfg.particle.voxelize.voxel_size})"
                 if voxelizer is not None else "disabled")
    print(f"[run] checker         : {cfg.checker.name}")
    print(f"[run] distance        : {cfg.distance_threshold}")
    print(f"[run] merge_duplicates: {'enabled' if merge_processor is not None else 'disabled'}")
    print(f"[run] voxelize        : {_vox_info}")
    print(f"[run] preprocessor    : {cfg.particle.get('preprocessor', {}).get('name', 'scipy') if cfg.particle.get('defragment', False) else 'disabled'}")
    print(f"[run] input           : {cfg.io.input_path}")
    print(f"[run] output          : {cfg.io.output_path}")
@@ -56,7 +60,7 @@ def main(cfg: DictConfig) -> None:
        'pc_size_min': [],                 # int : smallest non-zero PC in event
        'pc_size_max': [],                 # int : largest non-zero PC in event
        'pc_size_mean': [],                # float: mean non-zero PC size in event
        'n_partitions': defaultdict(list), # condition.name → [int per event]
        'n_partitions': [],                # int: final partition count after partition_combined
    }
    n_events = 0

@@ -71,11 +75,17 @@
        t0 = time.perf_counter()
        n_events += 1

        if merge_processor is not None:
        # skip merge_duplicates when voxelizer subsumes it
        if merge_processor is not None and not (voxelizer and voxelizer.merge_duplicates):
            _t = time.perf_counter()
            particles = merge_processor.process(particles)
            profile['merge_duplicates'] += time.perf_counter() - _t

        if voxelizer is not None:
            _t = time.perf_counter()
            particles = voxelizer.process(particles)
            profile['voxelize'] += time.perf_counter() - _t

        if preprocessor is not None:
            _t = time.perf_counter()
            particles = preprocessor.process(particles)
@@ -106,11 +116,10 @@
        )
        profile['partitioner_init'] += time.perf_counter() - _t

        for condition in conditions:
            _t = time.perf_counter()
            partitions = partitioner.partition(condition, verbose=cfg.verbose)
            profile[condition.name] += time.perf_counter() - _t
            stats['n_partitions'][condition.name].append(len(partitions))
        _t = time.perf_counter()
        partitions = partitioner.partition_combined(conditions, verbose=cfg.verbose)
        profile['partition_combined'] += time.perf_counter() - _t
        stats['n_partitions'].append(len(partitions))

        partitioner.checker.cleanup()

@@ -151,10 +160,8 @@ def _agg(vals):
print(f" {'PC size mean':<{_W}} {lo:>10.2f} {mu:>10.2f} {hi:>10.2f}")

print(f" {_SEP}")
for cname, counts in stats['n_partitions'].items():
lo, mu, hi = _agg(counts)
label = f"Partitions [{cname}]"
print(f" {label:<{_W}} {int(lo):>10,} {mu:>10.1f} {int(hi):>10,}")
lo, mu, hi = _agg(stats['n_partitions'])
print(f" {'Partitions (final)':<{_W}} {int(lo):>10,} {mu:>10.1f} {int(hi):>10,}")

    # ── Time profile ──────────────────────────────────────────────────
    total = sum(profile.values())