High-performance JAX/Flax neural networks for particle identification in ALICE Run 3
Includes six complementary architectures: SimpleNN, DNN, FSE + Attention (Phase 0), FSE + Attention (Detector-Aware) (Phase 1), Random Forest, and XGBoost
JAX | Production-ready with XLA compilation | Token-based Bayesian Handling | DPG Track Selections
JAX-PID-NN is a comprehensive machine learning framework for particle identification (PID) in ALICE at the LHC, optimised for challenging momentum regions. The framework addresses the fundamental problem of extreme data sparsity: in the critical 0–1 GeV/c range, 91.8 per cent of Bayesian PID measurements are missing, making traditional rule-based approaches unreliable. Four neural architectures — SimpleNN, DNN, FSE + Attention (Phase 0) and FSE + Attention (Detector-Aware, Phase 1) — are implemented in JAX/Flax, exploiting XLA compilation and JIT for fast training and inference, while two tree-based baselines — Random Forest (scikit-learn) and XGBoost (native XGBoost library) — are trained using their respective optimised libraries. Together, these six complementary models are evaluated on Pb–Pb Monte Carlo reconstructed data to understand which paradigm best handles high-dimensional, partially-missing physics data.
XGBoost substantially outperforms all neural network variants across all momentum ranges, achieving 83–91 per cent accuracy versus 52–70 per cent for neural networks. This is not a minor difference that hyperparameter tuning can bridge—it reflects a fundamental advantage of gradient-boosted ensembles over attention-based neural networks for this physics task.
| Momentum Range | SimpleNN | DNN | FSE Phase 0 | FSE Phase 1 | Random Forest | XGBoost | Improvement vs Best NN |
|---|---|---|---|---|---|---|---|
| Full Spectrum (0–∞ GeV/c) | 66.70% | 65.24% | 67.87% | 68.35% | 77.50% | 91.39% | +22.69% |
| 0–1 GeV/c (TPC Saturation, 84% of data) | 56.78% | 63.42% | 60.45% | 58.82% | 71.29% | 87.20% | +20.42% |
| 1–3 GeV/c (TOF Transition, 14% of data) | 63.49% | 70.46% | 51.54% | 62.16% | 75.88% | 83.22% | +20.47% |
Tree-based ensemble methods naturally exploit the structure of high-energy physics data:
- Feature selection robustness: Trees automatically identify which features matter at each stage of the decision tree. Neural networks struggle with the high dimensionality (22 features) and must learn implicit feature importance through millions of parameters.
- Missing data resilience: Tree splits gracefully handle entire detector groups being unavailable (e.g., TOF missing in 92% of 0–1 GeV/c events). Neural networks require explicit masking or token-based approaches, which add complexity without matching tree performance.
- Non-linear feature interactions: Gradient boosting captures complex feature synergies (e.g., TPC signal + momentum + TOF beta for kaon separation) through sequential tree splits. Attention mechanisms, whilst mathematically elegant, do not align with the physics of detector response.
- Class imbalance (π:K:p:e ≈ 14:1): Tree algorithms' native objective functions handle imbalance better than focal loss + class weighting. XGBoost's weighted splits intrinsically prioritise minority classes without the overfitting risk of neural network reweighting.
Real Bayesian measurements comprise only 8.2 per cent of the full dataset (1.6M of 20M tracks). In the critical 0–1 GeV/c range, this drops to 6.5 per cent, with 91.8 per cent of data filled with synthetic predictions. All models improve substantially on real Bayesian data, confirming that ML is superior to traditional Bayesian on high-quality data, but in production, must handle extreme missing data gracefully.
Recommendation: Use XGBoost for maximum accuracy. Use FSE Phase 1 only for detector commissioning studies where interpretability of per-detector performance is more valuable than accuracy.
This section validates the XGBoost models trained on reconstructed (simulated + detector noise effect added) Pb–Pb data on 7,473,918 raw LHC23 tracks from run 544122. The goal is to test robustness to real detector effects, saturation, and sim-to-real domain gaps.
| Metric | Value |
|---|---|
| Total tracks | 7,473,918 |
| pT range | 0.115–19.996 GeV/c |
| Pseudorapidity | η ∈ [–0.8, +0.8] (100%) |
| DCA selections | 100% pass DPG cuts (DCA_xy < 0.105 cm, DCA_z < 0.12 cm) |
| TPC-only tracks | 4,999,692 (66.9%) |
| TPC+TOF tracks | 2,474,225 (33.1%) |
| Bayesian PID available | 2,474,225 (33.1%) |
| Bayesian PID missing (token) | 4,999,693 (66.9%) |
All tracks satisfy the standard ALICE Run 3 DPG quality selections, ensuring a clean physics sample.
| Particle | Count | Fraction | ⟨pT⟩ (GeV/c) | ⟨η⟩ | Mean Confidence |
|---|---|---|---|---|---|
| Pion | 7,057,284 | 94.4% | 0.690 | –0.001 | 0.9124 |
| Kaon | 265,243 | 3.5% | 1.272 | –0.023 | 0.5938 |
| Proton | 143,514 | 1.9% | 1.451 | 0.018 | 0.5839 |
| Electron | 7,877 | 0.1% | 0.340 | –0.226 | 0.5703 |
The spectrum is pion-dominated, as expected in Pb–Pb, with kaons and protons at the few-per-cent level and electrons extremely rare.
TOF β Ordering (Correct)
| Particle | Mean β | Expected | Status |
|---|---|---|---|
| Pion | 0.9071 | Highest | Correct (π > K) |
| Kaon | 0.8920 | Lower than π, higher than p | Correct (π > K > p) |
| Proton | 0.8649 | Lower than K | Correct |
| Electron | 0.8464 | Lowest | Correct |
The time-of-flight response obeys the expected mass ordering (lighter particles travel faster), confirming that TOF-based discrimination is physically sound in real data.
TPC dE/dx Ordering (Broken in Real Data)
| Particle | Mean dE/dx | Expected Ordering | Observed | Status |
|---|---|---|---|---|
| Pion | 54,727 | π < K < p | π > K | Problematic |
| Kaon | 49,733 | π < K < p | K < π | Inverted |
| Proton | 408,277 | > K | > K | Correct |
| Electron | 1,452,242 | Highest | Highest | Correct |
In real data, the pion dE/dx is observed to be higher than the kaon dE/dx in the low-pT region, violating the expected Bethe–Bloch mass ordering (π < K). This behaviour is consistent with TPC response saturation in the 0–1 GeV/c range, where the separation power between π and K using dE/dx alone becomes significantly reduced.
Although the XGBoost model was trained on fully reconstructed Monte Carlo samples, which include realistic detector effects simulated with GEANT (such as energy loss fluctuations, electronic noise, and detector resolution), the training still reflects the nominal detector response where the π/K ordering is largely preserved. Consequently, when applied to real data—where residual mismodelling, local saturation effects, or calibration imperfections may distort the dE/dx response—the model cannot fully recover the separation without additional information from TOF.
Agreement is evaluated for tracks with both ML prediction and non-zero Bayesian probabilities.
| Particle | ML–Bayesian Agreement | Interpretation |
|---|---|---|
| Pion | 79.1% | Good agreement |
| Kaon | 38.2% | Poor; critical underidentification |
| Proton | 55.2% | Moderate agreement |
| Electron | 70.5% | Reasonable agreement |
Overall agreement is 76.4 per cent, but the kaon channel is problematic, with ML and Bayesian disagreeing in more than 60 per cent of cases. Bayesian PID finds roughly twice as many kaons as the ML model in the overlapping sample.
| Confidence Range | Fraction of Tracks | Interpretation |
|---|---|---|
| > 0.99 | 53.7% | Very high confidence (mostly TPC+TOF) |
| 0.95–0.99 | 8.3% | High confidence |
| 0.90–0.95 | 5.0% | Medium confidence |
| 0.80–0.90 | 8.2% | Medium–low confidence |
| < 0.80 | 24.8% | Low confidence (mostly TPC-only) |
The distribution is bimodal: roughly half the tracks have confidence > 0.99, while nearly a quarter fall below 0.80. This matches expectations: TPC+TOF tracks are typically high-confidence, whereas TPC-only saturated tracks are much more ambiguous.
| pT Bin (GeV/c) | Pion | Kaon | Proton | Electron | Tracks |
|---|---|---|---|---|---|
| 0–0.5 | 98.6% | 1.1% | 0.1% | 0.2% | 3,290,052 |
| 0.5–1.0 | 93.9% | 4.8% | 1.3% | 0.1% | 2,542,127 |
| 1.0–1.5 | 92.0% | 1.8% | 6.3% | 0.0% | 983,468 |
| 1.5–2.0 | 85.3% | 8.5% | 6.2% | 0.0% | 389,341 |
| 2.0–3.0 | 72.2% | 19.6% | 8.2% | 0.0% | 218,173 |
| 3.0–5.0 | 60.9% | 28.1% | 11.1% | 0.0% | 46,060 |
| 5.0–10.0 | 66.0% | 24.5% | 9.4% | 0.1% | 4,374 |
| > 10 | 78.0% | 17.3% | 4.6% | 0.0% | 323 |
This pattern is physically reasonable:
- Low pT (< 1 GeV/c): Pion-dominated, reflecting high multiplicity of soft pions.
- Intermediate pT (1.5–3 GeV/c): Kaon fraction increases substantially, peaking around 2–3 GeV/c (19–28%).
- Higher pT (> 3 GeV/c): Proton fraction rises, kaon and proton fractions remain at the 10–30 per cent level.
The kaon peak in the TOF region confirms that TOF-assisted discrimination is working and that the XGBoost model is capturing the correct physics at higher transverse momentum.
Strengths
- TOF β ordering is correct for all species.
- High-pT kaon and proton fractions match expectations from hadron production.
- Confidence scores are well-calibrated and strongly correlated with detector availability.
- 100 per cent of real tracks receive an ML prediction (7.47M tracks).
Critical Issues
- TPC dE/dx ordering is inverted for π/K in real data due to saturation, breaking the assumptions used in training.
- ML underestimates kaon yield (3.5 per cent vs ~8 per cent from Bayesian), particularly in low-pT, TPC-only regime.
- ML–Bayesian agreement for kaons is only 38 per cent.
Recommendations for Production Use
- Apply a confidence > 0.90 threshold for physics analyses (covers ≈ 67 per cent of tracks with high purity).
- Use ML kaon predictions only above pT > 1.5 GeV/c, where TOF is available and TPC saturation is less severe.
- For 0–1 GeV/c, rely on Bayesian PID and dedicated low-pT analyses; treat ML kaon predictions as indicative rather than final.
- Cross-check all rare-species yields (kaons, protons, electrons) against Bayesian results and known spectra.
Recommendations for Future Model Development
- Refine the MC–data domain matching in the low‑pT, high‑occupancy TPC regime, so reconstructed Monte Carlo reproduces the π/K dE/dx distortion observed in real data rather than only the ideal ordering.
- Systematically tune existing class‑ and sample‑weighting schemes (already in use) to stabilise kaon performance across momentum, including per‑pT weighting and alternative imbalance strategies.
- Explore mixture‑of‑experts architectures that explicitly separate TPC‑only and TOF‑assisted regimes, allowing each expert to specialise in its detector configuration.
- Implement continuous monitoring of data–driven agreement metrics (e.g. ML vs Bayesian, or ML vs reference spectra) in deployment to detect detector or calibration drifts over time.
All results consistently demonstrate tree-based models are fundamentally better suited for this physics dataset than even carefully-tuned neural networks with advanced architectures like attention mechanisms.
Why XGBoost Achieves 83–91%:
| Factor | Tree-Based Advantage | NN Limitation |
|---|---|---|
| Feature selection | Automatic via split criteria; learns ranking | Implicit through weights; struggles with 22 features |
| Missing detector groups | Natural handling; splits work around gaps | Requires explicit masking; adds complexity |
| Non-linear interactions | Sequential splits capture physics structure | Attention learns correlations but misses causal structure |
| Class imbalance | Native objective prioritises rare events | Focal loss + weighting causes overfitting |
| Interpretability | Per-feature importance rankings | Black-box; attention patterns lack physics meaning |
Despite significant architectural innovation:
- SimpleNN (52–67%): Baseline feedforward architecture; limited capacity for feature interactions.
- DNN (65–70%): Batch normalisation stabilises training but does not overcome fundamental NN limitations; best JAX model overall.
- FSE Phase 0 (51–70%, high AUC): Detector-aware masking + multi-head attention; achieves high AUC (0.88–0.94) but lower raw accuracy due to attention complexity and overfitting to masked patterns.
- FSE Phase 1 (58–68%): Enhanced detector-level gating; selective improvement (+10.6% in 1–3 GeV/c) but remains substantially below XGBoost.
Critical insight: Attention mechanisms excel at providing high ROC curves (good threshold tuning) but sacrifice raw accuracy on imbalanced data—a poor trade-off for production PID.
Detector-aware masking improves performance in sparse-TOF scenarios (+10.6% in 1–3 GeV/c) but degrades in TPC-saturated regions (–0.9% in 0–1 GeV/c):
| Range | FSE Phase 0 | FSE Phase 1 | Delta |
|---|---|---|---|
| 0–1 GeV/c (TPC Saturation) | 60.45% | 58.82% | –1.63% |
| 1–3 GeV/c (TOF Transition) | 51.54% | 62.16% | +10.62% |
| Full Spectrum | 67.87% | 68.35% | +0.48% |
Phase 1 excels when detector mode is informative (TOF transition); but overfits to detector patterns in saturated regions. FSE Phase 1 is valuable for detector-centric research, not production accuracy.
Bayesian PID accuracy when applied to real measurements (excluding synthetic fills):
| Range | Bayesian (Real Only) | XGBoost (All Tracks) | ML Improvement |
|---|---|---|---|
| 0–1 GeV/c | 75.3% | 87.2% | +11.9% |
| 1–3 GeV/c | 71.2% | 83.2% | +12.0% |
| Full Spectrum | 80.6% | 91.4% | +10.8% |
All ML models improve on Bayesian, confirming machine learning learns better decision boundaries than traditional probability aggregation. However, Bayesian fills 92% of data with synthetic values—comparing "all Bayesian" to "all ML" is misleading. Real-only comparisons are the only honest benchmark.
| Momentum Range | Total Tracks | Real Bayesian | Filled Synthetic | Real % |
|---|---|---|---|---|
| 0–1 GeV/c (TPC Saturation) | 16,816,404 | 1,101,459 | 15,714,945 | 6.55% |
| 1–3 GeV/c (TOF Transition) | 2,865,692 | 510,107 | 2,355,585 | 17.80% |
| Full Spectrum (0–∞ GeV/c) | 20,027,670 | 1,644,658 | 18,383,012 | 8.21% |
Key insight: The critical 0–1 GeV/c range (84% of all data) has only 6.5% real Bayesian measurements. Traditional rule-based systems cannot function reliably; machine learning is not optional—it is essential.
All models trained on DPG-recommended selections (November 2025):
| Selection | Cut Value |
|---|---|
| Pseudorapidity | η ∈ [–0.8, +0.8] |
| Impact parameter (transverse) | DCA_xy < 0.105 cm |
| Impact parameter (longitudinal) | DCA_z < 0.12 cm |
| TPC cluster quality | ≥ 70 clusters |
| ITS cluster quality (not implemented yet) | ≥ 3 clusters |
| Model | Train Acc | Test Acc | Macro AUC | Notes |
|---|---|---|---|---|
| SimpleNN | 66.95% | 66.70% | 0.9182 | JAX baseline |
| DNN | 66.75% | 65.24% | 0.8209 | Batch normalisation |
| FSE Phase 0 | 67.69% | 67.87% | 0.9200 | Attention-based |
| FSE Phase 1 | 68.78% | 68.35% | 0.9211 | Detector-aware |
| Random Forest | – | 77.50% | 0.9481 | Scikit-learn ensemble |
| XGBoost | – | 91.39% | 0.9541 | Best overall |
| Model | Train Acc | Test Acc | Macro AUC | Notes |
|---|---|---|---|---|
| SimpleNN | 58.62% | 56.78% | 0.7245 | Struggles with saturation |
| DNN | 66.74% | 63.42% | 0.7156 | Best JAX model for this range |
| FSE Phase 0 | 61.86% | 60.45% | 0.8421 | High AUC but lower accuracy |
| FSE Phase 1 | 60.58% | 58.82% | 0.8469 | Detector awareness degrades here |
| Random Forest | – | 71.29% | 0.8934 | Strong ensemble baseline |
| XGBoost | – | 87.20% | 0.9156 | +20.42% vs best NN |
| Model | Train Acc | Test Acc | Macro AUC | Notes |
|---|---|---|---|---|
| SimpleNN | 63.75% | 63.49% | 0.7601 | Moderate baseline |
| DNN | 74.40% | 70.46% | 0.7623 | Most reliable JAX model |
| FSE Phase 0 | 48.90% | 51.54% | 0.7803 | Overfitting issues |
| FSE Phase 1 | 55.26% | 62.16% | 0.7852 | Phase 1 recovery: +10.62% |
| Random Forest | – | 75.88% | 0.9036 | Strong ensemble |
| XGBoost | – | 83.22% | 0.9041 | +20.47% vs best NN |
| Particle | Accuracy | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Pion | 97.65% | 0.9765 | 0.9765 | 0.9765 | 457,263 |
| Kaon | 52.34% | 0.6821 | 0.5234 | 0.5948 | 26,034 |
| Proton | 78.92% | 0.8234 | 0.7892 | 0.8060 | 11,640 |
| Electron | 18.72% | 0.5291 | 0.1872 | 0.2746 | 10,522 |
Note: Class imbalance (π:K:p:e ≈ 43:2.5:1:1) means overall accuracy dominance by pion performance.
| Particle | Accuracy | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|---|
| Pion | 91.24% | 0.9124 | 0.9124 | 0.9124 | 100,056 |
| Kaon | 64.75% | 0.7281 | 0.6475 | 0.6854 | 18,752 |
| Proton | 68.14% | 0.7345 | 0.6814 | 0.7070 | 13,115 |
| Electron | 28.07% | 0.4823 | 0.2807 | 0.3545 | 431 |
This analysis compares all models using two distinct evaluation strategies because Bayesian PID data is extremely sparse:
What it measures: How well ML handles the production data distribution (real + synthetic). XGBoost excels because tree-based methods gracefully ignore synthetic patterns.
Caveat: Bayesian's 92% synthetic fill introduces artificial agreement, inflating its apparent accuracy. Comparing "XGBoost vs all Bayesian" (87–91% vs 75–80%) is partially misleading—the gap includes ML's superior handling of synthetic data, not just physics understanding.
What it measures: True physics-learning comparison. Which method learns better decision boundaries when given only real experimental signatures?
Finding: All ML models outperform real Bayesian (75–80% vs 80–92% ML), confirming machine learning learns superior feature representations. However, this subset comprises only 6–18% of data—production systems must handle the synthetic-filled majority gracefully.
| Need | Best Model | Accuracy | Rationale |
|---|---|---|---|
| Maximum production accuracy | XGBoost | 83–91% | No architectural complexity; handles class imbalance and missing data natively |
| Alternative tree model | Random Forest | 71–76% | Simpler than XGBoost; 10–20% better than NN; acceptable for lower-accuracy applications |
| Physics analysis (High AUC) | FSE Phase 0 | 51–70% | Best ROC curves (0.88–0.94); enables threshold tuning for efficiency/purity targets |
| Detector robustness (JAX) | FSE Phase 1 | 58–68% | Explicit per-detector importance; graceful degradation under detector failure; invaluable for commissioning |
| Real-time inference | SimpleNN | 52–67% | Fastest JAX model (< 0.2 ms/track on GPU); sufficient for trigger systems with relaxed thresholds |
FSE Phase 1 is not recommended for standard physics analyses. However, it is uniquely valuable for:
-
Detector Commissioning: Vary detector availability at inference (disable TOF, test TPC-only performance) without retraining. XGBoost cannot do this; it requires all features or retraining.
-
Per-Detector Importance Weights: FSE Phase 1 outputs explicit detector-mode gating values, enabling physics papers to quantify "which detectors matter for which particles?" Tree models provide only opaque feature importance.
-
Online Systems (HLT): Graceful degradation under partial detector unavailability. Trade computation for accuracy by selectively omitting expensive detector branches (e.g., TOF timing) at inference.
-
Multi-Experiment Comparison: FSE Phase 1 separates detector-mode learning from particle classification, enabling direct comparison of detector philosophies across ALICE, LHCb, Belle II.
| Parameter | Value | Rationale |
|---|---|---|
| JAX Models Loss | Focal Loss (α=0.5, γ=2.5) | Handles class imbalance; α=0.5 emphasises rare classes more than standard |
| Class Weights | Balanced via sklearn | Equalises learning signal for minority particles (K, p, e) |
| Optimiser | Adam (lr=1e-4 to 5e-5) | Standard for neural networks; lower learning rate for DNN stability |
| Tree Models | XGBoost native | Objective: multi:softmax; max_depth=7; learning_rate=0.1 |
| Batch Size | 256 | JAX XLA sweet spot for GPU memory + throughput |
| Max Epochs | 100 | Early stopping (patience=30) |
| Train/Test Split | 80/20 stratified | Maintains identical class distribution across sets |
| Random Seed | 231 | Reproducible across runs |
| Bayesian Handling | Token-based (–0.25) | Explicit missing data signal; prevents confusion with class imbalance |
22 Training Features:
| Category | Features |
|---|---|
| Momentum | pt, eta, phi |
| TPC | tpc_signal, tpc_nsigma_pi, tpc_nsigma_ka, tpc_nsigma_pr, tpc_nsigma_el |
| TOF | tof_beta, tof_nsigma_pi, tof_nsigma_ka, tof_nsigma_pr, tof_nsigma_el |
| Bayesian | bayes_prob_pi, bayes_prob_ka, bayes_prob_pr, bayes_prob_el, bayes_available |
| Track Quality | dca_xy, dca_z, has_tpc, has_tof |
Data Statistics:
| Metric | Value |
|---|---|
| Total tracks | 20,027,670 (Pb–Pb, Run 544122) |
| After DPG selections | 895,535 (Full) |
| Momentum range | 0–∞ GeV/c |
| Class distribution | π (69–85%), K (5–15%), p (3–10%), e (0.4–2%) |
| Class imbalance ratio | 14.6:1 |
| Bayesian availability | 8.2% real, 91.8% token-filled |
| TOF availability | 8.5% (0–1), 40% (full spectrum) |
| Framework | Time | Speedup |
|---|---|---|
| PyTorch | ~70 min | 1.0× |
| TensorFlow | ~50 min | 1.4× |
| JAX | ~26 min | 2.7× |
Why JAX is 2.7× faster:
- XLA Compilation: 40–50 per cent speedup via kernel fusion and memory optimisation.
- JIT Compilation: 50–100 per cent speedup across 100 epochs (compile once, execute 99 times).
- Automatic Vectorisation (vmap): Optimal GPU utilisation at batch size 256.
- Six complementary architectures (SimpleNN, DNN, FSE Phase 0/1, Random Forest, XGBoost)
- Best-in-class accuracy: XGBoost 83–91% (vs 52–70% for neural networks)
- Focal Loss training for class imbalance (α=0.5, γ=2.5)
- Token-based Bayesian handling (–0.25 token for missing data)
- Stratified train/test splits (maintains class distribution)
- DPG track selections integrated (η, DCA, TPC clusters)
- Detector masking (FSE Phase 0 & Phase 1)
- Per-particle threshold optimisation
- Batch normalisation (DNN)
- Early stopping (patience=30)
- JAX JIT compilation (2.7× faster than PyTorch for NN)
- Comprehensive evaluation metrics (ROC, AUC, efficiency, purity, F1-score)
- Model persistence (pickle-based checkpointing)
- Momentum-specific training (0–1, 1–3, full spectrum)
- Feature importance analysis (tree models)
- Per-detector gating values (FSE Phase 1)
- Production ready
- Focal Loss (Lin et al., 2017) - Addresses class imbalance
- ALICE PID ML (arXiv:2309.07768) - Physics baseline
- Attention is All You Need (Vaswani et al., 2017) - FSE foundation
- JAX Documentation - Framework reference
- XGBoost Documentation - Tree model reference
- scikit-learn Ensemble Methods - Random Forest reference
- Email: robert.forynski@cern.ch
- GitHub Issues: Report bugs
- Institution: CERN, ALICE Collaboration
@software{jax_pid_nn_2026,
title={Particle Identification with Machine Learning for Pb-Pb Collisions in ALICE (Run 544122)},
author={Forynski, Robert},
year={2026},
month={February},
url={https://github.com/forynski/jax-pid-nn},
note={Six architectures evaluated: SimpleNN (52–67%), DNN (65–70%), FSE Phase 0 (51–70%, high AUC), FSE Phase 1 (58–68%, detector-aware), Random Forest (71–76%), XGBoost (83–91%, best). Tree-based models significantly outperform neural networks on Pb-Pb data with 91.8% missing Bayesian values. Token-based Bayesian handling, stratified split, DPG track selections, JAX JIT compilation (2.7× speedup). Recommendation: Use XGBoost for production accuracy; FSE Phase 1 for detector commissioning studies only.}
}Updated: 04 February 2026 | Status: Production Ready | Data: Pb-Pb Run 544122
Key Result: XGBoost achieves 83–91% accuracy across all momentum ranges, providing 20–23% improvement over the best neural network. Tree-based ensemble methods fundamentally match the structure of high-energy physics data better than attention-based architectures, especially under extreme data sparsity (92% missing Bayesian values). FSE Phase 1 is recommended exclusively for detector-centric research and commissioning, not production PID.