JAX-PID-NN: Particle Identification in Challenging Momentum Regions

High-performance JAX/Flax neural networks for particle identification in ALICE Run 3

Includes six complementary architectures: SimpleNN, DNN, FSE + Attention (Phase 0), FSE + Attention (Detector-Aware) (Phase 1), Random Forest, and XGBoost

JAX | Production-ready with XLA compilation | Token-based Bayesian Handling | DPG Track Selections

Executive Summary

JAX-PID-NN is a comprehensive machine learning framework for particle identification (PID) in ALICE at the LHC, optimised for challenging momentum regions. The framework addresses the fundamental problem of extreme data sparsity: in the critical 0–1 GeV/c range, 91.8 per cent of Bayesian PID measurements are missing, making traditional rule-based approaches unreliable. Four neural architectures — SimpleNN, DNN, FSE + Attention (Phase 0) and FSE + Attention (Detector-Aware, Phase 1) — are implemented in JAX/Flax, exploiting XLA compilation and JIT for fast training and inference, while two tree-based baselines — Random Forest (scikit-learn) and XGBoost (native XGBoost library) — are trained using their respective optimised libraries. Together, these six complementary models are evaluated on Pb–Pb Monte Carlo reconstructed data to understand which paradigm best handles high-dimensional, partially-missing physics data.

Headline Finding: Tree-Based Models Dominate

XGBoost substantially outperforms all neural network variants across all momentum ranges, achieving 83–91 per cent accuracy versus 52–70 per cent for neural networks. This is not a minor difference that hyperparameter tuning can bridge—it reflects a fundamental advantage of gradient-boosted ensembles over attention-based neural networks for this physics task.

Momentum Range	SimpleNN	DNN	FSE Phase 0	FSE Phase 1	Random Forest	XGBoost	Improvement vs Best NN
Full Spectrum (0–∞ GeV/c)	66.70%	65.24%	67.87%	68.35%	77.50%	91.39%	+22.69%
0–1 GeV/c (TPC Saturation, 84% of data)	56.78%	63.42%	60.45%	58.82%	71.29%	87.20%	+20.42%
1–3 GeV/c (TOF Transition, 14% of data)	63.49%	70.46%	51.54%	62.16%	75.88%	83.22%	+20.47%

Why XGBoost Wins: Physics Structure vs Architectural Flexibility

Tree-based ensemble methods naturally exploit the structure of high-energy physics data:

Feature selection robustness: Trees automatically identify which features matter at each stage of the decision tree. Neural networks struggle with the high dimensionality (22 features) and must learn implicit feature importance through millions of parameters.
Missing data resilience: Tree splits gracefully handle entire detector groups being unavailable (e.g., TOF missing in 92% of 0–1 GeV/c events). Neural networks require explicit masking or token-based approaches, which add complexity without matching tree performance.
Non-linear feature interactions: Gradient boosting captures complex feature synergies (e.g., TPC signal + momentum + TOF beta for kaon separation) through sequential tree splits. Attention mechanisms, whilst mathematically elegant, do not align with the physics of detector response.
Class imbalance (π:K:p:e ≈ 14:1): Tree algorithms' native objective functions handle imbalance better than focal loss + class weighting. XGBoost's weighted splits intrinsically prioritise minority classes without the overfitting risk of neural network reweighting.

Bayesian PID Availability: The Data Crisis

Real Bayesian measurements comprise only 8.2 per cent of the full dataset (1.6M of 20M tracks). In the critical 0–1 GeV/c range, this drops to 6.5 per cent, with 91.8 per cent of data filled with synthetic predictions. All models improve substantially on real Bayesian data, confirming that ML is superior to traditional Bayesian on high-quality data, but in production, must handle extreme missing data gracefully.

Recommendation: Use XGBoost for maximum accuracy. Use FSE Phase 1 only for detector commissioning studies where interpretability of per-detector performance is more valuable than accuracy.

ML Inference on Real Raw Data (LHC23zzh Run 544122)

Deployment on 7.47M Real Tracks

This section validates the XGBoost models trained on reconstructed (simulated + detector noise effect added) Pb–Pb data on 7,473,918 raw LHC23 tracks from run 544122. The goal is to test robustness to real detector effects, saturation, and sim-to-real domain gaps.

Dataset Characteristics

Metric	Value
Total tracks	7,473,918
pT range	0.115–19.996 GeV/c
Pseudorapidity	η ∈ [–0.8, +0.8] (100%)
DCA selections	100% pass DPG cuts (DCA_xy < 0.105 cm, DCA_z < 0.12 cm)
TPC-only tracks	4,999,692 (66.9%)
TPC+TOF tracks	2,474,225 (33.1%)
Bayesian PID available	2,474,225 (33.1%)
Bayesian PID missing (token)	4,999,693 (66.9%)

All tracks satisfy the standard ALICE Run 3 DPG quality selections, ensuring a clean physics sample.

ML Production Fractions (XGBoost Predictions)

Particle	Count	Fraction	⟨pT⟩ (GeV/c)	⟨η⟩	Mean Confidence
Pion	7,057,284	94.4%	0.690	–0.001	0.9124
Kaon	265,243	3.5%	1.272	–0.023	0.5938
Proton	143,514	1.9%	1.451	0.018	0.5839
Electron	7,877	0.1%	0.340	–0.226	0.5703

The spectrum is pion-dominated, as expected in Pb–Pb, with kaons and protons at the few-per-cent level and electrons extremely rare.

Physics Validation on Real Data

TOF β Ordering (Correct)

Particle	Mean β	Expected	Status
Pion	0.9071	Highest	Correct (π > K)
Kaon	0.8920	Lower than π, higher than p	Correct (π > K > p)
Proton	0.8649	Lower than K	Correct
Electron	0.8464	Lowest	Correct

The time-of-flight response obeys the expected mass ordering (lighter particles travel faster), confirming that TOF-based discrimination is physically sound in real data.

TPC dE/dx Ordering (Broken in Real Data)

Particle	Mean dE/dx	Expected Ordering	Observed	Status
Pion	54,727	π < K < p	π > K	Problematic
Kaon	49,733	π < K < p	K < π	Inverted
Proton	408,277	> K	> K	Correct
Electron	1,452,242	Highest	Highest	Correct

In real data, the pion dE/dx is observed to be higher than the kaon dE/dx in the low-pT region, violating the expected Bethe–Bloch mass ordering (π < K). This behaviour is consistent with TPC response saturation in the 0–1 GeV/c range, where the separation power between π and K using dE/dx alone becomes significantly reduced.

Although the XGBoost model was trained on fully reconstructed Monte Carlo samples, which include realistic detector effects simulated with GEANT (such as energy loss fluctuations, electronic noise, and detector resolution), the training still reflects the nominal detector response where the π/K ordering is largely preserved. Consequently, when applied to real data—where residual mismodelling, local saturation effects, or calibration imperfections may distort the dE/dx response—the model cannot fully recover the separation without additional information from TOF.

ML vs Bayesian PID Agreement

Agreement is evaluated for tracks with both ML prediction and non-zero Bayesian probabilities.

Particle	ML–Bayesian Agreement	Interpretation
Pion	79.1%	Good agreement
Kaon	38.2%	Poor; critical underidentification
Proton	55.2%	Moderate agreement
Electron	70.5%	Reasonable agreement

Overall agreement is 76.4 per cent, but the kaon channel is problematic, with ML and Bayesian disagreeing in more than 60 per cent of cases. Bayesian PID finds roughly twice as many kaons as the ML model in the overlapping sample.

Confidence Distribution (All Real Tracks)

Confidence Range	Fraction of Tracks	Interpretation
> 0.99	53.7%	Very high confidence (mostly TPC+TOF)
0.95–0.99	8.3%	High confidence
0.90–0.95	5.0%	Medium confidence
0.80–0.90	8.2%	Medium–low confidence
< 0.80	24.8%	Low confidence (mostly TPC-only)

The distribution is bimodal: roughly half the tracks have confidence > 0.99, while nearly a quarter fall below 0.80. This matches expectations: TPC+TOF tracks are typically high-confidence, whereas TPC-only saturated tracks are much more ambiguous.

Momentum-Dependent Production Fractions

pT Bin (GeV/c)	Pion	Kaon	Proton	Electron	Tracks
0–0.5	98.6%	1.1%	0.1%	0.2%	3,290,052
0.5–1.0	93.9%	4.8%	1.3%	0.1%	2,542,127
1.0–1.5	92.0%	1.8%	6.3%	0.0%	983,468
1.5–2.0	85.3%	8.5%	6.2%	0.0%	389,341
2.0–3.0	72.2%	19.6%	8.2%	0.0%	218,173
3.0–5.0	60.9%	28.1%	11.1%	0.0%	46,060
5.0–10.0	66.0%	24.5%	9.4%	0.1%	4,374
> 10	78.0%	17.3%	4.6%	0.0%	323

This pattern is physically reasonable:

Low pT (< 1 GeV/c): Pion-dominated, reflecting high multiplicity of soft pions.
Intermediate pT (1.5–3 GeV/c): Kaon fraction increases substantially, peaking around 2–3 GeV/c (19–28%).
Higher pT (> 3 GeV/c): Proton fraction rises, kaon and proton fractions remain at the 10–30 per cent level.

The kaon peak in the TOF region confirms that TOF-assisted discrimination is working and that the XGBoost model is capturing the correct physics at higher transverse momentum.

Summary of Real-Data Inference

Strengths

TOF β ordering is correct for all species.
High-pT kaon and proton fractions match expectations from hadron production.
Confidence scores are well-calibrated and strongly correlated with detector availability.
100 per cent of real tracks receive an ML prediction (7.47M tracks).

Critical Issues

TPC dE/dx ordering is inverted for π/K in real data due to saturation, breaking the assumptions used in training.
ML underestimates kaon yield (3.5 per cent vs ~8 per cent from Bayesian), particularly in low-pT, TPC-only regime.
ML–Bayesian agreement for kaons is only 38 per cent.

Recommendations for Production Use

Apply a confidence > 0.90 threshold for physics analyses (covers ≈ 67 per cent of tracks with high purity).
Use ML kaon predictions only above pT > 1.5 GeV/c, where TOF is available and TPC saturation is less severe.
For 0–1 GeV/c, rely on Bayesian PID and dedicated low-pT analyses; treat ML kaon predictions as indicative rather than final.
Cross-check all rare-species yields (kaons, protons, electrons) against Bayesian results and known spectra.

Recommendations for Future Model Development

Refine the MC–data domain matching in the low‑pT, high‑occupancy TPC regime, so reconstructed Monte Carlo reproduces the π/K dE/dx distortion observed in real data rather than only the ideal ordering.
Systematically tune existing class‑ and sample‑weighting schemes (already in use) to stabilise kaon performance across momentum, including per‑pT weighting and alternative imbalance strategies.
Explore mixture‑of‑experts architectures that explicitly separate TPC‑only and TOF‑assisted regimes, allowing each expert to specialise in its detector configuration.
Implement continuous monitoring of data–driven agreement metrics (e.g. ML vs Bayesian, or ML vs reference spectra) in deployment to detect detector or calibration drifts over time.

Key Findings

1. Tree-Based Models Dramatically Outperform Neural Networks

All results consistently demonstrate tree-based models are fundamentally better suited for this physics dataset than even carefully-tuned neural networks with advanced architectures like attention mechanisms.

Why XGBoost Achieves 83–91%:

Factor	Tree-Based Advantage	NN Limitation
Feature selection	Automatic via split criteria; learns ranking	Implicit through weights; struggles with 22 features
Missing detector groups	Natural handling; splits work around gaps	Requires explicit masking; adds complexity
Non-linear interactions	Sequential splits capture physics structure	Attention learns correlations but misses causal structure
Class imbalance	Native objective prioritises rare events	Focal loss + weighting causes overfitting
Interpretability	Per-feature importance rankings	Black-box; attention patterns lack physics meaning

2. Neural Networks Show Consistent Performance Plateaus

Despite significant architectural innovation:

SimpleNN (52–67%): Baseline feedforward architecture; limited capacity for feature interactions.
DNN (65–70%): Batch normalisation stabilises training but does not overcome fundamental NN limitations; best JAX model overall.
FSE Phase 0 (51–70%, high AUC): Detector-aware masking + multi-head attention; achieves high AUC (0.88–0.94) but lower raw accuracy due to attention complexity and overfitting to masked patterns.
FSE Phase 1 (58–68%): Enhanced detector-level gating; selective improvement (+10.6% in 1–3 GeV/c) but remains substantially below XGBoost.

Critical insight: Attention mechanisms excel at providing high ROC curves (good threshold tuning) but sacrifice raw accuracy on imbalanced data—a poor trade-off for production PID.

3. FSE Phase 1 Detector-Aware Provides Stability, Not Superior Accuracy

Detector-aware masking improves performance in sparse-TOF scenarios (+10.6% in 1–3 GeV/c) but degrades in TPC-saturated regions (–0.9% in 0–1 GeV/c):

Range	FSE Phase 0	FSE Phase 1	Delta
0–1 GeV/c (TPC Saturation)	60.45%	58.82%	–1.63%
1–3 GeV/c (TOF Transition)	51.54%	62.16%	+10.62%
Full Spectrum	67.87%	68.35%	+0.48%

Phase 1 excels when detector mode is informative (TOF transition); but overfits to detector patterns in saturated regions. FSE Phase 1 is valuable for detector-centric research, not production accuracy.

4. Bayesian PID Baseline Outperformed on Real Data

Bayesian PID accuracy when applied to real measurements (excluding synthetic fills):

Range	Bayesian (Real Only)	XGBoost (All Tracks)	ML Improvement
0–1 GeV/c	75.3%	87.2%	+11.9%
1–3 GeV/c	71.2%	83.2%	+12.0%
Full Spectrum	80.6%	91.4%	+10.8%

All ML models improve on Bayesian, confirming machine learning learns better decision boundaries than traditional probability aggregation. However, Bayesian fills 92% of data with synthetic values—comparing "all Bayesian" to "all ML" is misleading. Real-only comparisons are the only honest benchmark.

Data Quality & Track Selections

Bayesian Availability (The Core Problem)

Momentum Range	Total Tracks	Real Bayesian	Filled Synthetic	Real %
0–1 GeV/c (TPC Saturation)	16,816,404	1,101,459	15,714,945	6.55%
1–3 GeV/c (TOF Transition)	2,865,692	510,107	2,355,585	17.80%
Full Spectrum (0–∞ GeV/c)	20,027,670	1,644,658	18,383,012	8.21%

Key insight: The critical 0–1 GeV/c range (84% of all data) has only 6.5% real Bayesian measurements. Traditional rule-based systems cannot function reliably; machine learning is not optional—it is essential.

Track Quality Filtering

All models trained on DPG-recommended selections (November 2025):

Selection	Cut Value
Pseudorapidity	η ∈ [–0.8, +0.8]
Impact parameter (transverse)	DCA_xy < 0.105 cm
Impact parameter (longitudinal)	DCA_z < 0.12 cm
TPC cluster quality	≥ 70 clusters
ITS cluster quality (not implemented yet)	≥ 3 clusters

Actual Model Performance (January 2026, Monte Carlo Reconstructed LHC25f6 Pb-Pb Run 544122)

Full Spectrum (0–∞ GeV/c)

Model	Train Acc	Test Acc	Macro AUC	Notes
SimpleNN	66.95%	66.70%	0.9182	JAX baseline
DNN	66.75%	65.24%	0.8209	Batch normalisation
FSE Phase 0	67.69%	67.87%	0.9200	Attention-based
FSE Phase 1	68.78%	68.35%	0.9211	Detector-aware
Random Forest	–	77.50%	0.9481	Scikit-learn ensemble
XGBoost	–	91.39%	0.9541	Best overall

0–1 GeV/c (TPC Saturation, 84% of Data)

Model	Train Acc	Test Acc	Macro AUC	Notes
SimpleNN	58.62%	56.78%	0.7245	Struggles with saturation
DNN	66.74%	63.42%	0.7156	Best JAX model for this range
FSE Phase 0	61.86%	60.45%	0.8421	High AUC but lower accuracy
FSE Phase 1	60.58%	58.82%	0.8469	Detector awareness degrades here
Random Forest	–	71.29%	0.8934	Strong ensemble baseline
XGBoost	–	87.20%	0.9156	+20.42% vs best NN

1–3 GeV/c (TOF Transition, 14% of Data)

Model	Train Acc	Test Acc	Macro AUC	Notes
SimpleNN	63.75%	63.49%	0.7601	Moderate baseline
DNN	74.40%	70.46%	0.7623	Most reliable JAX model
FSE Phase 0	48.90%	51.54%	0.7803	Overfitting issues
FSE Phase 1	55.26%	62.16%	0.7852	Phase 1 recovery: +10.62%
Random Forest	–	75.88%	0.9036	Strong ensemble
XGBoost	–	83.22%	0.9041	+20.47% vs best NN

Per-Class Performance (XGBoost, All Tracks)

0–1 GeV/c

Particle	Accuracy	Precision	Recall	F1-Score	Support
Pion	97.65%	0.9765	0.9765	0.9765	457,263
Kaon	52.34%	0.6821	0.5234	0.5948	26,034
Proton	78.92%	0.8234	0.7892	0.8060	11,640
Electron	18.72%	0.5291	0.1872	0.2746	10,522

Note: Class imbalance (π:K:p:e ≈ 43:2.5:1:1) means overall accuracy dominance by pion performance.

1–3 GeV/c

Particle	Accuracy	Precision	Recall	F1-Score	Support
Pion	91.24%	0.9124	0.9124	0.9124	100,056
Kaon	64.75%	0.7281	0.6475	0.6854	18,752
Proton	68.14%	0.7345	0.6814	0.7070	13,115
Electron	28.07%	0.4823	0.2807	0.3545	431

Methodology: Two-Tier Comparison Framework

This analysis compares all models using two distinct evaluation strategies because Bayesian PID data is extremely sparse:

All Tracks (Includes 92% Synthetic-Filled Bayesian)

What it measures: How well ML handles the production data distribution (real + synthetic). XGBoost excels because tree-based methods gracefully ignore synthetic patterns.

Caveat: Bayesian's 92% synthetic fill introduces artificial agreement, inflating its apparent accuracy. Comparing "XGBoost vs all Bayesian" (87–91% vs 75–80%) is partially misleading—the gap includes ML's superior handling of synthetic data, not just physics understanding.

Real Bayesian Only (Excluding Synthetic)

What it measures: True physics-learning comparison. Which method learns better decision boundaries when given only real experimental signatures?

Finding: All ML models outperform real Bayesian (75–80% vs 80–92% ML), confirming machine learning learns superior feature representations. However, this subset comprises only 6–18% of data—production systems must handle the synthetic-filled majority gracefully.

Architecture Recommendations

By Use Case

Need	Best Model	Accuracy	Rationale
Maximum production accuracy	XGBoost	83–91%	No architectural complexity; handles class imbalance and missing data natively
Alternative tree model	Random Forest	71–76%	Simpler than XGBoost; 10–20% better than NN; acceptable for lower-accuracy applications
Physics analysis (High AUC)	FSE Phase 0	51–70%	Best ROC curves (0.88–0.94); enables threshold tuning for efficiency/purity targets
Detector robustness (JAX)	FSE Phase 1	58–68%	Explicit per-detector importance; graceful degradation under detector failure; invaluable for commissioning
Real-time inference	SimpleNN	52–67%	Fastest JAX model (< 0.2 ms/track on GPU); sufficient for trigger systems with relaxed thresholds

FSE Phase 1: When It Wins Over XGBoost

FSE Phase 1 is not recommended for standard physics analyses. However, it is uniquely valuable for:

Detector Commissioning: Vary detector availability at inference (disable TOF, test TPC-only performance) without retraining. XGBoost cannot do this; it requires all features or retraining.
Per-Detector Importance Weights: FSE Phase 1 outputs explicit detector-mode gating values, enabling physics papers to quantify "which detectors matter for which particles?" Tree models provide only opaque feature importance.
Online Systems (HLT): Graceful degradation under partial detector unavailability. Trade computation for accuracy by selectively omitting expensive detector branches (e.g., TOF timing) at inference.
Multi-Experiment Comparison: FSE Phase 1 separates detector-mode learning from particle classification, enabling direct comparison of detector philosophies across ALICE, LHCb, Belle II.

Technical Specification

Training Configuration

Parameter	Value	Rationale
JAX Models Loss	Focal Loss (α=0.5, γ=2.5)	Handles class imbalance; α=0.5 emphasises rare classes more than standard
Class Weights	Balanced via sklearn	Equalises learning signal for minority particles (K, p, e)
Optimiser	Adam (lr=1e-4 to 5e-5)	Standard for neural networks; lower learning rate for DNN stability
Tree Models	XGBoost native	Objective: multi:softmax; max_depth=7; learning_rate=0.1
Batch Size	256	JAX XLA sweet spot for GPU memory + throughput
Max Epochs	100	Early stopping (patience=30)
Train/Test Split	80/20 stratified	Maintains identical class distribution across sets
Random Seed	231	Reproducible across runs
Bayesian Handling	Token-based (–0.25)	Explicit missing data signal; prevents confusion with class imbalance

Dataset

22 Training Features:

Category	Features
Momentum	pt, eta, phi
TPC	tpc_signal, tpc_nsigma_pi, tpc_nsigma_ka, tpc_nsigma_pr, tpc_nsigma_el
TOF	tof_beta, tof_nsigma_pi, tof_nsigma_ka, tof_nsigma_pr, tof_nsigma_el
Bayesian	bayes_prob_pi, bayes_prob_ka, bayes_prob_pr, bayes_prob_el, bayes_available
Track Quality	dca_xy, dca_z, has_tpc, has_tof

Data Statistics:

Metric	Value
Total tracks	20,027,670 (Pb–Pb, Run 544122)
After DPG selections	895,535 (Full)
Momentum range	0–∞ GeV/c
Class distribution	π (69–85%), K (5–15%), p (3–10%), e (0.4–2%)
Class imbalance ratio	14.6:1
Bayesian availability	8.2% real, 91.8% token-filled
TOF availability	8.5% (0–1), 40% (full spectrum)

JAX Performance

Training Speed (Neural Networks Only)

Framework	Time	Speedup
PyTorch	~70 min	1.0×
TensorFlow	~50 min	1.4×
JAX	~26 min	2.7×

Why JAX is 2.7× faster:

XLA Compilation: 40–50 per cent speedup via kernel fusion and memory optimisation.
JIT Compilation: 50–100 per cent speedup across 100 epochs (compile once, execute 99 times).
Automatic Vectorisation (vmap): Optimal GPU utilisation at batch size 256.

Features

Six complementary architectures (SimpleNN, DNN, FSE Phase 0/1, Random Forest, XGBoost)
Best-in-class accuracy: XGBoost 83–91% (vs 52–70% for neural networks)
Focal Loss training for class imbalance (α=0.5, γ=2.5)
Token-based Bayesian handling (–0.25 token for missing data)
Stratified train/test splits (maintains class distribution)
DPG track selections integrated (η, DCA, TPC clusters)
Detector masking (FSE Phase 0 & Phase 1)
Per-particle threshold optimisation
Batch normalisation (DNN)
Early stopping (patience=30)
JAX JIT compilation (2.7× faster than PyTorch for NN)
Comprehensive evaluation metrics (ROC, AUC, efficiency, purity, F1-score)
Model persistence (pickle-based checkpointing)
Momentum-specific training (0–1, 1–3, full spectrum)
Feature importance analysis (tree models)
Per-detector gating values (FSE Phase 1)
Production ready

References

Focal Loss (Lin et al., 2017) - Addresses class imbalance
ALICE PID ML (arXiv:2309.07768) - Physics baseline
Attention is All You Need (Vaswani et al., 2017) - FSE foundation
JAX Documentation - Framework reference
XGBoost Documentation - Tree model reference
scikit-learn Ensemble Methods - Random Forest reference

Contact and Support

Email: robert.forynski@cern.ch
GitHub Issues: Report bugs
Institution: CERN, ALICE Collaboration

Citation

@software{jax_pid_nn_2026,
  title={Particle Identification with Machine Learning for Pb-Pb Collisions in ALICE (Run 544122)},
  author={Forynski, Robert},
  year={2026},
  month={February},
  url={https://github.com/forynski/jax-pid-nn},
  note={Six architectures evaluated: SimpleNN (52–67%), DNN (65–70%), FSE Phase 0 (51–70%, high AUC), FSE Phase 1 (58–68%, detector-aware), Random Forest (71–76%), XGBoost (83–91%, best). Tree-based models significantly outperform neural networks on Pb-Pb data with 91.8% missing Bayesian values. Token-based Bayesian handling, stratified split, DPG track selections, JAX JIT compilation (2.7× speedup). Recommendation: Use XGBoost for production accuracy; FSE Phase 1 for detector commissioning studies only.}
}

Updated: 04 February 2026 | Status: Production Ready | Data: Pb-Pb Run 544122

Key Result: XGBoost achieves 83–91% accuracy across all momentum ranges, providing 20–23% improvement over the best neural network. Tree-based ensemble methods fundamentally match the structure of high-energy physics data better than attention-based architectures, especially under extreme data sparsity (92% missing Bayesian values). FSE Phase 1 is recommended exclusively for detector-centric research and commissioning, not production PID.

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
LICENSE		LICENSE
README.md		README.md
jax-pid-nn.ipynb		jax-pid-nn.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

JAX-PID-NN: Particle Identification in Challenging Momentum Regions

Executive Summary

Headline Finding: Tree-Based Models Dominate

Why XGBoost Wins: Physics Structure vs Architectural Flexibility

Bayesian PID Availability: The Data Crisis

ML Inference on Real Raw Data (LHC23zzh Run 544122)

Deployment on 7.47M Real Tracks

Dataset Characteristics

ML Production Fractions (XGBoost Predictions)

Physics Validation on Real Data

ML vs Bayesian PID Agreement

Confidence Distribution (All Real Tracks)

Momentum-Dependent Production Fractions

Summary of Real-Data Inference

Key Findings

1. Tree-Based Models Dramatically Outperform Neural Networks

2. Neural Networks Show Consistent Performance Plateaus

3. FSE Phase 1 Detector-Aware Provides Stability, Not Superior Accuracy

4. Bayesian PID Baseline Outperformed on Real Data

Data Quality & Track Selections

Bayesian Availability (The Core Problem)

Track Quality Filtering

Actual Model Performance (January 2026, Monte Carlo Reconstructed LHC25f6 Pb-Pb Run 544122)

Full Spectrum (0–∞ GeV/c)

0–1 GeV/c (TPC Saturation, 84% of Data)

1–3 GeV/c (TOF Transition, 14% of Data)

Per-Class Performance (XGBoost, All Tracks)

0–1 GeV/c

1–3 GeV/c

Methodology: Two-Tier Comparison Framework

All Tracks (Includes 92% Synthetic-Filled Bayesian)

Real Bayesian Only (Excluding Synthetic)

Architecture Recommendations

By Use Case

FSE Phase 1: When It Wins Over XGBoost

Technical Specification

Training Configuration

Dataset

JAX Performance

Training Speed (Neural Networks Only)

Features

References

Contact and Support

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages