Methods like Img2ST-Net predict gene expression from H&E histology at ~50µm spot resolution (standard Visium). With Visium HD's 2µm bins we can push to subcellular resolution, but this introduces severe challenges.
The sparsity problem: At 2µm, most bins contain zero UMI counts:
| Patient | Bins with UMI > 0 | Resolution |
|---|---|---|
| P1 | 3.4% | 2µm (128×128 per patch) |
| P2 | 5.2% | 2µm (128×128 per patch) |
| P5 | 6.1% | 2µm (128×128 per patch) |
Computed from raw count matrices in processed_crc_raw_counts/
The standard approach (normalizing counts to z-scores and training with MSE loss) fails at 2µm:
MSE objective: minimize (prediction - target)²
With ~95% zeros in the ground truth:
→ Predicting a near-constant background value everywhere already gives a low loss
→ The model learns flat, uninformative predictions
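This is regression to the mean: MSE is minimized by the conditional mean, so when the image cannot pin down exactly which bins hold counts, the model hedges by predicting low values everywhere.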
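To make the hedging argument concrete, here is a small numpy sketch on a hypothetical toy target (1,000 bins, 5% holding a single UMI; not real data). It scores three predictions under MSE: all zeros, the best constant (the mean), and a sharp map with the right amount of signal placed in the wrong bins.

```python
import numpy as np

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error, the loss used by the z-scored baseline."""
    return float(np.mean((pred - target) ** 2))

# Hypothetical toy target: 1,000 bins, 5% hold a single UMI, the rest are zero.
target = np.zeros(1000)
target[:50] = 1.0

flat_zero = np.zeros_like(target)                # "predict nothing"
flat_mean = np.full_like(target, target.mean())  # the MSE-optimal constant
misplaced = np.roll(target, 50)                  # right pattern, shifted off the true bins

print(mse(flat_zero, target))  # ≈ 0.05
print(mse(flat_mean, target))  # ≈ 0.0475
print(mse(misplaced, target))  # ≈ 0.1: a sharp but misplaced map is penalized twice
```

Predicting nothing costs only about 5% more than the best constant, while committing to structure in slightly wrong places doubles the loss. Z-scoring applies the same affine transform to predictions and targets, so it only rescales these numbers and leaves the ranking unchanged.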
Spatial transcriptomics produces count data (non-negative integers). The Poisson distribution naturally models count data:
Poisson NLL loss (dropping the constant log(k!) term): L = λ - k × log(λ)
where k = observed counts and λ = predicted rate (expected counts)
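Why this works:
- The model outputs log(λ), so the rate parameter λ = exp(·) is always positive
- Predicting λ→0 where k>0 → unbounded loss (the model can't explain the observed counts)
- Predicting high λ where k=0 → only a linear penalty of λ (zeros are expected sometimes, even at a nonzero rate)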
The model is forced to predict high values where counts exist, not hedge with averages.
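The loss can be written directly in terms of the model's log-rate output. Below is a minimal numpy sketch (the function name is ours); PyTorch's `torch.nn.functional.poisson_nll_loss(input, target, log_input=True)` computes the same per-element expression, exp(input) - target * input, before averaging. The toy patch is hypothetical.

```python
import numpy as np

def poisson_nll(log_rate, counts) -> float:
    """Mean Poisson NLL without the constant log(k!) term.

    log_rate: model output log(λ), any shape
    counts:   observed UMI counts k, same shape
    """
    log_rate = np.asarray(log_rate, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return float(np.mean(np.exp(log_rate) - counts * log_rate))

# Per-bin behaviour
miss = poisson_nll(np.log(1e-6), 1)      # λ→0 where a UMI was seen: -log(1e-6) ≈ 13.8
false_pos = poisson_nll(np.log(5.0), 0)  # λ=5 where nothing was seen: loss = λ = 5.0

# Toy patch with 5% of bins holding one UMI: predicting near-zero everywhere
# is far worse than the best constant rate (the mean, 0.05)
counts = np.zeros(1000)
counts[:50] = 1.0
near_zero = poisson_nll(np.full(1000, np.log(1e-6)), counts)   # ≈ 0.69
best_const = poisson_nll(np.full(1000, np.log(0.05)), counts)  # ≈ 0.20
```

Working in log space keeps λ positive without clamping, and the missed-count penalty grows like -log(λ), so "predict nothing" is no longer a cheap hedge.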
Both models trained on P1+P2, tested on P5:
| Model | Loss | Data | 8µm PCC | 4µm PCC | 2µm PCC |
|---|---|---|---|---|---|
| v6.3b | MSE | z-scored | 0.442 | 0.325 | 0.193 |
| v7 | Poisson | raw counts | 0.526 | 0.461 | 0.355 |
| Improvement | | | 1.19× | 1.42× | 1.84× |
Metrics from results/*/best_metrics.json, computed as masked PCC averaged over 50 genes
Poisson loss provides 1.84× better correlation at 2µm resolution.
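A minimal sketch of a masked PCC averaged over genes, assuming PCC is computed per gene over all tissue bins of a prediction/ground-truth pair and then averaged. Whether the repo pools bins across patches or averages per patch is not stated here, so treat the pooling as an assumption; the function names are hypothetical.

```python
import numpy as np

def masked_pcc(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Pearson correlation restricted to tissue bins (mask == True)."""
    p = pred[mask].astype(float)
    t = target[mask].astype(float)
    p -= p.mean()
    t -= t.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    # A constant prediction or target has undefined correlation
    return float((p * t).sum() / denom) if denom > 0 else float("nan")

def mean_masked_pcc(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Average masked PCC over genes; pred/target are (genes, H, W), mask is (H, W)."""
    scores = [masked_pcc(pred[g], target[g], mask) for g in range(pred.shape[0])]
    return float(np.nanmean(scores))

# The improvement row is a plain ratio of mean PCCs, e.g. at 2µm:
print(round(0.355 / 0.193, 2))  # 1.84
```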
PIGR gene: WSI mosaic of 36 high-signal patches (UMI > 50). MSE produces noisy predictions (PCC=0.09, SSIM=0.04). Poisson captures tissue structure (PCC=0.26, SSIM=0.14). Gray regions = non-tissue mask.
- Metrics: Masked PCC and masked SSIM (only tissue regions, mask coverage ≥92%)
- SSIM: Windowed (7×7), computed on normalized 0-1 range within tissue mask
- Resolution: Predictions at native 2µm (128×128), coarser via sum-pooling
- Test set: Patient P5 (570 patches), trained on P1+P2
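The SSIM bullet can be sketched as follows: both maps are min-max scaled to 0-1 using tissue pixels only, SSIM is evaluated in every 7×7 window with uniform weights and the usual constants (C1 = (0.01·L)², C2 = (0.03·L)², L = 1), and the SSIM map is averaged over windows whose center pixel is tissue. The uniform window, the center-pixel rule and the function names are assumptions, so values may differ from the reported ones.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalize_in_mask(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Min-max scale x to [0, 1] using tissue pixels only; non-tissue is set to 0."""
    vals = x[mask].astype(float)
    out = np.zeros(x.shape, dtype=float)
    lo, hi = vals.min(), vals.max()
    if hi > lo:
        out[mask] = (vals - lo) / (hi - lo)
    return out

def masked_ssim(pred, target, mask, win: int = 7, data_range: float = 1.0) -> float:
    """Mean windowed SSIM over windows centered on tissue (mask must contain tissue)."""
    x = normalize_in_mask(pred, mask)
    y = normalize_in_mask(target, mask)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    # Every win×win window: shape (H - win + 1, W - win + 1, win, win)
    xw = sliding_window_view(x, (win, win))
    yw = sliding_window_view(y, (win, win))
    mx, my = xw.mean(axis=(-2, -1)), yw.mean(axis=(-2, -1))
    vx, vy = xw.var(axis=(-2, -1)), yw.var(axis=(-2, -1))
    cov = ((xw - mx[..., None, None]) * (yw - my[..., None, None])).mean(axis=(-2, -1))
    ssim_map = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
    r = win // 2  # offset from window corner to its center (odd win)
    center_in_tissue = mask[r:mask.shape[0] - r, r:mask.shape[1] - r]
    return float(ssim_map[center_in_tissue].mean())
```

Because both maps are rescaled inside the mask, SSIM here compares spatial patterns rather than absolute count levels, which is why it can behave differently from PCC across resolutions.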
| Gene | 8µm PCC | 8µm SSIM | 4µm PCC | 4µm SSIM | 2µm PCC | 2µm SSIM |
|---|---|---|---|---|---|---|
| MT-CYB | 0.775 | 0.652 | 0.679 | 0.716 | 0.501 | 0.842 |
| MT-CO2 | 0.775 | 0.681 | 0.681 | 0.726 | 0.505 | 0.823 |
| MT-ATP6 | 0.769 | 0.709 | 0.688 | 0.754 | 0.527 | 0.845 |
| MT-CO3 | 0.768 | 0.686 | 0.676 | 0.755 | 0.500 | 0.867 |
| MT-ND4 | 0.745 | 0.652 | 0.660 | 0.715 | 0.496 | 0.830 |
| CEACAM5 | 0.657 | 0.688 | 0.510 | 0.710 | 0.318 | 0.799 |
| PIGR | 0.643 | 0.770 | 0.532 | 0.802 | 0.364 | 0.857 |
From results/v7_poisson_testP5_20251221_085015/wsi_figures/visualization_metrics.json
Key insight: SSIM rises as resolution gets finer while PCC falls, so structural patterns are preserved even when exact count matching (PCC) becomes harder.
| Resolution | Mean PCC | Mean SSIM | Tissue mask coverage |
|---|---|---|---|
| 8µm | 0.526 | 0.171 | N/A |
| 4µm | 0.461 | 0.290 | 92.3% |
| 2µm | 0.355 | 0.548 | 92.1% |
From best_metrics.json
Single patch: 2µm predictions capture gland ring structures better, while the 8µm predictions look blocky and low-resolution.
H&E Image (224×224 pixels, ~256µm patch)
↓
Virchow2 Encoder (frozen, 632M params)
↓ [1280-dim patch embeddings]
Hist2ST Decoder (CNN + Transformer + GNN)
↓
log(λ) predictions (128×128 × 50 genes)
↓ exp()
Expected counts (λ) at 2µm resolution
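The pipeline ends at 2µm expected counts, and coarser predictions come from sum-pooling. Below is a numpy sketch of how one 2µm log-rate map could feed a Poisson loss at each scale in the training configuration; pooling λ is valid because a sum of independent Poisson counts is Poisson with the summed rate. Random numbers stand in for the Virchow2 + Hist2ST output, the function names are hypothetical, and this is not the repo's training code.

```python
import numpy as np

def sum_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Sum over non-overlapping k×k blocks of the last two axes."""
    *lead, h, w = x.shape
    return x.reshape(*lead, h // k, k, w // k, k).sum(axis=(-3, -1))

def multiscale_poisson_nll(log_rate_2um, counts_2um, factors=(1, 2, 4, 8)):
    """Poisson NLL (without log k!) at 2/4/8/16µm from a single 2µm log-rate map."""
    rate_2um = np.exp(log_rate_2um)  # expected counts per 2µm bin
    losses = {}
    for k in factors:
        lam = sum_pool(rate_2um, k)    # coarse rate = sum of fine rates
        cnt = sum_pool(counts_2um, k)  # coarse counts = sum of fine counts
        losses[2 * k] = float(np.mean(lam - cnt * np.log(lam)))  # key = bin size in µm
    return losses

# Stand-in for one decoder output: 50 genes × 128 × 128 log-rates (hypothetical values)
rng = np.random.default_rng(0)
log_rate = rng.normal(-3.0, 1.0, size=(50, 128, 128))
counts = rng.poisson(np.exp(log_rate))  # simulated 2µm UMI counts
print(sorted(multiscale_poisson_nll(log_rate, counts)))  # [2, 4, 8, 16]
```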
| Parameter | Value |
|---|---|
| Loss | Poisson NLL |
| Multi-scale | 2µm → 4µm → 8µm → 16µm (sum-pooling) |
| Loss balancing | GradNorm (α=1.5, lr=0.025) |
| Epochs | 30 (early stopped at 21, best at epoch 11) |
| Batch size | 8 (× 4 gradient accumulation steps, effective batch 32) |
| Learning rate | 5e-5 with 2-epoch warmup |
| Optimizer | AdamW |
Training only at 2µm fails even with Poisson loss because the signal is too sparse, so the 2µm labels are also aggregated to 4µm, 8µm and 16µm with count-conserving sum-pooling:
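import torch.nn.functional as F

# labels_2um: float tensor (batch, 50 genes, 128, 128) of raw UMI counts at 2µm
# Sum-pooling preserves total counts: avg_pool2d over k×k blocks times k² is a block sum
labels_4um = F.avg_pool2d(labels_2um, 2, 2) * 4    # 64×64
labels_8um = F.avg_pool2d(labels_2um, 4, 4) * 16   # 32×32
labels_16um = F.avg_pool2d(labels_2um, 8, 8) * 64  # 16×16

GradNorm dynamically weights the per-scale losses so that the coarser (cleaner) signals guide early training.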
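GradNorm (Chen et al., 2018) adjusts one weight per loss so that each weighted loss's gradient norm on the shared parameters tracks a target that grows with how slowly that loss is decreasing; α controls how strongly the weights are pulled toward that target. Below is a minimal numpy sketch of a single weight update using α=1.5 and lr=0.025 from the table, assuming the gradient norms are already measured and using a plain gradient step on the weights (an assumption); the names are hypothetical.

```python
import numpy as np

def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One GradNorm update of per-scale loss weights.

    weights:        current weights w_i, one per scale
    grad_norms:     ||∇_W L_i||, gradient norm of the *unweighted* loss i with
                    respect to the shared parameters W
    losses:         current loss values L_i(t)
    initial_losses: loss values L_i(0) recorded at the first step
    """
    w = np.asarray(weights, dtype=float)
    g = np.asarray(grad_norms, dtype=float)
    weighted_norms = w * g                          # G_i = ||∇_W (w_i L_i)||
    loss_ratio = np.asarray(losses, float) / np.asarray(initial_losses, float)
    inverse_rate = loss_ratio / loss_ratio.mean()   # r_i > 1: loss i is falling slowly
    target = weighted_norms.mean() * inverse_rate ** alpha  # treated as a constant
    # Gradient of sum_i |w_i g_i - target_i| with respect to w_i
    grad_w = np.sign(weighted_norms - target) * g
    w = np.clip(w - lr * grad_w, 1e-6, None)        # keep weights positive
    return w * len(w) / w.sum()                     # renormalize: sum(w) = number of losses

# Scales 2/4/8/16µm with equal gradient norms; the 2µm loss has decreased least
new_w = gradnorm_step([1, 1, 1, 1], [1.0, 1.0, 1.0, 1.0],
                      losses=[0.9, 0.5, 0.5, 0.5], initial_losses=[1, 1, 1, 1])
# new_w[0] (2µm) ends up above the other three weights
```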
Predicting 2µm-resolution gene expression across whole-slide images (WSIs) from H&E histology using Virchow2 + Poisson loss
The subcellular-resolution improvement is less visually striking at the WSI level, but 2µm achieves the highest structural (SSIM) score across genes.
MT-ATP6 gene expression: H&E input → Model predictions at 8µm, 4µm, and 2µm resolution vs. ground truth Visium HD data.
---

### Mitochondrial Genes

python scripts/train_poisson_v7.py \
--data_dir /path/to/processed_crc_raw_counts \
--raw_data_dir /path/to/crc_hd \
--test_patient P5 \
--epochs 40 \
--batch_size 8 \
--lr 5e-5 \
--use_gradnorm

# WSI multi-scale figures
python scripts/visualize_v7_multiscale.py \
--model_dir results/v7_poisson_testP5_YYYYMMDD_HHMMSS \
--genes MT-ATP6 PIGR CEACAM5
# Patch-level comparisons (with PCC and SSIM)
python scripts/visualize_v7_patches.py \
--model_dir results/v7_poisson_testP5_YYYYMMDD_HHMMSS \
--genes MT-ATP6 PIGR

Training uses the 10x Genomics Visium HD CRC dataset:
- Resolution: 2µm bins (vs the 8µm bins commonly used for Visium HD analysis)
- Genes: Top 50 by variance
- Patients: P1, P2 (train), P5 (test)
- Preprocessing: Raw UMI counts, no normalization
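Selecting the 50 genes could look like the sketch below, assuming a dense (bins × genes) raw-count matrix and variance computed on raw counts. Which patients' bins are used for the ranking is not specified here, and the function name is hypothetical.

```python
import numpy as np

def top_genes_by_variance(counts: np.ndarray, gene_names: list, n: int = 50) -> list:
    """Names of the n genes with the highest variance across bins.

    counts: dense (n_bins, n_genes) raw UMI matrix; gene_names: length n_genes.
    """
    variances = np.asarray(counts, dtype=float).var(axis=0)
    order = np.argsort(variances)[::-1][:n]  # highest variance first
    return [gene_names[i] for i in order]

counts = np.array([[0, 5, 1],
                   [0, 0, 1],
                   [0, 10, 1],
                   [0, 0, 2]])
print(top_genes_by_variance(counts, ["A", "B", "C"], n=2))  # ['B', 'C']
```

Ranking by raw-count variance tends to favor highly expressed genes, which is consistent with the mitochondrial and epithelial genes at the top of the per-gene table.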
- Single fold: Results are P1+P2 → P5 only; leave-one-patient-out cross-validation (LOOCV) is needed for generalization claims
- CRC only: Tested on colorectal cancer; other tissue types may differ
- Top 50 genes: High-variance genes selected; rare transcripts not evaluated
MIT License









