comprisk — a Python toolkit for competing risks. v0.3 ships a scalable, scikit-learn-compatible competing-risks random survival forest; v0.4 adds Fine-Gray subdistribution-hazard regression, a stand-alone Aalen-Johansen cumulative-incidence estimator, Gray's K-sample test, and cause-specific Cox PH (see Roadmap). Designed to remove the Python → R workflow split that applied researchers currently endure for competing-risks survival analysis.
Status: alpha. API and internals may change before v1.0. Renamed from `crforest` in 0.3.1 — `pip install comprisk`, `from comprisk import CompetingRiskForest`.
- Forest today, regression next. v0.3 ships the only native Python competing-risks RSF: cause-specific log-rank splitting + composite CR log-rank, Aalen-Johansen CIF, Nelson-Aalen CHF, Wolbers + Uno IPCW concordance, OOB Breiman VIMP, Ishwaran minimal-depth variable selection. See the Roadmap for what v0.4 adds.
- 10–22× faster than randomForestSRC on real EHR data (CHF 14–22×, SEER 11.6×; full tables in docs/benchmarks.md), with C ≈ 0.85 on both libraries. ~95× faster than rfSRC built without OpenMP (default R-on-macOS).
- Order-of-magnitude faster than scikit-survival (16.6× at n = 5k, 544× at n = 50k), without disabling CIF/CHF outputs.
- Bit-identical to randomForestSRC with `equivalence="rfsrc"` — reproduces the per-tree mtry/nsplit RNG stream for paper-grade reproducibility, sensitivity checks, and rfSRC-baseline migrations.
| | comprisk | randomForestSRC | scikit-survival |
|---|---|---|---|
| Language | Python | R | Python |
| Native competing risks | ✓ | ✓ | ✗ (single-event only) |
| Aalen–Johansen CIF output | ✓ | ✓ | n/a |
| Cumulative hazard at scale | ✓ | ✓ | ✗¹ |
| OOB permutation VIMP | ✓ | ✓ | ✗ |
| Bit-identical reproducibility mode | ✓ (`equivalence="rfsrc"`) | — | n/a |
| Scales to n = 10⁶ | ✓ (63 s on i7) | memory-bound past n ≈ 500 000 on consumer hardware | ✗¹ / OOM² |
| Default parallelism | ✓ (`n_jobs=-1`) | OpenMP (build-dependent; macOS Apple clang lacks it) | ✓ |
| GPU preview | ✓ (CUDA 12) | ✗ | ✗ |
¹ sksurv `RandomSurvivalForest(low_memory=True)` is the only mode that scales beyond ~10k samples, but it disables `predict_cumulative_hazard_function` and `predict_survival_function` (both raise `NotImplementedError`).
² sksurv `low_memory=False` exposes CHF / survival outputs but stores full per-leaf CHF arrays; peak RSS reaches 16.8 GB at n = 5k on synthetic data and OOMs (> 21.5 GB) at n = 10k on a 24 GB host.
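For intuition, the Aalen–Johansen CIF row above refers to the standard nonparametric estimator. A didactic plain-numpy sketch (not comprisk's implementation, which is covariate-conditional and forest-based):

```python
import numpy as np

def aalen_johansen_cif(time, event, n_causes=2):
    """Nonparametric Aalen-Johansen cumulative incidence (no covariates).

    event: 0 = censored, 1..n_causes = cause of failure.
    Returns (grid, cif) with cif of shape (n_causes, len(grid)).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    grid = np.unique(time[event > 0])          # distinct event times
    surv = 1.0                                 # overall survival S(t-)
    cif = np.zeros((n_causes, grid.size))
    running = np.zeros(n_causes)
    for j, t in enumerate(grid):
        at_risk = np.sum(time >= t)
        for k in range(1, n_causes + 1):
            d_k = np.sum((time == t) & (event == k))
            running[k - 1] += surv * d_k / at_risk   # S(t-) * dN_k / Y
        d_all = np.sum((time == t) & (event > 0))
        surv *= 1.0 - d_all / at_risk
        cif[:, j] = running
    return grid, cif
```

On a toy sample `time=[1, 2, 3, 4]`, `event=[1, 2, 0, 1]`, this gives the cause-1 CIF `[0.25, 0.25, 0.75]` over the event-time grid `[1, 2, 4]` — per-cause CIFs plus overall survival always sum to 1, unlike the naive one-minus-Kaplan-Meier approach.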
```shell
pip install comprisk           # or: uv add comprisk
pip install "comprisk[gpu]"    # or: uv add 'comprisk[gpu]'
```

Requires Python ≥ 3.10. Core dependencies: numpy, scipy, pandas, joblib, numba, scikit-learn. The GPU extra adds cupy + CUDA 12 runtime libs (preview; faster only at low feature counts today, full rewrite scheduled for v1.1).
```python
import numpy as np
from comprisk import CompetingRiskForest

# Toy competing-risks data: 500 subjects, 6 features, 2 causes (+ censoring).
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 6))
time = rng.exponential(2.0, size=n) + 0.1
event = rng.choice([0, 1, 2], size=n, p=[0.4, 0.4, 0.2])  # 0 = censored

# Fit. Defaults: n_estimators=100, max_features="sqrt", logrankCR, n_jobs=-1.
forest = CompetingRiskForest(n_estimators=100, random_state=42).fit(X, time, event)

# Per-subject risk score for cause 1 (suitable for the Wolbers C-index).
risk = forest.predict_risk(X[:5], cause=1)

# Aalen-Johansen cumulative incidence over the forest's chosen time grid.
cif = forest.predict_cif(X[:5])  # (5, n_causes, n_times)
cif_at = forest.predict_cif(X[:5], times=[1.0, 2.0, 5.0])

# Cause-specific Wolbers concordance.
print("C-index, cause 1:", forest.score(X, time, event, cause=1))

# OOB permutation VIMP, scored with Uno IPCW.
vimp = forest.compute_importance(random_state=42)
print(vimp.sort_values("composite_vimp", ascending=False).head())
```

Explain cause-specific CIF predictions with exact TreeSHAP (Lundberg 2018).
Output shape is (n_samples, n_features, n_times, n_causes):
```python
shap, base = forest.shap_values(X[:10])
# shap[0, :, 0, 0] -> feature attributions for subject 0, first time, cause 1
# base[0, 0]       -> expected CIF baseline for that (time, cause)

# Additivity check: attributions + baseline reconstruct the CIF.
reconstructed = shap.sum(axis=1) + base  # (n, n_times, n_causes)
assert np.allclose(reconstructed.transpose(0, 2, 1),
                   forest.predict_cif(X[:10]))

# Rank features by mean absolute SHAP (global importance).
mean_abs = np.abs(shap).mean(axis=(0, 2, 3))
top_features = np.argsort(mean_abs)[::-1][:5]

# Slice at a fixed (time, cause) -- compatible with shap.summary_plot.
shap_slice = shap[:, :, -1, 0]  # last timepoint, cause 1; shape (n, p)
```

Backed by Lundberg (2018) Algorithm 2; bit-exact to `shap.TreeExplainer`
at any fixed (cause, time) slice. Wall time scales linearly with `n_explain` and `n_times` — pass a focused `times=` grid (clinical horizons) rather than the default full event-time grid.
Indicative wall times (10-core commodity machine, p = 58, ntree = 100, n_times = 10):
~40 s at n_train = 10k, n_explain = 200; ~3.5 min at n_train = 10k, n_explain = 1000.
Thread parallelism saturates near n_jobs ≈ 4 (memory-bandwidth bound).
Rank features by Ishwaran's minimal-depth criterion and apply the forest-averaged null-distribution threshold from Ishwaran et al. (2010, JASA, Theorem 1 + Section 3):
```python
forest = CompetingRiskForest(n_estimators=200, random_state=0).fit(X, time, event)
vs = forest.minimal_depth()
selected = vs.loc[vs["selected"], "feature"].tolist()
```

Variables with mean minimal depth below the threshold are flagged as informative. Pass `return_extra=True` to additionally inspect quartiles and per-feature usage rates across trees.
Note on rfSRC compatibility: this implements the paper's forest-averaged threshold (Section 3); `randomForestSRC::max.subtree` defaults to a tree-averaged threshold, so the threshold scalar differs. Per-feature mean minimal depth values are bit-equivalent under a matched fit config (`equivalence='rfsrc'`, `bootstrap=False`, `min_samples_split=2*nodesize`, `min_samples_leaf=1`, `max_depth=None`).
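For intuition about the statistic itself (a didactic sketch, not comprisk's implementation): a feature's minimal depth in one tree is the depth of the first node that splits on that feature, and the forest statistic is its mean across trees. Here, trees that never use a feature contribute a penalty depth — one common convention; implementations differ on this detail:

```python
import numpy as np

def mean_minimal_depth(per_tree_first_split, n_features, penalty_depth):
    """per_tree_first_split: one dict per tree, mapping
    feature index -> depth of the first split on that feature.
    Features a tree never splits on contribute `penalty_depth`."""
    depths = np.full((len(per_tree_first_split), n_features), float(penalty_depth))
    for t, first_split in enumerate(per_tree_first_split):
        for feat, d in first_split.items():
            depths[t, feat] = d
    return depths.mean(axis=0)

# Three toy trees: feature 0 splits at or near the root in every tree.
trees = [{0: 0, 2: 1}, {0: 1, 1: 2}, {0: 0, 2: 3}]
md = mean_minimal_depth(trees, n_features=3, penalty_depth=5)
```

Feature 0 gets the smallest mean minimal depth (1/3) and would be the first variable selected; features with mean depth above the null-distribution threshold are dropped.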
See docs/quickstart.md for the full walkthrough — data format, prediction shapes, cross-validation, GPU, and migrating from rfSRC.
scikit-learn drop-in.
`CompetingRiskForest` is a real sklearn estimator (`BaseEstimator`, `clone()`-friendly, picklable). `cross_val_score`, `KFold`, and `Pipeline` work without a wrapper — pass `Surv.from_arrays(event, time)` as the `y` argument, or use the legacy 3-arg `fit(X, time, event)` form. Full example in docs/quickstart.md § Cross-validation.
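For reference, scikit-survival's `Surv.from_arrays(event, time)` produces a numpy structured array; under that convention, an equivalent `y` can be built by hand without importing sksurv (a sketch assuming sksurv's default `event`/`time` field layout):

```python
import numpy as np

# Structured y in the sksurv convention: boolean event indicator + float time.
y = np.empty(5, dtype=[("event", "?"), ("time", "<f8")])
y["event"] = [True, False, True, True, False]
y["time"] = [2.1, 3.0, 1.2, 4.4, 0.9]
```

A structured array like this travels through `cross_val_score` and `Pipeline` as a single `y`, which is why no wrapper is needed.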
comprisk is intentionally CR-focused. For non-CR survival methods (general Cox PH, AFT, parametric, deep-survival, Kaplan-Meier as a standalone API), use lifelines or scikit-survival.
| Version | Module | Status |
|---|---|---|
| v0.3 | `CompetingRiskForest` (CR-RSF) | Shipped |
| v0.4 | `FineGrayRegression` (subdistribution hazard) | Planned (Q3-Q4 2026) |
| v0.4 | `CumulativeIncidence` (stand-alone Aalen-Johansen) | Planned (Q3-Q4 2026) |
| v0.4 | `gray_test` (Gray's K-sample test) | Planned (Q3-Q4 2026) |
| v0.4 | `CauseSpecificCox` (CR-aware censoring) | Planned (Q3-Q4 2026) |
| v1.0 | API freeze + JMLR MLOSS submission | Planned |
| v1.1 | Full GPU rewrite | Planned |
Headline numbers — full tables, methodology, and reproducibility scripts in docs/benchmarks.md.
vs randomForestSRC, matched-pair on real EHR data:
| Cohort | n × p | Hardware | comprisk | rfSRC OMP-on | Speedup |
|---|---|---|---|---|---|
| CHF (cardio) | 75k × 58 | Apple M4 / i7-14700K / HPC | 5.6–9.4 s | 84.8–207.3 s | 14–22× |
| SEER breast (oncology) | 238k × 17 | HPC Xeon Gold 6148 | 7.0 s | 81.6 s | 11.6× |
Both libraries fit similarly well at every tested workload (HF / cancer-specific C ≈ 0.85). The 10–22× cross-dataset band tracks feature count: rfSRC's per-split exhaustive scan scales with p, so the gap narrows on lower-p cohorts. ~95× speedup vs rfSRC built without OpenMP (default R-on-macOS install).
vs scikit-survival, paired on i7-14700K — synthetic 2-cause Weibull, p = 58, both libraries at their best config:
| n | sksurv `low_memory=True` | comprisk | Speedup |
|---|---|---|---|
| 5 000 | 18.2 s | 1.10 s | 16.6× |
| 50 000 | 2 935 s (49 min) | 5.40 s | 544× |
The gap widens super-linearly (sksurv ≈ n^2.2; comprisk ≈ n^0.7). comprisk also provides the Aalen-Johansen CIF and Nelson-Aalen CHF outputs for which sksurv `low_memory=True` raises `NotImplementedError`.
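The quoted exponents can be recovered directly from the two timing rows above: under a power-law model t ∝ n^k, two measurements give k = log(t₂/t₁) / log(n₂/n₁). A quick sanity check in plain Python:

```python
import math

def scaling_exponent(n1, t1, n2, t2):
    """Fit t = c * n**k through two (n, t) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# Timings from the sksurv comparison table (seconds).
k_sksurv = scaling_exponent(5_000, 18.2, 50_000, 2_935)   # ~2.2
k_comprisk = scaling_exponent(5_000, 1.10, 50_000, 5.40)  # ~0.7
```

The ~2.2 exponent is why the sksurv column blows up to 49 minutes at n = 50k while comprisk stays in single-digit seconds.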
Scaling on a consumer desktop: n = 10⁶ in 63 s on i7-14700K,
14.5 GB RSS. Reproducible via
validation/spikes/lambda/exp5_paper_scale_bench.py.
Full parameter list in src/comprisk/forest.py;
usage by task in docs/quickstart.md. Two splitrules are available: `logrankCR` (composite competing-risks log-rank, default) and `logrank` (cause-specific).
- Quickstart — common tasks with runnable code
- PRD — what comprisk aims to be at v1.0
- Equivalence vs rfSRC — cross-library validation methodology
- References — algorithmic provenance (Park-Miller, Bays-Durham, Wolbers 2009, Uno 2011, Cole & Hernán 2008, Breiman 2001, Ishwaran 2008/2014, etc.)
Requires uv.

```shell
uv venv
uv pip install -e ".[dev]"
uv run pre-commit install
uv run pytest
uv run ruff check .
uv run ruff format --check .
```

Apache-2.0. See LICENSE and NOTICE.
```bibtex
@software{yang_comprisk_2026,
  author    = {Yang, Sunny and Zhao, Wanqi},
  title     = {{comprisk: a Python toolkit for competing risks}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {0.3.1},
  doi       = {10.5281/zenodo.19876282},
  url       = {https://doi.org/10.5281/zenodo.19876282},
}
```

The DOI is concept-level (it always resolves to the latest version). GitHub's
"Cite this repository" button generates a version-specific record from
CITATION.cff. Algorithmic references in
docs/REFERENCES.md.