
comprisk


comprisk — a Python toolkit for competing risks. v0.3 ships a scalable, scikit-learn-compatible competing-risks random survival forest; v0.4 will add Fine-Gray subdistribution-hazard regression, a stand-alone Aalen-Johansen cumulative-incidence estimator, Gray's K-sample test, and cause-specific Cox PH (see Roadmap). Designed to remove the Python → R workflow split that applied researchers currently endure for competing-risks survival analysis.

Status: alpha. API and internals may change before v1.0. Renamed from crforest in 0.3.1: install with pip install comprisk, import with from comprisk import CompetingRiskForest.

Highlights

  • Forest today, regression next. v0.3 ships the only native Python competing-risks RSF: cause-specific log-rank splitting + composite CR log-rank, Aalen-Johansen CIF, Nelson-Aalen CHF, Wolbers + Uno IPCW concordance, OOB Breiman VIMP, Ishwaran minimal-depth variable selection. See the Roadmap for what v0.4 adds.
  • 10–22× faster than randomForestSRC on real EHR data (CHF 14–22×, SEER 11.6×; full tables in docs/benchmarks.md), with C ≈ 0.85 on both libraries. ~95× faster than rfSRC built without OpenMP (default R-on-macOS).
  • Order-of-magnitude faster than scikit-survival (16.6× at n = 5k, 544× at n = 50k), without disabling CIF/CHF outputs.
  • Bit-identical to randomForestSRC with equivalence="rfsrc" — reproduces the per-tree mtry/nsplit RNG stream for paper-grade reproducibility, sensitivity checks, and rfSRC-baseline migrations.

comprisk vs alternatives

|                                    | comprisk                 | randomForestSRC                                    | scikit-survival        |
|------------------------------------|--------------------------|----------------------------------------------------|------------------------|
| Language                           | Python                   | R                                                  | Python                 |
| Native competing risks             | ✓                        | ✓                                                  | ✗ (single-event only)  |
| Aalen–Johansen CIF output          | ✓                        | ✓                                                  | n/a                    |
| Cumulative hazard at scale         | ✓                        | ✓                                                  | ✗¹                     |
| OOB permutation VIMP               | ✓                        | ✓                                                  | ✗                      |
| Bit-identical reproducibility mode | ✓ (equivalence="rfsrc")  | (reference)                                        | n/a                    |
| Scales to n = 10⁶                  | ✓ (63 s on i7)           | memory-bound past n ≈ 500 000 on consumer hardware | ✗¹ / OOM²              |
| Default parallelism                | ✓ (n_jobs=-1)            | OpenMP (build-dependent; macOS Apple clang lacks it) | ✗ (opt-in via n_jobs) |
| GPU preview                        | ✓ (CUDA 12)              | ✗                                                  | ✗                      |

¹ sksurv RandomSurvivalForest(low_memory=True) is the only mode that scales beyond ~10k samples, but it disables predict_cumulative_hazard_function and predict_survival_function (both raise NotImplementedError).

² sksurv low_memory=False exposes the CHF / survival outputs but stores full per-leaf CHF arrays; peak RSS reaches 16.8 GB at n = 5k on synthetic data and OOMs (> 21.5 GB) at n = 10k on a 24 GB host.
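A minimal repro of footnote ¹ against scikit-survival's documented API (sketch; the cause codes are collapsed to a single event indicator because sksurv is single-event only):

import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
time = rng.exponential(2.0, size=200)
event = rng.integers(0, 3, size=200)                       # 0 = censored, 1/2 = causes

rsf = RandomSurvivalForest(n_estimators=50, low_memory=True)
rsf.fit(X, Surv.from_arrays(event=event > 0, time=time))   # collapse to single-event
rsf.predict(X[:5])                                         # risk scores still work
rsf.predict_cumulative_hazard_function(X[:5])              # raises NotImplementedError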

Install

pip install comprisk          # or:  uv add comprisk
pip install "comprisk[gpu]"   # or:  uv add 'comprisk[gpu]'

Requires Python ≥ 3.10. Core dependencies: numpy, scipy, pandas, joblib, numba, scikit-learn. GPU extra adds cupy + CUDA 12 runtime libs (preview; faster only at low feature count today, full rewrite scheduled for v1.1).

Quickstart

import numpy as np
from comprisk import CompetingRiskForest

# Toy competing-risks data: 500 subjects, 6 features, 2 causes (+ censoring).
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 6))
time = rng.exponential(2.0, size=n) + 0.1
event = rng.choice([0, 1, 2], size=n, p=[0.4, 0.4, 0.2])  # 0 = censored

# Fit. Defaults: n_estimators=100, max_features="sqrt", logrankCR, n_jobs=-1.
forest = CompetingRiskForest(n_estimators=100, random_state=42).fit(X, time, event)

# Per-subject risk score for cause 1 (suitable for Wolbers C-index).
risk = forest.predict_risk(X[:5], cause=1)

# Aalen-Johansen cumulative incidence over the forest's chosen time grid.
cif = forest.predict_cif(X[:5])                       # (5, n_causes, n_times)
cif_at = forest.predict_cif(X[:5], times=[1.0, 2.0, 5.0])

# Cause-specific Wolbers concordance.
print("C-index, cause 1:", forest.score(X, time, event, cause=1))

# OOB permutation VIMP, scored with Uno IPCW.
vimp = forest.compute_importance(random_state=42)
print(vimp.sort_values("composite_vimp", ascending=False).head())

TreeSHAP explanations

Explain cause-specific CIF predictions with exact TreeSHAP (Lundberg 2018). Output shape is (n_samples, n_features, n_times, n_causes):

shap, base = forest.shap_values(X[:10])
# shap[0, :, 0, 0]  -> feature attributions for subject 0, first time, cause 1
# base[0, 0]        -> expected CIF baseline for that (time, cause)

# Additivity check: attributions + baseline reconstruct the CIF
reconstructed = shap.sum(axis=1) + base  # (n, n_times, n_causes)
assert np.allclose(reconstructed.transpose(0, 2, 1),
                   forest.predict_cif(X[:10]))

# Rank features by mean absolute SHAP (global importance)
mean_abs = np.abs(shap).mean(axis=(0, 2, 3))
top_features = np.argsort(mean_abs)[::-1][:5]

# Slice for a fixed (time, cause) — compatible with shap.summary_plot
shap_slice = shap[:, :, -1, 0]   # last timepoint, cause 1  (n, p)

Backed by Lundberg (2018) Algorithm 2; bit-exact to shap.TreeExplainer at any fixed (cause, time) slice. Wall time scales linearly with n_explain and n_times — pass a focused times= grid (clinical horizons) rather than the default full event-time grid.
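For example, restricting attribution to a few clinical horizons (sketch; assumes shap_values forwards times= to the same grid logic as predict_cif):

shap, base = forest.shap_values(X[:200], times=[1.0, 2.0, 5.0])
# shap: (200, n_features, 3, n_causes); only the 3 requested horizons are evaluated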

Indicative wall times (10-core commodity machine, p = 58, ntree = 100, n_times = 10): ~40 s at n_train = 10k, n_explain = 200; ~3.5 min at n_train = 10k, n_explain = 1000. Thread parallelism saturates near n_jobs ≈ 4 (memory-bandwidth bound).

Variable selection

Rank features by Ishwaran's minimal-depth criterion and apply the forest-averaged null-distribution threshold from Ishwaran et al. (2010, JASA, Theorem 1 + Section 3):

forest = CompetingRiskForest(n_estimators=200, random_state=0).fit(X, time, event)
vs = forest.minimal_depth()
selected = vs.loc[vs["selected"], "feature"].tolist()

Variables with mean minimal depth below the threshold are flagged as informative. Pass return_extra=True to additionally inspect quartiles and per-feature usage rates across trees.

Note on rfSRC compatibility: this implements the paper's forest-averaged threshold (Section 3); randomForestSRC::max.subtree defaults to a tree-averaged threshold, so the threshold scalar differs. Per-feature mean minimal depth values are bit-equivalent under matched fit config (equivalence='rfsrc', bootstrap=False, min_samples_split=2*nodesize, min_samples_leaf=1, max_depth=None).
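A sketch of that matched config (nodesize is a hypothetical stand-in for the nodesize value of the rfSRC fit being matched):

nodesize = 15   # hypothetical: mirror the rfSRC fit
forest = CompetingRiskForest(
    n_estimators=200,
    equivalence="rfsrc",
    bootstrap=False,
    min_samples_split=2 * nodesize,
    min_samples_leaf=1,
    max_depth=None,
    random_state=0,
).fit(X, time, event)
vs = forest.minimal_depth()   # per-feature mean depths match rfSRC bit-for-bit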

See docs/quickstart.md for the full walkthrough — data format, prediction shapes, cross-validation, GPU, and migrating from rfSRC.

scikit-learn drop-in. CompetingRiskForest is a real sklearn estimator (BaseEstimator, clone()-friendly, picklable). cross_val_score, KFold, Pipeline work without a wrapper — pass Surv.from_arrays(event, time) as the y argument, or use the legacy 3-arg fit(X, time, event) form. Full example in docs/quickstart.md § Cross-validation.
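A minimal sketch of that hook (assumes scikit-survival is installed for Surv.from_arrays and that the 0/1/2 cause codes pass through as described above):

from sklearn.model_selection import KFold, cross_val_score
from sksurv.util import Surv

y = Surv.from_arrays(event=event, time=time)   # structured y, cause codes 0/1/2
scores = cross_val_score(
    CompetingRiskForest(n_estimators=100, random_state=0),
    X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
print(scores.mean(), scores.std())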

Roadmap

comprisk is intentionally CR-focused. For non-CR survival methods (general Cox PH, AFT, parametric, deep-survival, Kaplan-Meier as a standalone API), use lifelines or scikit-survival.

| Version | Module                                           | Status               |
|---------|--------------------------------------------------|----------------------|
| v0.3    | CompetingRiskForest (CR-RSF)                     | Shipped              |
| v0.4    | FineGrayRegression (subdistribution hazard)      | Planned (Q3-Q4 2026) |
| v0.4    | CumulativeIncidence (stand-alone Aalen-Johansen) | Planned (Q3-Q4 2026) |
| v0.4    | gray_test (Gray's K-sample test)                 | Planned (Q3-Q4 2026) |
| v0.4    | CauseSpecificCox (CR-aware censoring)            | Planned (Q3-Q4 2026) |
| v1.0    | API freeze + JMLR MLOSS submission               | Planned              |
| v1.1    | Full GPU rewrite                                 | Planned              |

Benchmarks

Headline numbers — full tables, methodology, and reproducibility scripts in docs/benchmarks.md.

vs randomForestSRC, matched-pair on real EHR data:

| Cohort                 | n × p     | Hardware                    | comprisk  | rfSRC (OpenMP on) | Speedup |
|------------------------|-----------|-----------------------------|-----------|-------------------|---------|
| CHF (cardio)           | 75k × 58  | Apple M4 / i7-14700K / HPC  | 5.6–9.4 s | 84.8–207.3 s      | 14–22×  |
| SEER breast (oncology) | 238k × 17 | HPC Xeon Gold 6148          | 7.0 s     | 81.6 s            | 11.6×   |

Both libraries fit similarly well at every tested workload (HF / cancer-specific C ≈ 0.85). The 10–22× cross-dataset band tracks feature count: rfSRC's per-split exhaustive scan scales with p, so the gap narrows on lower-p cohorts. ~95× speedup vs rfSRC built without OpenMP (default R-on-macOS install).

vs scikit-survival, paired on i7-14700K — synthetic 2-cause Weibull, p = 58, both libraries at their best config:

| n      | sksurv (low_memory=True) | comprisk | Speedup |
|--------|--------------------------|----------|---------|
| 5 000  | 18.2 s                   | 1.10 s   | 16.6×   |
| 50 000 | 2935 s (49 min)          | 5.40 s   | 544×    |

The gap widens super-linearly (empirical scaling: sksurv ≈ n^2.2, comprisk ≈ n^0.7). comprisk also returns the Aalen-Johansen CIF and Nelson-Aalen CHF outputs for which sksurv's low_memory=True mode raises NotImplementedError.

Scaling on a consumer desktop: n = 10⁶ in 63 s on i7-14700K, 14.5 GB RSS. Reproducible via validation/spikes/lambda/exp5_paper_scale_bench.py.

API

Full parameter list in src/comprisk/forest.py; usage by task in docs/quickstart.md. Two splitrules are available: logrankCR (composite competing-risks log-rank, default) and logrank (cause-specific).
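For example (sketch; assumes the rule is selected via a splitrule constructor argument, mirroring rfSRC's naming):

forest = CompetingRiskForest(splitrule="logrank", random_state=0).fit(X, time, event)   # cause-specific splitting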

Documentation

  • Quickstart — common tasks with runnable code
  • PRD — what comprisk aims to be at v1.0
  • Equivalence vs rfSRC — cross-library validation methodology
  • References — algorithmic provenance (Park-Miller, Bays-Durham, Wolbers 2009, Uno 2011, Cole & Hernán 2008, Breiman 2001, Ishwaran 2008/2014, etc.)

Development

Requires uv.

uv venv
uv pip install -e ".[dev]"
uv run pre-commit install
uv run pytest
uv run ruff check .
uv run ruff format --check .

License

Apache-2.0. See LICENSE and NOTICE.

Citation

@software{yang_comprisk_2026,
  author    = {Yang, Sunny and Zhao, Wanqi},
  title     = {{comprisk: a Python toolkit for competing risks}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {0.3.1},
  doi       = {10.5281/zenodo.19876282},
  url       = {https://doi.org/10.5281/zenodo.19876282},
}

DOI is concept-level (always resolves to the latest version). GitHub's "Cite this repository" button generates a version-specific record from CITATION.cff. Algorithmic references in docs/REFERENCES.md.
