Statistical inference on persistence diagrams for the JuliaTDA ecosystem: bootstrap confidence sets and two-sample permutation tests for persistent homology.
Persistent homology turns a point cloud into a persistence diagram: a multiset of birth-death pairs, one per topological feature (connected components in dimension 0, loops in dimension 1, voids in dimension 2, and so on). The persistence of a feature — its death minus its birth — measures how long it survives across scales, and is the usual proxy for how "real" it is. But raw persistence is descriptive, not inferential: it tells you a feature is long-lived, not whether it is statistically distinguishable from sampling noise. Two questions go unanswered. Given one sample, which features are signal and which are noise? Given two samples, do they have the same topology?
Statistical TDA answers both. For a single sample, the bootstrap confidence
set of Fasy et al. (2014) resamples the data, measures how much the
persistence diagram moves under resampling, and converts that into a
confidence band of half-width c around the diagonal of the diagram. By the
stability theorem, the true diagram lies within bottleneck distance c of the
observed one with the chosen confidence level. Because matching a feature to
the diagonal costs half its persistence, any feature whose persistence
exceeds 2c is distinguishable from noise. For two samples, the permutation
test of Robinson & Turner (2017) uses a Wasserstein distance between the two
diagrams as a test statistic and calibrates it by repeatedly pooling and
relabelling the points, yielding an exact, distribution-free p-value for the
null hypothesis that the two clouds are drawn from the same distribution, and
hence share the same topology.
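To see where the factor 2 comes from, here is a minimal plain-Julia sketch; the helper names `diagonal_cost` and `outside_band` are hypothetical, not part of the package. In the sup norm used by the bottleneck distance, the nearest diagonal point to (b, d) is ((b + d)/2, (b + d)/2), at distance (d - b)/2. If the true diagram is within bottleneck distance c, an observed point whose diagonal cost exceeds c cannot be matched to the diagonal, so it must correspond to a real feature.

```julia
# Cost of matching a (birth, death) point to its nearest diagonal point in the
# sup (L∞) norm used by the bottleneck distance: half the persistence.
diagonal_cost(b, d) = (d - b) / 2

# Hypothetical helper: a point outside the band of half-width c around the
# diagonal cannot be matched to the diagonal within bottleneck cost c, so (at
# the chosen confidence level) it corresponds to a feature of the true diagram.
outside_band(b, d, c) = diagonal_cost(b, d) > c

c = 0.125                                                  # made-up radius
for (b, d) in [(0.25, 0.375), (0.5, 0.75), (0.25, 1.0)]    # made-up features
    println((b, d), " persistence = ", d - b, ", significant: ", outside_band(b, d, c))
end
```

The middle feature sits exactly at persistence 2c and is not flagged: the rule is a strict inequality, matching "exceeds 2c".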
This capability is the strategic differentiator of the Julia TDA stack. In R,
the TDA and TDApplied packages have offered bootstrap confidence sets and
topological hypothesis tests for years; in Python, GUDHI, giotto-tda and
scikit-tda have no comparable inferential layer. PersistenceInference.jl
makes Julia the second ecosystem, after R, with proper statistical inference
on persistence diagrams — and the only one of the two built on a fast,
composable, multiple-dispatch foundation.
PersistenceInference.jl builds on the JuliaTDA forks of Ripserer.jl and PersistenceDiagrams.jl.
Until everything is registered, develop the dependencies from their URLs:
using Pkg
Pkg.develop(url = "https://github.com/JuliaTDA/Ripserer.jl")
Pkg.develop(url = "https://github.com/JuliaTDA/PersistenceDiagrams.jl")
Pkg.develop(url = "https://github.com/JuliaTDA/PersistenceInference.jl")
The package depends only on Ripserer, PersistenceDiagrams, Statistics
and Random, which keeps it deliberately lean. It accepts plain vectors of points
(tuples or SVectors), so it composes with MetricSpaces.EuclideanSpace (which is a
vector of SVector points) without taking a dependency on MetricSpaces.jl.
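Data often arrives as a matrix rather than a vector of points. A minimal sketch, assuming one point per column, of converting it with base Julia only; for SVector points you would build an SVector from each column instead (StaticArrays.jl).

```julia
# A 2 × 3 matrix holding three points in the plane, one point per column.
M = [0.0 1.0 0.0;
     0.0 0.0 1.0]

# One tuple per column gives the vector-of-points form described above.
X = [Tuple(col) for col in eachcol(M)]

println(X)   # [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```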
Resample a point cloud, measure how far each bootstrap diagram sits from the
reference diagram (per homology dimension), and read off the (1 - α) quantile
as the confidence-set radius c. Features with persistence above 2c are
significant.
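The resample, measure, quantile loop can be sketched in plain Julia. To keep the sketch self-contained it swaps in the Hausdorff distance between the cloud and its resample as a cheap geometric stand-in for the diagram distance; this is not what the package computes (it compares bootstrap and reference diagrams per homology dimension). `bootstrap_radius`, `hausdorff_to_subset` and `euclid` are hypothetical names.

```julia
using Random, Statistics

euclid(p, q) = sqrt(sum((p .- q) .^ 2))

# Hausdorff distance between X and a resample Xb drawn from X. Every point of
# Xb is also in X, so only one direction matters: the worst-covered point of X.
hausdorff_to_subset(X, Xb) = maximum(minimum(euclid(x, y) for y in Xb) for x in X)

# Hypothetical helper: resample with replacement, measure the distance to the
# reference, and return the (1 - alpha) quantile of those distances as c.
function bootstrap_radius(X, dist; n_boot = 100, alpha = 0.05, rng = Random.default_rng())
    n = length(X)
    dists = Vector{Float64}(undef, n_boot)
    for b in 1:n_boot
        Xb = X[rand(rng, 1:n, n)]      # n draws with replacement
        dists[b] = dist(X, Xb)
    end
    return quantile(dists, 1 - alpha)
end

rng = Random.Xoshiro(42)
θ = range(0, 2π; length = 61)[1:60]
X = [(cos(t) + 0.05 * randn(rng), sin(t) + 0.05 * randn(rng)) for t in θ]
c = bootstrap_radius(X, hausdorff_to_subset; n_boot = 200, alpha = 0.05, rng = rng)
```

Replacing `hausdorff_to_subset` with a distance between the diagrams of X and Xb, taken per homology dimension, gives the shape of the procedure described above.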
using PersistenceInference, Random
# a noisy circle: should have exactly one significant loop (H1)
rng = Random.Xoshiro(42)
n = 60
θ = range(0, 2π; length = n + 1)[1:n]
X = [(cos(t) + 0.05 * randn(rng), sin(t) + 0.05 * randn(rng)) for t in θ]
result = bootstrap_diagram(X; n_boot = 100, alpha = 0.05, dim_max = 1, rng = rng)
result # pretty summary with per-dimension radii
result.radius[2] # confidence half-width c for H1
significant(result, 1) # the H1 features distinguishable from noise
significant(result, dim) returns the intervals of the reference diagram in
homology dimension dim whose persistence exceeds 2 * result.radius[dim + 1].
A subsample_size keyword switches to the faster m-out-of-n bootstrap, and the
distance keyword lets you use Wasserstein(q) instead of the default Bottleneck().
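The filtering rule behind significant(result, dim) can be mimicked on plain (birth, death) tuples. This is a hypothetical re-implementation for illustration only (`significant_intervals` is not a package function); it mainly shows the 1-based indexing, where dimension dim reads radius[dim + 1].

```julia
# Hypothetical re-implementation of the rule on plain (birth, death) tuples:
# keep the dimension-`dim` intervals whose persistence exceeds 2 * radius[dim + 1].
function significant_intervals(diagrams, radius, dim)
    c = radius[dim + 1]                  # 1-based: H0 -> radius[1], H1 -> radius[2]
    return [(b, d) for (b, d) in diagrams[dim + 1] if d - b > 2 * c]
end

# Made-up diagrams and radii, indexed the same way.
diagrams = [
    [(0.0, 0.25), (0.0, 0.5)],       # H0
    [(0.25, 0.375), (0.25, 1.5)],    # H1
]
radius = [0.125, 0.25]

println(significant_intervals(diagrams, radius, 0))   # [(0.0, 0.5)]
println(significant_intervals(diagrams, radius, 1))   # [(0.25, 1.5)]
```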
Test whether two point clouds have the same topology in a given dimension. The
statistic is W_q^q, the q-th power of the q-Wasserstein distance between the
two diagrams in that dimension; the null distribution is generated by pooling
the points and re-splitting them at random into groups of the original sizes.
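The re-splitting step that generates the null distribution can be sketched as follows. The package's statistic is the W_q^q loss between the dimension-dim diagrams; to keep the sketch runnable with the standard library alone, a toy statistic (the difference in mean distance from the origin) stands in for it. `permutation_null`, `mean_radius` and `toy_stat` are hypothetical names.

```julia
using Random, Statistics

# Pool X and Y, shuffle, cut back into groups of the original sizes, and
# recompute the statistic on every split.
function permutation_null(stat, X, Y; n_perm = 999, rng = Random.default_rng())
    pooled = vcat(X, Y)
    nx = length(X)
    observed = stat(X, Y)
    null_stats = Vector{Float64}(undef, n_perm)
    for k in 1:n_perm
        perm = randperm(rng, length(pooled))     # random relabelling
        null_stats[k] = stat(pooled[perm[1:nx]], pooled[perm[nx+1:end]])
    end
    return observed, null_stats
end

# Toy stand-in statistic (not the package's W_q^q): difference in mean radius.
mean_radius(Z) = mean(hypot(p...) for p in Z)
toy_stat(X, Y) = abs(mean_radius(X) - mean_radius(Y))

rng = Random.Xoshiro(1)
X = [(cos(t), sin(t)) for t in range(0, 2π; length = 31)[1:30]]   # circle
Y = [(2 * rand(rng) - 1, 2 * rand(rng) - 1) for _ in 1:30]         # uniform blob
observed, null_stats = permutation_null(toy_stat, X, Y; n_perm = 199, rng = rng)
```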
using PersistenceInference, Random
circle(n, σ, rng) = [(cos(t) + σ*randn(rng), sin(t) + σ*randn(rng))
for t in range(0, 2π; length = n+1)[1:n]]
blob(n, rng) = [(2*rand(rng) - 1, 2*rand(rng) - 1) for _ in 1:n]
rng = Random.Xoshiro(1)
X = circle(30, 0.05, rng)
Y = blob(30, rng)
result = permutation_test(X, Y; n_perm = 999, dim = 1, q = 2, rng = rng)
result.pvalue # small: circle and blob differ topologically in H1
The p-value is (1 + #{permutation statistics ≥ observed}) / (1 + n_perm), so it
is always in (0, 1] and gives a valid test for any number of permutations: under
the null, P(p ≤ α) ≤ α.
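The formula is one line of Julia. A sketch with a hypothetical `perm_pvalue` helper and made-up permutation statistics:

```julia
# p = (1 + #{null statistics ≥ observed}) / (1 + n_perm), the +1 form of
# Phipson & Smyth (2010).
perm_pvalue(null_stats, observed) =
    (1 + count(>=(observed), null_stats)) / (1 + length(null_stats))

null_stats = [0.1, 0.4, 0.2, 0.9]      # made-up permutation statistics, n_perm = 4
println(perm_pvalue(null_stats, 0.5))  # (1 + 1) / 5 = 0.4
println(perm_pvalue(null_stats, 1.0))  # (1 + 0) / 5 = 0.2, the smallest possible here
println(perm_pvalue(null_stats, 0.0))  # (1 + 4) / 5 = 1.0
```

With n_perm = 999, the smallest attainable p-value is 1/1000 = 0.001.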
Cost. permutation_test recomputes persistent homology for every permutation, so its cost is O(n_perm × the cost of one Ripserer call). Ripserer is fast on small clouds; still, keep the cloud sizes and n_perm modest. Pass progress=true for sparse progress output on longer runs.
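Before a long run, a rough budget helps. This sketch assumes each re-split costs two fresh persistent-homology computations (one per group) plus two for the observed statistic; `ph_calls` and `estimated_seconds` are hypothetical helpers, and seconds_per_diagram is a number you measure yourself, for instance with `@elapsed ripserer(X; dim_max = 1)` from Ripserer.jl on a cloud of the size you plan to test.

```julia
# Assumption: every permutation re-splits the pooled cloud into two groups and
# computes one diagram per group; the observed statistic needs two more.
ph_calls(n_perm) = 2 * (n_perm + 1)

# seconds_per_diagram: measured by you on a representative cloud.
estimated_seconds(n_perm, seconds_per_diagram) = ph_calls(n_perm) * seconds_per_diagram

println(ph_calls(999))                # 2000
println(estimated_seconds(999, 0.01)) # roughly 20 seconds
```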
- B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan & A. Singh (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42(6), 2301–2339. https://doi.org/10.1214/14-AOS1252
- A. Robinson & K. Turner (2017). Hypothesis testing for topological data analysis. Journal of Applied and Computational Topology, 1(2), 241–261. https://doi.org/10.1007/s41468-017-0008-7
- B. Phipson & G. K. Smyth (2010). Permutation P-values should never be zero. Statistical Applications in Genetics and Molecular Biology, 9(1). (Motivates the +1 correction in the permutation p-value.)
Part of the JuliaTDA ecosystem. This package closes the single biggest feature gap identified in the 2026 ecosystem review (§3 A.1): statistical inference on persistence diagrams. With it, Julia joins R as one of only two language ecosystems offering bootstrap confidence sets and topological hypothesis tests out of the box.