PersistenceInference.jl

Statistical inference on persistence diagrams for the JuliaTDA ecosystem: bootstrap confidence sets and two-sample permutation tests for persistent homology.

What is statistical TDA?

Persistent homology turns a point cloud into a persistence diagram: a multiset of birth-death pairs, one per topological feature (connected components in dimension 0, loops in dimension 1, voids in dimension 2, and so on). The persistence of a feature — its death minus its birth — measures how long it survives across scales, and is the usual proxy for how "real" it is. But raw persistence is descriptive, not inferential: it tells you a feature is long-lived, not whether it is statistically distinguishable from sampling noise. Two questions go unanswered. Given one sample, which features are signal and which are noise? Given two samples, do they have the same topology?

Statistical TDA answers both. For a single sample, the bootstrap confidence set of Fasy et al. (2014) resamples the data, measures how far the persistence diagram moves under resampling, and takes the (1 - α) quantile of those distances as a confidence radius c. By the stability theorem, the true diagram then lies within bottleneck distance c of the observed one with confidence 1 - α. A feature can be matched to the diagonal (that is, explained as noise) only if its persistence is at most 2c, so any feature whose persistence exceeds 2c is distinguishable from noise. For two samples, the permutation test of Robinson & Turner (2017) uses a Wasserstein distance between the two diagrams as a test statistic and calibrates it by repeatedly pooling and relabelling the points. The result is an exact, distribution-free p-value for the null hypothesis that the two clouds come from the same distribution and therefore share the same topology.
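The 2c rule can be sketched in a few lines of plain Julia. This is only an illustration: the intervals are bare (birth, death) tuples rather than the package's own interval type, the radius c is a made-up number, and the helper names persistence and significant_intervals are hypothetical.

```julia
# Hypothetical stand-in: intervals are plain (birth, death) tuples,
# not the interval type used by the package.
persistence(interval) = interval[2] - interval[1]

# Keep only the intervals whose persistence exceeds twice the radius c.
significant_intervals(intervals, c) = filter(i -> persistence(i) > 2c, intervals)

diagram = [(0.0, 0.1), (0.2, 1.5), (0.3, 0.45)]   # made-up H1 diagram
c = 0.2                                           # made-up confidence radius
signal = significant_intervals(diagram, c)        # only (0.2, 1.5) clears 2c = 0.4
```

The two short-lived intervals fall inside the noise band and are dropped; only the long-lived one survives.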

This capability is the strategic differentiator of the Julia TDA stack. In R, the TDA and TDApplied packages have offered bootstrap confidence sets and topological hypothesis tests for years; in Python, GUDHI, giotto-tda and scikit-tda have no comparable inferential layer. PersistenceInference.jl makes Julia the second ecosystem, after R, with proper statistical inference on persistence diagrams — and the only one of the two built on a fast, composable, multiple-dispatch foundation.

Installation

PersistenceInference.jl builds on the JuliaTDA forks of Ripserer.jl and PersistenceDiagrams.jl. Until everything is registered, develop the dependencies from their URLs:

using Pkg
Pkg.develop(url = "https://github.com/JuliaTDA/Ripserer.jl")
Pkg.develop(url = "https://github.com/JuliaTDA/PersistenceDiagrams.jl")
Pkg.develop(url = "https://github.com/JuliaTDA/PersistenceInference.jl")

The package depends only on Ripserer, PersistenceDiagrams, Statistics and Random — deliberately lean. It accepts plain vectors of points (tuples or SVectors), so it composes with MetricSpaces.EuclideanSpace (which is a vector of SVector points) without taking a dependency on MetricSpaces.jl.
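If your data lives in a matrix rather than a vector of points, a conversion along these lines should produce the plain vector of tuples described above. The d × n column layout (one point per column) is an assumption of this sketch, not something the package prescribes.

```julia
# Assumption for this sketch: each column of M is one point (d × n layout).
M = [1.0 2.0 3.0;
     4.0 5.0 6.0]

# One tuple per column gives a plain Vector of points.
points = [Tuple(col) for col in eachcol(M)]
```

For a row-per-point layout, iterating over eachrow(M) instead would do the same job.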

The two functions

bootstrap_diagram — confidence sets (Fasy et al. 2014)

Resample a point cloud, measure how far each bootstrap diagram sits from the reference diagram (per homology dimension), and read off the (1 - α) quantile as the confidence-set radius c. Features with persistence above 2c are significant.

using PersistenceInference, Random

# a noisy circle: should have exactly one significant loop (H1)
rng = Random.Xoshiro(42)
n = 60
θ = range(0, 2π; length = n + 1)[1:n]
X = [(cos(t) + 0.05 * randn(rng), sin(t) + 0.05 * randn(rng)) for t in θ]

result = bootstrap_diagram(X; n_boot = 100, alpha = 0.05, dim_max = 1, rng = rng)

result                      # pretty summary with per-dimension radii
result.radius[2]            # confidence half-width c for H1
significant(result, 1)      # the H1 features distinguishable from noise

significant(result, dim) returns the intervals of the reference diagram in homology dimension dim whose persistence exceeds 2 * result.radius[dim + 1]. The radii are stored in a 1-based vector, so dimension 0 is radius[1] and dimension 1 is radius[2]. The subsample_size keyword switches to the faster m-out-of-n bootstrap, and the distance keyword lets you use Wasserstein(q) instead of the default Bottleneck().
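The quantile step itself is small enough to sketch without any TDA dependency. The distances below are made-up numbers standing in for the bootstrap-to-reference diagram distances, and resample and confidence_radius are hypothetical helper names. The sketch also assumes that resampling draws points with replacement, which may differ from what the package does internally.

```julia
using Random, Statistics

# Draw m points from X with replacement (m < length(X) gives an m-out-of-n resample).
resample(rng, X, m = length(X)) = rand(rng, X, m)

# (1 - alpha) empirical quantile of the bootstrap distances.
confidence_radius(distances; alpha = 0.05) = quantile(distances, 1 - alpha)

rng = Random.Xoshiro(42)
X = [(cos(t), sin(t)) for t in range(0, 2π; length = 11)[1:10]]
Xstar = resample(rng, X, 6)            # one m-out-of-n resample with m = 6

# Made-up distances between bootstrap diagrams and the reference diagram.
distances = [0.08, 0.11, 0.09, 0.15, 0.10, 0.12, 0.07, 0.13, 0.10, 0.09]
c = confidence_radius(distances; alpha = 0.05)
threshold = 2c    # persistence a feature must exceed to count as significant
```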

permutation_test — two-sample testing (Robinson & Turner 2017)

Test whether two point clouds have the same topology in a given dimension. The statistic is the q-Wasserstein loss W_q^q between the two diagrams; the null distribution is generated by pooling the points and re-splitting them at random.

using PersistenceInference, Random

circle(n, σ, rng) = [(cos(t) + σ*randn(rng), sin(t) + σ*randn(rng))
                     for t in range(0, 2π; length = n+1)[1:n]]
blob(n, rng)      = [(2*rand(rng) - 1, 2*rand(rng) - 1) for _ in 1:n]

rng = Random.Xoshiro(1)
X = circle(30, 0.05, rng)
Y = blob(30, rng)

result = permutation_test(X, Y; n_perm = 999, dim = 1, q = 2, rng = rng)

result.pvalue   # small: circle and blob differ topologically in H1

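The p-value is (1 + #{perm stat ≥ observed}) / (1 + n_perm). It is therefore never zero: its smallest attainable value is 1 / (1 + n_perm), which is 0.001 for n_perm = 999. It always lies in (0, 1] and is valid for any number of permutations.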
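As a minimal sketch of that formula (the function name permutation_pvalue is hypothetical and the statistics are toy numbers):

```julia
# Permutation p-value with the +1 correction in numerator and denominator.
permutation_pvalue(observed, perm_stats) =
    (1 + count(>=(observed), perm_stats)) / (1 + length(perm_stats))

# Two of the four permuted statistics are at least as large as the observed one.
p = permutation_pvalue(3.0, [1.0, 2.0, 3.0, 4.0])   # (1 + 2) / (1 + 4)

# No permuted statistic reaches the observed one: the p-value bottoms out at 1 / (1 + n_perm).
p_min = permutation_pvalue(10.0, zeros(999))
```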

Cost. permutation_test recomputes persistent homology for every permutation, so its running time grows linearly with n_perm, with each permutation costing about as much as the original Ripserer computation. Ripserer is fast on small clouds, but keep both the cloud sizes and n_perm modest. Pass progress=true for sparse progress output on longer runs.
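The pool-and-re-split loop behind this cost can be sketched generically. This is not the package's implementation: resplit, permutation_null and toy_stat are hypothetical names, and the statistic is a cheap toy (difference in mean distance from the origin) standing in for the diagram distance so that the sketch runs without any TDA package. The point is that stat is called once per permutation, which is where the n_perm factor comes from.

```julia
using Random

# Pool both clouds and split them again at random, keeping the group sizes.
function resplit(rng, X, Y)
    pooled = vcat(X, Y)
    idx = randperm(rng, length(pooled))
    return pooled[idx[1:length(X)]], pooled[idx[length(X)+1:end]]
end

# `stat` is a placeholder for the expensive step; it runs once per permutation.
function permutation_null(stat, X, Y; n_perm = 99, rng = Random.default_rng())
    return [stat(resplit(rng, X, Y)...) for _ in 1:n_perm]
end

# Cheap toy statistic: difference in mean distance from the origin.
mean_radius(P) = sum(p -> hypot(p...), P) / length(P)
toy_stat(A, B) = abs(mean_radius(A) - mean_radius(B))

rng = Random.Xoshiro(1)
angles = range(0, 2π; length = 21)[1:20]
X = [(cos(t), sin(t)) for t in angles]               # radius-1 circle
Y = [(0.1 * cos(t), 0.1 * sin(t)) for t in angles]   # radius-0.1 circle

observed = toy_stat(X, Y)
null = permutation_null(toy_stat, X, Y; n_perm = 99, rng = rng)
```

Swapping toy_stat for a function that computes two persistence diagrams and their distance turns this skeleton into the test described above, at the price of one persistent-homology computation per split.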

References

  • B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan & A. Singh (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42(6), 2301–2339. https://doi.org/10.1214/14-AOS1252
  • A. Robinson & K. Turner (2017). Hypothesis testing for topological data analysis. Journal of Applied and Computational Topology, 1(2), 241–261. https://doi.org/10.1007/s41468-017-0008-7
  • B. Phipson & G. K. Smyth (2010). Permutation P-values should never be zero. Statistical Applications in Genetics and Molecular Biology, 9(1). (Motivates the +1 correction in the permutation p-value.)

Positioning

Part of the JuliaTDA ecosystem. This package closes the single biggest feature gap identified in the 2026 ecosystem review (§3 A.1): statistical inference on persistence diagrams. With it, Julia joins R as one of only two language ecosystems offering bootstrap confidence sets and topological hypothesis tests out of the box.
