Skip to content

Add Bootstrap Wald Tests for 3-Step LCA Distal Outcomes#74

Open
CravingCrates wants to merge 2 commits into
Labo-Lacourse:masterfrom
CravingCrates:master
Open

Add Bootstrap Wald Tests for 3-Step LCA Distal Outcomes#74
CravingCrates wants to merge 2 commits into
Labo-Lacourse:masterfrom
CravingCrates:master

Conversation

@CravingCrates
Copy link
Copy Markdown

⚠️ BIG DISCLAIMER: VIBE CODED ⚠️

This PR was largely "vibe coded" and heavily relies on conceptual implementations of statistical tests. It requires a thorough review by someone with a strong background in statistics/math to double-check the core logic, matrix alignments, and multiple testing corrections. It might make for a great addition to the package, but please do not merge without mathematical validation. I don't have a statistics background or enough knowledge on the math to really evaluate this.

What this new code does

This PR introduces the WaldTest3Step class to bootstrap.py, which provides a non-parametric bootstrap procedure to compute standard errors, confidence intervals, and Wald tests for 3-step Latent Class Analysis (LCA) distal outcome models.

I love this package and use it a lot to do analysis, but one thing that bugged me is that the hypothesis testing is still left to the reader and many commercial statistic applications solve this issue, so I wanted to try and make a small contribution.
Feel free to deny it, if it's just too much work to review or doesn't fit into the scope of the project right now. No hard feelings!

Key features include:

  1. Bootstrapped Estimates: Automatically extracts point estimates and distributions for outcome variables per latent class using the structural component of a fitted StepMix model.
  2. Pairwise Wald Tests: Tests the difference in parameters ($\theta_j = \theta_k$) between all combinations of latent classes, calculating $z$, $\chi^2$, and p-values based on the bootstrap covariance matrix.
  3. Omnibus Wald Tests: Runs a global test to reject the null hypothesis that all class means/probabilities are equal across the $K$ classes for a given outcome.
  4. Multiple Comparison Corrections: Supports Bonferroni-Holm and Benjamini-Hochberg (FDR) corrections for the pairwise test p-values.
  5. Formatted Summary: Provides a clean summary() method to quickly view estimates, omnibus stats, and pairwise test results with significance stars.

Example usage:

from stepmix.stepmix import StepMix
from stepmix.datasets import data_bakk_complete

# 1. Fetch some sample data
X, Y, _ = data_bakk_complete(n_samples=1000, sep_level=0.9, random_state=0)

# 2. Configure your original StepMix model
model = StepMix(
    n_components=3, 
    n_steps=3,
    measurement="bernoulli", 
    structural="gaussian_unit",
    random_state=0, 
    verbose=0
)

# 3. Fit as normal
model.fit(X, Y)

# 4. This computes inference estimates internally using the `stepmix.bootstrap` API
wt = model.wald_test(
    X, 
    Y, 
    n_repetitions=500, 
    correction="BH",   # "BH" or "Bonferroni" or None
    random_state=42
)

# 5. Get your structured results
wt.summary()

The output could look like this:

Fitting StepMix...
Initializations (n_init) : 100%|██████████████| 1/1 [00:00<00:00, 44.92it/s, max_LL=-2.96e+3, max_avg_LL=-2.96]

Bootstrapping estimator...
Bootstrap Repetitions    : 100%|█| 500/500 [00:14<00:00, 34.26it/s, max_LL=-5.69e+3, median_LL=-5.87e+3, min_LL
======================================================================
  3-Step LCA Outcome Analysis – Wald Tests & P-values
======================================================================
  Latent classes (K)   : 3
  Bootstrap reps (B)   : 500
  CI level             : 95%
  Significance codes   : *** p<0.001 | ** p<0.01 | * p<0.05 | . p<0.1

======================================================================
  OUTCOME PARAMETER ESTIMATES (per latent class)
======================================================================

  Variable: feature_0
     Class    Estimate         SE       CI lo       CI hi
    ------------------------------------------------------------
         0      1.9040     0.0635      1.7756      2.0241
         1      4.0676     0.0511      3.9597      4.1614
         2      2.8834     0.0663      2.7565      3.0183

  Variable: feature_1
     Class    Estimate         SE       CI lo       CI hi
    ------------------------------------------------------------
         0      0.7733     0.0572      0.6723      0.8815
         1      0.0222     0.0534     -0.0703      0.1277
         2     -0.9903     0.0586     -1.0988     -0.8808

======================================================================
  OMNIBUS WALD TEST  (H₀: all class means / probs are equal)
======================================================================

        Variable          χ²    df     p-value   sig
  --------------------------------------------------
       feature_0    730.2155     2      0.0000  ***
       feature_1    461.9245     2      0.0000  ***

======================================================================
  PAIRWISE WALD TESTS  (H₀: θⱼ = θₖ  for each class pair j < k)
  Multiple-comparison correction: BH (FDR)
======================================================================

  Variable: feature_0
       j     k          θⱼ          θₖ           Δ       SE(Δ)         z     χ²(1)    p-value      p_adj   sig
    ----------------------------------------------------------------------------------------------------
       0     1      1.9040      4.0676     -2.1636      0.0803  -26.9297  725.2110     0.0000     0.0000  ***
       0     2      1.9040      2.8834     -0.9794      0.0871  -11.2444  126.4369     0.0000     0.0000  ***
       1     2      4.0676      2.8834      1.1842      0.0848   13.9715  195.2034     0.0000     0.0000  ***

  Variable: feature_1
       j     k          θⱼ          θₖ           Δ       SE(Δ)         z     χ²(1)    p-value      p_adj   sig
    ----------------------------------------------------------------------------------------------------
       0     1      0.7733      0.0222      0.7512      0.0792    9.4895   90.0502     0.0000     0.0000  ***
       0     2      0.7733     -0.9903      1.7636      0.0823   21.4164  458.6634     0.0000     0.0000  ***
       1     2      0.0222     -0.9903      1.0125      0.0796   12.7268  161.9703     0.0000     0.0000  ***

======================================================================

Introduce WaldTest3Step to perform non-parametric bootstrap inference for 3-step LCA structural models: computes bootstrap estimates, SEs, percentile CIs, pairwise Wald tests (with Bonferroni / BH corrections), and omnibus Wald tests. Add helper routines (_fdr_bh, _bonferroni, _build_contrast_matrix, _stars) and internal logic to build result tables and pretty-print summaries. Add a StepMix.wald_test convenience method to run the procedure from a fitted model. Include a comprehensive test suite (test/test_bootstrap_inference.py) covering helper functions, construction checks, continuous/binary/soft-assignment integrations, multiple-comparison corrections, and statistical sanity checks.
Copilot AI review requested due to automatic review settings May 15, 2026 18:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new bootstrap-based inference utility (WaldTest3Step) for 3-step StepMix distal outcome models, plus a StepMix.wald_test() convenience wrapper and a comprehensive test suite to validate the resulting estimates, confidence intervals, and Wald test outputs.

Changes:

  • Added WaldTest3Step to stepmix.bootstrap to compute bootstrapped SE/CI plus pairwise and omnibus Wald tests (with optional multiple-testing corrections).
  • Added StepMix.wald_test() as a convenience API to run the bootstrap + return a fitted WaldTest3Step.
  • Added a new test_bootstrap_inference.py module with unit tests for helpers and integration tests for continuous, binary, and soft-assignment workflows.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
stepmix/bootstrap.py Adds Wald-test inference class + multiple-testing correction helpers built on top of existing bootstrap() output.
stepmix/stepmix.py Adds wald_test() convenience method that instantiates and fits WaldTest3Step.
test/test_bootstrap_inference.py Introduces a new (large) integration test suite for bootstrap inference + helper corrections.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

TestBinaryOutcome,
TestSoftAssignment,
TestCorrections,
TestConvenienceFunction,
Comment thread stepmix/bootstrap.py
Comment on lines +361 to +362
adjusted = np.full(n, np.nan)
adjusted[finite] = np.minimum(1.0, p_values[finite] * n)
Comment thread stepmix/bootstrap.py
Comment on lines +478 to +479
``None`` → no correction; ``"Bonferroni"`` → Bonferroni-Holm;
``"BH"`` → Benjamini-Hochberg FDR.
Comment thread stepmix/bootstrap.py
Comment on lines +328 to +355
def _fdr_bh(p_values: np.ndarray) -> np.ndarray:
"""Benjamini-Hochberg FDR correction (two-stage).

Parameters
----------
p_values : 1-D array of raw p-values (NaN is preserved).

Returns
-------
adjusted : 1-D array of BH-corrected p-values.
"""
n = len(p_values)
finite = np.isfinite(p_values)
adjusted = np.full(n, np.nan)
if finite.sum() == 0:
return adjusted

idx = np.where(finite)[0]
p = p_values[idx]
order = np.argsort(p)
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(p) + 1)
adj = np.minimum(1.0, p * len(p) / ranks)
# Enforce monotonicity (step-down)
for i in range(len(adj) - 2, -1, -1):
adj[order[i]] = min(adj[order[i]], adj[order[i + 1]])
adjusted[idx] = adj
return adjusted
Comment thread stepmix/bootstrap.py
Comment on lines +578 to +598
def _get_boot_matrix(self, model_name: str, param: str, variable: str) -> np.ndarray:
"""Return a (n_bootstrap × K) matrix of draws for a given variable.

Rows = bootstrap replications, columns = latent classes (0 … K-1).
"""
sm_df = self._get_structural_boot()
K = self.model.n_components
n_reps = sm_df["rep"].nunique()

mat = np.full((n_reps, K), np.nan)
for k in range(K):
mask = (
(sm_df.index.get_level_values("model_name") == model_name)
& (sm_df.index.get_level_values("param") == param)
& (sm_df.index.get_level_values("class_no") == k)
& (sm_df.index.get_level_values("variable") == variable)
)
sub = sm_df.loc[mask].sort_values("rep")
if len(sub) > 0:
mat[:, k] = sub["value"].values[:n_reps]
return mat # (B, K)
Comment thread stepmix/stepmix.py
Comment on lines +1155 to +1185
def wald_test(self, X, Y=None, n_repetitions=500, ci_level=0.95, correction=None, progress_bar=True, random_state=None):
"""Runs non-parametric bootstrap over the full 3-step procedure and returns Wald test results.

Parameters
----------
X : array-like of shape (n_samples, n_features)
Measurement data (same as used to fit the model).
Y : array-like of shape (n_samples, n_outcome_features)
Outcome data (same as used to fit the structural model).
n_repetitions : int, default=500
Number of bootstrap replications. >= 500 recommended.
ci_level : float, default=0.95
Confidence level for bootstrap percentile CIs. E.g. 0.95 -> 95 % CI.
correction : {None, "Bonferroni", "BH"}, default=None
Multiple-comparison correction applied to pairwise p-values.
progress_bar : bool, default=True
Show a tqdm progress bar.
random_state : int, default=None
Random seed for the bootstrap.

Returns
-------
wt : WaldTest3Step
A fitted object providing `.summary()`, `.pairwise_`, and `.omnibus_`.
"""
from stepmix.bootstrap import WaldTest3Step
wt = WaldTest3Step(self, ci_level=ci_level)
wt.fit_bootstrap(
X, Y, n_repetitions=n_repetitions, correction=correction,
progress_bar=progress_bar, random_state=random_state
)
Comment thread stepmix/stepmix.py
Comment on lines +1155 to +1164
def wald_test(self, X, Y=None, n_repetitions=500, ci_level=0.95, correction=None, progress_bar=True, random_state=None):
"""Runs non-parametric bootstrap over the full 3-step procedure and returns Wald test results.

Parameters
----------
X : array-like of shape (n_samples, n_features)
Measurement data (same as used to fit the model).
Y : array-like of shape (n_samples, n_outcome_features)
Outcome data (same as used to fit the structural model).
n_repetitions : int, default=500
Comment thread stepmix/stepmix.py
Comment on lines +1155 to +1185
def wald_test(self, X, Y=None, n_repetitions=500, ci_level=0.95, correction=None, progress_bar=True, random_state=None):
"""Runs non-parametric bootstrap over the full 3-step procedure and returns Wald test results.

Parameters
----------
X : array-like of shape (n_samples, n_features)
Measurement data (same as used to fit the model).
Y : array-like of shape (n_samples, n_outcome_features)
Outcome data (same as used to fit the structural model).
n_repetitions : int, default=500
Number of bootstrap replications. >= 500 recommended.
ci_level : float, default=0.95
Confidence level for bootstrap percentile CIs. E.g. 0.95 -> 95 % CI.
correction : {None, "Bonferroni", "BH"}, default=None
Multiple-comparison correction applied to pairwise p-values.
progress_bar : bool, default=True
Show a tqdm progress bar.
random_state : int, default=None
Random seed for the bootstrap.

Returns
-------
wt : WaldTest3Step
A fitted object providing `.summary()`, `.pairwise_`, and `.omnibus_`.
"""
from stepmix.bootstrap import WaldTest3Step
wt = WaldTest3Step(self, ci_level=ci_level)
wt.fit_bootstrap(
X, Y, n_repetitions=n_repetitions, correction=correction,
progress_bar=progress_bar, random_state=random_state
)
Comment thread test/test_bootstrap_inference.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@CravingCrates
Copy link
Copy Markdown
Author

@copilot apply changes based on the comments in this thread

@sachaMorin
Copy link
Copy Markdown
Collaborator

Thanks! We'll merge if/when we find someone to review this thoroughly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants