Add Bootstrap Wald Tests for 3-Step LCA Distal Outcomes by CravingCrates · Pull Request #74 · Labo-Lacourse/stepmix

CravingCrates · 2026-05-15T18:27:03Z

⚠️ BIG DISCLAIMER: VIBE CODED ⚠️

This PR was largely "vibe coded" and heavily relies on conceptual implementations of statistical tests. It requires a thorough review by someone with a strong background in statistics/math to double-check the core logic, matrix alignments, and multiple testing corrections. It might make for a great addition to the package, but please do not merge without mathematical validation. I don't have a statistics background or enough knowledge on the math to really evaluate this.

What this new code does

This PR introduces the WaldTest3Step class to bootstrap.py, which provides a non-parametric bootstrap procedure to compute standard errors, confidence intervals, and Wald tests for 3-step Latent Class Analysis (LCA) distal outcome models.

I love this package and use it a lot to do analysis, but one thing that bugged me is that the hypothesis testing is still left to the reader and many commercial statistic applications solve this issue, so I wanted to try and make a small contribution.
Feel free to deny it, if it's just too much work to review or doesn't fit into the scope of the project right now. No hard feelings!

Key features include:

Bootstrapped Estimates: Automatically extracts point estimates and distributions for outcome variables per latent class using the structural component of a fitted StepMix model.
Pairwise Wald Tests: Tests the difference in parameters ($\theta_j = \theta_k$) between all combinations of latent classes, calculating $z$, $\chi^2$, and p-values based on the bootstrap covariance matrix.
Omnibus Wald Tests: Runs a global test to reject the null hypothesis that all class means/probabilities are equal across the $K$ classes for a given outcome.
Multiple Comparison Corrections: Supports Bonferroni-Holm and Benjamini-Hochberg (FDR) corrections for the pairwise test p-values.
Formatted Summary: Provides a clean summary() method to quickly view estimates, omnibus stats, and pairwise test results with significance stars.

Example usage:

from stepmix.stepmix import StepMix
from stepmix.datasets import data_bakk_complete

# 1. Fetch some sample data
X, Y, _ = data_bakk_complete(n_samples=1000, sep_level=0.9, random_state=0)

# 2. Configure your original StepMix model
model = StepMix(
    n_components=3, 
    n_steps=3,
    measurement="bernoulli", 
    structural="gaussian_unit",
    random_state=0, 
    verbose=0
)

# 3. Fit as normal
model.fit(X, Y)

# 4. This computes inference estimates internally using the `stepmix.bootstrap` API
wt = model.wald_test(
    X, 
    Y, 
    n_repetitions=500, 
    correction="BH",   # "BH" or "Bonferroni" or None
    random_state=42
)

# 5. Get your structured results
wt.summary()

The output could look like this:

Fitting StepMix...
Initializations (n_init) : 100%|██████████████| 1/1 [00:00<00:00, 44.92it/s, max_LL=-2.96e+3, max_avg_LL=-2.96]

Bootstrapping estimator...
Bootstrap Repetitions    : 100%|█| 500/500 [00:14<00:00, 34.26it/s, max_LL=-5.69e+3, median_LL=-5.87e+3, min_LL
======================================================================
  3-Step LCA Outcome Analysis – Wald Tests & P-values
======================================================================
  Latent classes (K)   : 3
  Bootstrap reps (B)   : 500
  CI level             : 95%
  Significance codes   : *** p<0.001 | ** p<0.01 | * p<0.05 | . p<0.1

======================================================================
  OUTCOME PARAMETER ESTIMATES (per latent class)
======================================================================

  Variable: feature_0
     Class    Estimate         SE       CI lo       CI hi
    ------------------------------------------------------------
         0      1.9040     0.0635      1.7756      2.0241
         1      4.0676     0.0511      3.9597      4.1614
         2      2.8834     0.0663      2.7565      3.0183

  Variable: feature_1
     Class    Estimate         SE       CI lo       CI hi
    ------------------------------------------------------------
         0      0.7733     0.0572      0.6723      0.8815
         1      0.0222     0.0534     -0.0703      0.1277
         2     -0.9903     0.0586     -1.0988     -0.8808

======================================================================
  OMNIBUS WALD TEST  (H₀: all class means / probs are equal)
======================================================================

        Variable          χ²    df     p-value   sig
  --------------------------------------------------
       feature_0    730.2155     2      0.0000  ***
       feature_1    461.9245     2      0.0000  ***

======================================================================
  PAIRWISE WALD TESTS  (H₀: θⱼ = θₖ  for each class pair j < k)
  Multiple-comparison correction: BH (FDR)
======================================================================

  Variable: feature_0
       j     k          θⱼ          θₖ           Δ       SE(Δ)         z     χ²(1)    p-value      p_adj   sig
    ----------------------------------------------------------------------------------------------------
       0     1      1.9040      4.0676     -2.1636      0.0803  -26.9297  725.2110     0.0000     0.0000  ***
       0     2      1.9040      2.8834     -0.9794      0.0871  -11.2444  126.4369     0.0000     0.0000  ***
       1     2      4.0676      2.8834      1.1842      0.0848   13.9715  195.2034     0.0000     0.0000  ***

  Variable: feature_1
       j     k          θⱼ          θₖ           Δ       SE(Δ)         z     χ²(1)    p-value      p_adj   sig
    ----------------------------------------------------------------------------------------------------
       0     1      0.7733      0.0222      0.7512      0.0792    9.4895   90.0502     0.0000     0.0000  ***
       0     2      0.7733     -0.9903      1.7636      0.0823   21.4164  458.6634     0.0000     0.0000  ***
       1     2      0.0222     -0.9903      1.0125      0.0796   12.7268  161.9703     0.0000     0.0000  ***

======================================================================

Introduce WaldTest3Step to perform non-parametric bootstrap inference for 3-step LCA structural models: computes bootstrap estimates, SEs, percentile CIs, pairwise Wald tests (with Bonferroni / BH corrections), and omnibus Wald tests. Add helper routines (_fdr_bh, _bonferroni, _build_contrast_matrix, _stars) and internal logic to build result tables and pretty-print summaries. Add a StepMix.wald_test convenience method to run the procedure from a fitted model. Include a comprehensive test suite (test/test_bootstrap_inference.py) covering helper functions, construction checks, continuous/binary/soft-assignment integrations, multiple-comparison corrections, and statistical sanity checks.

Copilot

Pull request overview

This PR adds a new bootstrap-based inference utility (WaldTest3Step) for 3-step StepMix distal outcome models, plus a StepMix.wald_test() convenience wrapper and a comprehensive test suite to validate the resulting estimates, confidence intervals, and Wald test outputs.

Changes:

Added WaldTest3Step to stepmix.bootstrap to compute bootstrapped SE/CI plus pairwise and omnibus Wald tests (with optional multiple-testing corrections).
Added StepMix.wald_test() as a convenience API to run the bootstrap + return a fitted WaldTest3Step.
Added a new test_bootstrap_inference.py module with unit tests for helpers and integration tests for continuous, binary, and soft-assignment workflows.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File	Description
`stepmix/bootstrap.py`	Adds Wald-test inference class + multiple-testing correction helpers built on top of existing `bootstrap()` output.
`stepmix/stepmix.py`	Adds `wald_test()` convenience method that instantiates and fits `WaldTest3Step`.
`test/test_bootstrap_inference.py`	Introduces a new (large) integration test suite for bootstrap inference + helper corrections.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        TestBinaryOutcome,
+        TestSoftAssignment,
+        TestCorrections,
+        TestConvenienceFunction,


+    adjusted = np.full(n, np.nan)
+    adjusted[finite] = np.minimum(1.0, p_values[finite] * n)


+            ``None`` → no correction; ``"Bonferroni"`` → Bonferroni-Holm;
+            ``"BH"`` → Benjamini-Hochberg FDR.


+def _fdr_bh(p_values: np.ndarray) -> np.ndarray:
+    """Benjamini-Hochberg FDR correction (two-stage).
+
+    Parameters
+    ----------
+    p_values : 1-D array of raw p-values (NaN is preserved).
+
+    Returns
+    -------
+    adjusted : 1-D array of BH-corrected p-values.
+    """
+    n = len(p_values)
+    finite = np.isfinite(p_values)
+    adjusted = np.full(n, np.nan)
+    if finite.sum() == 0:
+        return adjusted
+
+    idx = np.where(finite)[0]
+    p = p_values[idx]
+    order = np.argsort(p)
+    ranks = np.empty_like(order)
+    ranks[order] = np.arange(1, len(p) + 1)
+    adj = np.minimum(1.0, p * len(p) / ranks)
+    # Enforce monotonicity (step-down)
+    for i in range(len(adj) - 2, -1, -1):
+        adj[order[i]] = min(adj[order[i]], adj[order[i + 1]])
+    adjusted[idx] = adj
+    return adjusted


+    def _get_boot_matrix(self, model_name: str, param: str, variable: str) -> np.ndarray:
+        """Return a (n_bootstrap × K) matrix of draws for a given variable.
+
+        Rows = bootstrap replications, columns = latent classes (0 … K-1).
+        """
+        sm_df = self._get_structural_boot()
+        K = self.model.n_components
+        n_reps = sm_df["rep"].nunique()
+
+        mat = np.full((n_reps, K), np.nan)
+        for k in range(K):
+            mask = (
+                (sm_df.index.get_level_values("model_name") == model_name)
+                & (sm_df.index.get_level_values("param") == param)
+                & (sm_df.index.get_level_values("class_no") == k)
+                & (sm_df.index.get_level_values("variable") == variable)
+            )
+            sub = sm_df.loc[mask].sort_values("rep")
+            if len(sub) > 0:
+                mat[:, k] = sub["value"].values[:n_reps]
+        return mat  # (B, K)


+    def wald_test(self, X, Y=None, n_repetitions=500, ci_level=0.95, correction=None, progress_bar=True, random_state=None):
+        """Runs non-parametric bootstrap over the full 3-step procedure and returns Wald test results.
+
+        Parameters
+        ----------
+        X : array-like of shape (n_samples, n_features)
+            Measurement data (same as used to fit the model).
+        Y : array-like of shape (n_samples, n_outcome_features)
+            Outcome data (same as used to fit the structural model).
+        n_repetitions : int, default=500
+            Number of bootstrap replications.  >= 500 recommended.
+        ci_level : float, default=0.95
+            Confidence level for bootstrap percentile CIs.  E.g. 0.95 -> 95 % CI.
+        correction : {None, "Bonferroni", "BH"}, default=None
+            Multiple-comparison correction applied to pairwise p-values.
+        progress_bar : bool, default=True
+            Show a tqdm progress bar.
+        random_state : int, default=None
+            Random seed for the bootstrap.
+
+        Returns
+        -------
+        wt : WaldTest3Step
+            A fitted object providing `.summary()`, `.pairwise_`, and `.omnibus_`.
+        """
+        from stepmix.bootstrap import WaldTest3Step
+        wt = WaldTest3Step(self, ci_level=ci_level)
+        wt.fit_bootstrap(
+            X, Y, n_repetitions=n_repetitions, correction=correction, 
+            progress_bar=progress_bar, random_state=random_state
+        )


+    def wald_test(self, X, Y=None, n_repetitions=500, ci_level=0.95, correction=None, progress_bar=True, random_state=None):
+        """Runs non-parametric bootstrap over the full 3-step procedure and returns Wald test results.
+
+        Parameters
+        ----------
+        X : array-like of shape (n_samples, n_features)
+            Measurement data (same as used to fit the model).
+        Y : array-like of shape (n_samples, n_outcome_features)
+            Outcome data (same as used to fit the structural model).
+        n_repetitions : int, default=500


+    def wald_test(self, X, Y=None, n_repetitions=500, ci_level=0.95, correction=None, progress_bar=True, random_state=None):
+        """Runs non-parametric bootstrap over the full 3-step procedure and returns Wald test results.
+
+        Parameters
+        ----------
+        X : array-like of shape (n_samples, n_features)
+            Measurement data (same as used to fit the model).
+        Y : array-like of shape (n_samples, n_outcome_features)
+            Outcome data (same as used to fit the structural model).
+        n_repetitions : int, default=500
+            Number of bootstrap replications.  >= 500 recommended.
+        ci_level : float, default=0.95
+            Confidence level for bootstrap percentile CIs.  E.g. 0.95 -> 95 % CI.
+        correction : {None, "Bonferroni", "BH"}, default=None
+            Multiple-comparison correction applied to pairwise p-values.
+        progress_bar : bool, default=True
+            Show a tqdm progress bar.
+        random_state : int, default=None
+            Random seed for the bootstrap.
+
+        Returns
+        -------
+        wt : WaldTest3Step
+            A fitted object providing `.summary()`, `.pairwise_`, and `.omnibus_`.
+        """
+        from stepmix.bootstrap import WaldTest3Step
+        wt = WaldTest3Step(self, ci_level=ci_level)
+        wt.fit_bootstrap(
+            X, Y, n_repetitions=n_repetitions, correction=correction, 
+            progress_bar=progress_bar, random_state=random_state
+        )


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

CravingCrates · 2026-05-15T18:36:07Z

@copilot apply changes based on the comments in this thread

sachaMorin · 2026-05-20T17:14:45Z

Thanks! We'll merge if/when we find someone to review this thoroughly.

Copilot AI review requested due to automatic review settings May 15, 2026 18:27

Copilot started reviewing on behalf of CravingCrates May 15, 2026 18:27 View session

Copilot AI reviewed May 15, 2026

View reviewed changes

Potential fix for pull request finding

d50f198

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Bootstrap Wald Tests for 3-Step LCA Distal Outcomes#74

Add Bootstrap Wald Tests for 3-Step LCA Distal Outcomes#74
CravingCrates wants to merge 2 commits into
Labo-Lacourse:masterfrom
CravingCrates:master

CravingCrates commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

CravingCrates commented May 15, 2026

Uh oh!

sachaMorin commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		adjusted = np.full(n, np.nan)
		adjusted[finite] = np.minimum(1.0, p_values[finite] * n)

		``None`` → no correction; ``"Bonferroni"`` → Bonferroni-Holm;
		``"BH"`` → Benjamini-Hochberg FDR.

Conversation

CravingCrates commented May 15, 2026

⚠️ BIG DISCLAIMER: VIBE CODED ⚠️

What this new code does

Example usage:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

CravingCrates commented May 15, 2026

Uh oh!

sachaMorin commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants