Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery by NicolasRouquette · Pull Request #1 · lean-dojo/TorchLean

NicolasRouquette · 2026-06-01T15:39:25Z

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery

Motivation

The end goal of this line of work is to formalize
Computational Hypergraph Discovery (CHD)
(paper)
on top of TorchLean. CHD is a Gaussian-process / kernel-ridge method that recovers the dependency
structure of a system by repeatedly solving regularized kernel systems and testing the resulting
variances. Crucially, every quantity CHD inspects is a function of the full symmetric
eigendecomposition of a kernel matrix K:

solve_variationnal forms the regularized solve (K + γI)⁻¹ b;
find_gamma selects the ridge/noise level from spectral summaries (trace/determinant);
the Z_test thresholds variances against a calibrated null distribution.

Before this PR, TorchLean's only eigen-routine was a power-iteration stub that recovers just the
largest eigenpair — nowhere near enough to stand CHD on. You cannot formalize CHD without first
having a verified linear-algebra foundation centered on the full symmetric eigendecomposition. This
PR builds exactly that foundation, and then carries it forward through the CHD-specific consequences
(regularized inverse, PSD kernels, the discovery decision layer, and the Z_test calibration) that
sit directly on top of it.

The guiding principle throughout is honest scope: finite constructions get exact theorems,
iterative routines get exact invariants plus an a-posteriori residual certificate, and genuinely
research-grade gaps (e.g. Jacobi convergence, which Mathlib v4.30.0 does not have) are documented as
open rather than papered over. There is no sorry, no admit, and no new axiom anywhere in
NN/.

What this PR adds

1. Executable reference factorizations (spec layer)

NN/Spec/Core/Tensor/Factorizations.lean — shape-indexed,
Context-polymorphic reference implementations that run over native Float:

choleskySpec (Cholesky), qrSpec (QR via modified Gram–Schmidt),
symEigJacobiSpec (full symmetric eigendecomposition via cyclic Jacobi),
svdSpec (SVD built on the symmetric eig).

Each has runtime #eval assertLt … reconstruction checks. A strict-array @[implemented_by] path
makes the Cholesky / ridge-solve #evals run in seconds (the earlier functional representation
OOM'd / ran for hours).

2. Correctness theorems (proof layer), organized by what can honestly be proved

Specification consequences — the CHD foundation. From the predicate IsSymEig A Λ V
(orthogonal V, A = V·diag(Λ)·Vᵀ), proved independently of any algorithm:
- IsSymEig.add_smul_inv: (K+γI)⁻¹ = V·diag(1/(λᵢ+γ))·Vᵀ — the core solve_variationnal
  identity, from orthogonality of V alone (holds for any eigendecomposition the solver returns);
- IsSymEig.trace_eq / IsSymEig.det_eq / IsSymEig.isHermitian — the scalar summaries behind
  find_gamma and the evidence terms;
- IsSVD.gram_isSymEig: an SVD of A is an eigendecomposition of the Gram matrix K = AᵀA
  (eigenvalues σᵢ²), connecting the SVD spec to the eig foundation.
Exact invariants of the algorithms (no convergence/rounding caveat):
- givens_normSq (c²+s²=1) ⟹ each Jacobi sweep is an orthogonal similarity, so
  trace_orthogonal_conj / det_orthogonal_conj preserve the spectrum exactly at every step;
- choleskyFn_lower_triangular, Rmat_upper_triangular — structural triangularity by construction.
Exact reconstruction for the finite constructions (over ℝ, under success hypotheses):
- A = L·Lᵀ (Cholesky) and A = Q·R, QᵀQ = 1 (QR, bridged to Mathlib gramSchmidt).
Residual certificate for the iterative routines (in place of a convergence proof Mathlib
cannot yet support): an exact identity bounding reconstruction error by off-diagonal mass, with
the numeric assertLt examples as concrete instances. Plus the classical per-rotation off-diagonal
decrease and linear-rate lemmas.

3. CHD-specific layers built on the foundation

Kernel PSD-ness: the linear, quadratic, and Gaussian-mode CHD kernels are proved positive
semidefinite (discharging the hypothesis the spectral facts need).
Kernel-ridge solve: the SPD Cholesky positive-pivot keystone makes the ridge solve
unconditional; the inverse form closes the solve_variationnal loop.
Discovery decision layer: the CHD ancestor/edge decision is proved sound and complete.
Z_test distributional layer + asymptotic calibration (steps a–d):
(a) i.i.d.-draws scaffold, (b) pointwise Glivenko–Cantelli via the SLLN,
(c) pointwise DKW-at-a-point via Hoeffding, (d) quantile transfer (empirical-percentile
consistency: empQuantile_tendsto). Each step ships with positive and negative executable
examples.

4. Blueprint + examples

New Verso chapter
Ch4_Verification/Factorizations.lean
documenting the CHD motivation and the exact-vs-iterative-vs-certificate scope split, registered in
Guide.lean and cross-linked from ScientificMLVerification.lean.
A full NN/Examples/Factorization/ suite (Cholesky, QR, SymEig, SVD, the three kernels,
ridge-solve, variational, Jacobi decrease/rate, discovery) with positive and negative #eval
checks.

Scope honesty — what is not claimed

A = V·diag(λ)·Vᵀ is not asserted a-priori for the floating-point Jacobi/SVD output: those are
iterative, and Mathlib v4.30.0 has no Jacobi/QR-eig convergence theory. That gap is captured as a
residual certificate, never as sorry.
The remaining open items are documented in the blueprint: uniform Glivenko–Cantelli / sharp
DKW–Massart constant, the triangular-array wiring of the order-statistic percentiles at the moving
level p_N → 1/20, and the exchangeability rank rate k/(N+1).

Verification

lake build is sorry/admit-free across NN/; no new axioms. (omega appears only for
pure-ℕ index/length arithmetic — a sound decision procedure, not a hole.)
All NN/Examples/Factorization/* #eval assertLt / assertTrue checks pass, confirming the
residual-certificate hypotheses hold on the concrete instances.
The Verso blueprint builds (cd blueprint && lake build) and renders the new chapter.

Stats

~22 commits; ~4.7k lines of new proofs across NN/Proofs/Tensor/Basic/Factorizations*.lean, plus the
spec layer, examples, and blueprint chapter.

Motivation: formalizing Computational Hypergraph Discovery (CHD) in TorchLean. CHD (prototype: https://github.com/TheoBourdais/ComputationalHypergraphDiscovery) is a Gaussian-process / kernel method: it fits a kernel ridge regression per node and prunes "ancestors" by a noise/activation criterion. Essentially every statistically meaningful quantity it computes — the regression solution, the gamma (noise) selection, and the Z-test — is derived from the *full* symmetric eigendecomposition of the kernel matrix K (all eigenvalues AND eigenvectors). TorchLean's spec layer lacked this. The only eigendecomposition available (`Spec.eigendecompSpec`) is a power-iteration stub that recovers just the largest eigenpair, and there were no Cholesky / QR / SVD routines at all. That makes CHD inexpressible. This commit adds real reference implementations so the kernel linear algebra CHD depends on can be written in Lean. New: NN/Spec/Core/Tensor/Factorizations.lean (namespace Spec), executable over Float / ℝ, shape-indexed like the rest of the spec layer: - choleskySpec : A = L · Lᵀ for SPD A (lower-triangular L) - qrSpec/qrQSpec/qrRSpec : A = Q · R via modified Gram–Schmidt - symEigJacobiSpec : FULL symmetric eigendecomposition via cyclic Jacobi (all n eigenpairs) — the replacement for the largest-only stub, which is left untouched so PCA keeps building - svdSpec : A = U · diag(σ) · Vᵀ via the eig of Aᵀ·A The iterative Jacobi loop runs over a strict `Array (Array α)` representation (converted to/from Spec.Tensor only at the boundary): threading the functional `Fin n → Fin n → α` representation through one matrix product per rotation builds deep closure chains that blow up under evaluation, whereas arrays are strict values and keep execution cheap. Examples: NN/Examples/Factorization/{Common,Cholesky,QR,SymEig,SVD}.lean (+ the NN.Examples.Factorization umbrella). Each reconstructs the input from its factors and asserts (compiled `assertLt` via #eval, which fails the build) that the maximum reconstruction error is below tolerance. All reconstruct to 0.000000; SymEig recovers eigenvalues {1.3249, 2.4608, 5.2143} and SVD singular values {5, 3, 0} as expected. These are executable reference defs (matching the bar of the existing determinantSpec / inverseSpec / eigendecompSpec); formal correctness theorems are a planned follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Formal verification companion to the spec-layer factorizations added in e0d08ac. The motivation is Computational Hypergraph Discovery (https://github.com/TheoBourdais/ComputationalHypergraphDiscovery), a Gaussian- process / kernel-ridge method whose numerical core is the full symmetric eigendecomposition of a kernel matrix K (solve_variationnal, find_gamma, Z_test). This commit gives that core a verified linear-algebra foundation. New: NN/Proofs/Tensor/Basic/Factorizations.lean (sorry-free, over ℝ via Mathlib). A refinement architecture: specification predicates on Mathlib matrices, with the CHD consequences proved from the spec independent of the float algorithm. - Specifications: IsCholesky, IsQR, IsSymEig, IsSVD. - CHD foundation (consumed by solve_variationnal / find_gamma / Z_test): * IsSymEig.add_smul_inv — the regularized inverse (K + γI)⁻¹ = V·diag(1/(λ+γ))·Vᵀ, proved from orthogonality of V. * IsSymEig.trace_eq / det_eq (trace K = Σλ, det K = Πλ), isHermitian. * IsSVD.gram_isSymEig — an SVD of A is an eigendecomposition of the Gram matrix AᵀA with eigenvalues σ², the form CHD actually builds. - Exact algorithm invariants: trace_orthogonal_conj / det_orthogonal_conj (every Jacobi sweep is a spectrum-preserving orthogonal similarity), givens_normSq (c²+s²=1), choleskyFn_lower_triangular (via a reusable List.foldl indexing lemma getD_foldl_finRange). - Residual certificate (Tier D): symEig_reconstruction_residual and symEig_frobenius_residual prove the reconstruction error equals the off-diagonal mass of the rotated matrix exactly; isSymEig_of_diagonal closes the loop in the zero-residual limit. This replaces an impossible a-priori convergence proof (Mathlib v4.30.0 has no Jacobi convergence theory, and Float never diagonalizes exactly), matching the runtime assertLt checks. Scope honesty: the exact algebraic reconstruction of the finite float folds (A=L·Lᵀ, A=QR) is documented as the remaining increment (it needs a prefix-fold induction plus per-pivot positivity); the spec-level facts CHD relies on do not depend on it. Blueprint: new chapter Ch4_Verification/Factorizations.lean ("Matrix Factorizations for Kernel Methods"), registered in Guide.lean and cross-linked from ScientificMLVerification. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Prove the exact algebraic reconstruction of the finite executable Cholesky factorization over ℝ, the increment documented in Basic/Factorizations.lean. `isCholesky_of_pos`: for symmetric A with positive executable pivots (0 < choleskyFn A j j — the success condition over ℝ), L = choleskyFn A is a genuine Cholesky factor (lower-triangular, A = L·Lᵀ), satisfying the `IsCholesky` spec. Tensor-level corollary `choleskySpec_reconstruction`. Method: a general snoc-fold read lemma (`getD_foldl_snoc_read`) reads the j-th built column as the step function on the length-j prefix; `prefix_eq_map` identifies that prefix with the first j columns of L; `take_map_sum_eq` rewrites the code's List.foldl sums as masked Finset partial sums. Positive pivots discharge the √-radicand and divisor side conditions; symmetry of A lifts the lower-triangular reconstruction to the full matrix. Blueprint: new "Exact Cholesky reconstruction" section; "What remains" narrowed to QR (dual-list GSState structure-fold + the orthonormality invariant). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Prove the exact algebraic reconstruction of the finite executable QR (modified Gram–Schmidt) factorization over ℝ, extending the Cholesky work. `qr_mul_eq`: for A : Fin m → Fin n → ℝ whose executable R-pivots are positive (0 < Rmat A j j — full column rank, the success condition), the factors satisfy A = Q·R with R upper-triangular (`Rmat_upper_triangular`). `qrSpec_reconstruction` is the tensor-level corollary. Method: gramSchmidtFn threads a GSState that snocs onto both the Q-list and the R-list at once. The appended values depend only on the Q-history, so the Q-list is a single-list snoc-fold (`gs_proj_qs`, read by `getD_foldl_snoc_read`) and the R-list is the Q-prefix tail `rTail` (read by `gs_fold_split` + `rTail_getD`). The orthogonalization sum v = a − Σ rₖⱼqₖ (a List.zip fold) collapses to a map-fold (`cross_fold_eq`) and then a masked Finset partial sum (`take_map_sum_eq`); the positive pivot cancels the v/rⱼⱼ normalization exactly. Not done (documented, not sorry): orthonormality Qᵀ Q = 1, which rests on the Gram–Schmidt orthogonality invariant Mathlib only has for its own gramSchmidt. Blueprint: new "Exact QR reconstruction" section; "What remains" narrowed to the orthonormality invariant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Closes the last open finite-fold property: orthonormality of the executable Gram–Schmidt Q factor. Rather than re-derive the orthogonality induction, the new file NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean unifies the executable variant with Mathlib's gramSchmidt. Reading columns of A as EuclideanSpace ℝ (Fin m) vectors (gsCol), Qcol_bridge proves by strong induction that the j-th executable Q column equals gramSchmidtNormed ℝ (gsCol A) j; orthonormality then follows from Mathlib's gramSchmidtNormed_orthonormal'. Yields Q_orthonormal (qₐ·q_b = δₐᵦ), QT_mul_Q_eq_one, the full IsQR predicate isQR_of_pos, and the tensor-level qrSpec_orthonormal. Three reusable connectors over ℝ — dotFn_eq_inner, normFn_eq_norm, proj_normalize — are stated generally enough to lift into a future Mathlib matrix-level QR contribution. Blueprint chapter and reconstruction docstring updated: only the iterative Jacobi/SVD convergence now remains (residual certificate only). Sorry-free; NN.Examples.Factorization still reconstructs at err 0.000000, with the empirical Qᵀ·Q = I check now formally backed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Discharge the two hypotheses the symmetric-eigendecomposition residual certificate assumed (`Vᵀ V = 1` and `A = V·Af·Vᵀ`) for the real `symEigJacobiSpec` output, so the certificate holds outright. NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean (sorry-free, warning-free): - toM bridge from `Array (Array ℝ)` to Mathlib `Matrix`; toM_matMul/tr/id show the array ops realise the matrix ops (unconditionally). - givens_orthogonal: each Givens rotation with c²+s²=1 is orthogonal (Jᵀ J = 1), via the three column shapes and a 9-case dot-product split. - JacInv loop invariant preserved by jacInv_rotate/_sweep/_run (List.foldlRecOn over jacobiPairs; base case (A, I)). - jacobi_orthogonal, jacobi_similarity (no hypotheses) ⟹ unconditional symEigJacobi_{reconstruction,frobenius}_residual and symEigJacobi_isSymEig_of_diagonal, with worked examples. Blueprint Ch4: new "Faithfulness of the Jacobi run" section; "What remains" narrowed from the iterative properties to just the convergence rate. Examples (positive + negative controls, compiled #eval assertions): - Cholesky: indefinite A correctly fails (NaN; uses summed Frobenius error). - QR: rank-deficient A reconstructs but Qᵀ Q ≠ I (full rank needed). - SVD: Vᵀ V = I; permuted σ fails to reconstruct. - SymEig: orthogonality exact at 1 sweep; off-diagonal residual asymptotic; exact residual certificate verified numerically (lhs = rhs, |Δ| = 0). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…xamples Add the classical Jacobi progress identity as an exact, finite theorem over ℝ: conjugating a symmetric A by the Givens rotation that annihilates pivot (p,q) drops the squared off-diagonal mass by exactly 2·A[p,q]² (`jacobi_off_decrease`). New: NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean - frobSq/diagSq/offSq mass machinery + frobSq_eq_diagSq_add_offSq. - frobSq_orthogonal_conj: orthogonal similarity preserves total Frobenius mass, so driving the off-diagonal down ≡ driving the diagonal up. - givens_conj_pp/qq/pq/other: explicit conjugation entries via bilinear support lemmas; the 2×2 block-Frobenius identity is frobSq_orthogonal_conj specialised to Fin 2 (no hand-tuned linear_combination coefficients). - jacobi_off_decrease: the per-rotation decrease, under symmetry + the pivot annihilation (the defining equation the Golub–Van Loan angle solves; the explicit pivot is givens_conj_pq). Examples: NN/Examples/Factorization/JacobiDecrease.lean (+ Common helpers) - Positive: one rotation takes off-diagonal mass 6 → 4 = 6 − 2·1², pivot annihilated to 0, total mass conserved at 35 — all to |Δ| = 0. - Negative controls: a wrong-angle (orthogonal) rotation misses the decrease; a non-orthogonal conjugation breaks Frobenius-mass invariance. Blueprint: new "Per-rotation progress" section; "What remains" narrowed to the aggregate cyclic rate (Forsythe–Henrici/Schönhage), since per-rotation progress is now exact. Sorry-free; builds green (NN.Proofs.Tensor.Basic, NN.Examples.Factorization, blueprint); repo_lint clean on all new/changed files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…amples Add the aggregate-rate development for the largest-pivot Jacobi strategy, building on the per-rotation decrease (Tier 2): - offSq_le_count_mul_max: the largest off-diagonal pivot carries at least the average share of the mass, ‖offDiag A‖² ≤ (n²−n)·A[p,q]². - jacobi_off_decrease_classical: substituting that into the exact per-rotation decrease yields a genuine linear contraction by 1 − 2/(n²−n) < 1. - geom_bound_of_contraction / tendsto_zero_of_contraction: a fixed-factor contraction iterates to ρᵏ·a₀ and (with offSq_nonneg, factor < 1) tends to 0, so the classical algorithm provably converges geometrically. Stated for an arbitrary per-step factor, so a future cyclic per-sweep bound plugs in directly. Honest scope: the cyclic ordering the solver uses does not satisfy the largest-pivot hypothesis (the research-grade Forsythe–Henrici/Schönhage rate); that gap stays captured by the exact a-posteriori residual certificate, never by sorry. Reviewer examples (NN/Examples/Factorization/JacobiRate.lean): largest pivot meets the rate (mass 50.04 → 0.04, far under the guaranteed 33.36); negative control — a tiny non-largest pivot misses the guaranteed factor. Blueprint gains an "Aggregate rate" section and a precise restatement of what remains. sorry-free, warning-free; repo_lint shows no new violations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ditional Add the deferred keystone `choleskyFn_diag_pos_of_posDef`: for a positive-definite A, every executable Cholesky pivot is strictly positive (equivalently the radicand A[j,j] − Σ_{k<j} L[j,k]² > 0 at every step). Proof avoids matrix inverses entirely. Strong induction on the pivot index: - `choleskyFn_dot_eq_local` — localized reconstruction needing only the smaller pivot's positivity (which is all the original `choleskyFn_dot_eq` ever uses), powering the induction. - The Schur-complement witness z is built by reusing the already-proven back-substitution `triSolveUpperFn_mulVec` on the leading block (z m = 1, z annihilates the leading columns of L). - `double_sum_gram` collapses the Gram part of zᵀAz to L[m,m]²; the residual part reduces to the single (m,m) entry, giving zᵀAz = radicand exactly. - `Matrix.PosDef.dotProduct_mulVec_pos` forces zᵀAz > 0, hence radicand > 0, hence the pivot √radicand > 0. Unconditional corollaries (no pivot hypothesis): - `solveRidgeFn_mulVec_of_posSemidef` — for PSD K and γ > 0, solveRidgeFn solves (K+γ·I)·x = b exactly. The fully discharged verified core of CHD `solve_variationnal`. - `solveRidgeSpec_mulVec_of_posSemidef` — tensor-level form. Also lands the spec-layer solve defs (triSolveLowerFn/triSolveUpperFn/ cholSolveFn/addScaledIdFn/solveRidgeFn/solveRidgeSpec), the registration import, positive/negative `#eval` examples exhibiting the keystone dichotomy (SPD K+γ·I → all pivots > 0; singular K → a zero pivot), and the blueprint update (the direct kernel-ridge solve route now has nothing left to prove). sorry/omega/admit-free across all new and changed source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two capstones on top of the positive-pivot keystone, making the verified solve_variationnal match the form CHD actually specifies. Proofs (NN/Proofs/Tensor/Basic/FactorizationsSolve.lean): * cholesky_posDef — for any PosDef A, the executable choleskyFn IS a genuine Cholesky factor (lower-triangular, A = L·Lᵀ, strictly positive diagonal), with no pivot/symmetry/success hypothesis. Combines the keystone choleskyFn_diag_pos_of_posDef with isCholesky_of_pos. * solveRidgeFn_eq_inv_mulVec (+ tensor-level solveRidgeSpec_eq_inv_mulVec) — solveRidgeFn K γ b = (K + γ·I)⁻¹ b, the closed form CHD solve_variationnal specifies. Invertibility from Matrix.PosDef.isUnit; the solve identity (K + γ·I)·x = b then pins x uniquely to the inverse. No inverse is ever formed by the algorithm. Examples (NN/Examples/Factorization/RidgeSolve.lean), 8 #eval checks green: * Capstone reconstruction: SPD K + γ·I gives L·Lᵀ = K + γ·I to machine precision; negative control an indefinite matrix hits √(negative) = NaN and fails — PosDef (not mere symmetry) is necessary. (Documented subtlety: the singular PSD K still reconstructs with a zero pivot; the zero pivot breaks only the solve, the dichotomy the keystone isolates — so the reconstruction negative control uses a genuinely indefinite matrix.) * Inverse form: columns built by solveRidgeSpec K γ eⱼ assemble into (K + γ·I)⁻¹, and (K + γ·I)·(K + γ·I)⁻¹ = I; negative control γ = 0 on the singular K diverges (NaN). Docs: blueprint Ch4 Factorizations chapter gains a capstone paragraph and a closed-loop note in "What remains"; the two example module docstrings updated. sorry/admit/omega-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…_test Tier A's predicate foundation (IsSymEig, add_smul_inv, trace_eq, det_eq) already existed; this adds the three concrete CHD routines built on it, mirroring interpolatory.py's eigendecomposition route, with their exact algebra proved over ℝ from the IsSymEig specification (no appeal to Jacobi convergence). Spec (NN/Spec/Core/Tensor/Factorizations.lean): executable eig-form mirrors — projFn (Pga = Vᵀga), ridgeCoeffFn (rᵢ = γ/(λᵢ+γ)), variationalSolveFn, varNoiseFn, plus tensor wrappers variationalSolveSpec / varNoiseSpec. Proofs (new NN/Proofs/Tensor/Basic/FactorizationsVariational.lean, sorry/omega-free): - variationalSolveFn_eq_neg_inv_mulVec: the eig-form solve_variationnal IS -(K+γI)⁻¹·ga (from add_smul_inv). - variationalSolveFn_eq_neg_solveRidgeFn: eig route = Cholesky route — two independent implementations of solve_variationnal agree on the one closed form. - IsSymEig.eigenvalues_nonneg: PSD ⟹ λ ≥ 0 (via VᵀAV PSD-congruence), discharging λᵢ+γ ≠ 0 from γ > 0. - varNoiseFn_eq_ratio: the noise / find_gamma loss / Z_test statistic is the spectral ratio Σ(Pgaᵢ·rᵢ)² / Σ Pgaᵢ²·rᵢ. - ridgeCoeffFn_pos/le_one, varNoiseFn_nonneg/le_one: for a PSD spectrum and γ > 0 the noise is a genuine fraction in [0,1]. - projFn_mulVec_self, varNoiseFn_projFn_mulVec: Z_test spectral invariance — feeding ga = V·z drops V, so the statistic depends on the kernel only through its spectrum. Examples (new NN/Examples/Factorization/Variational.lean): 8 green #eval checks on an SPD kernel — (K+γI)·yb = -ga and yb = -solveRidgeSpec to machine precision, noise ∈ [0,1], spectral invariance noise(V·z)=noise(z); negative controls: wrong eigenvectors break the solve (residual 3.72), γ = -0.7 pushes noise to -7.19. Blueprint: new "CHD routines" section + updated "What remains". Scope honesty: only the deterministic algebra is proved; Z_test's Gaussian sampling/percentiles are statistical, not algebraic, and are exercised numerically rather than proved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…semidefinite Every verified CHD solve/find_gamma/Z_test theorem assumes hK : (Matrix.of K).PosSemidef, but CHD builds K from data (Modes/kernels.py) — that hypothesis was never discharged. This proves it for the linear mode, the same move as the positive-pivot keystone: turn an assumed precondition into a theorem. Spec (NN/Spec/Core/Tensor/Factorizations.lean): maskColsFn (which_dim column masking), linearKernelFn (K[i,j] = 1 + scale·⟨Φi,Φj⟩ = 𝟙𝟙ᵀ + scale·ΦΦᵀ, mirroring LinearMode.vectorized_kernel), and the tensor wrapper linearKernelSpec. Proofs (new NN/Proofs/Tensor/Basic/FactorizationsKernels.lean, sorry/omega-free): - linearKernelFn_posSemidef: K is PSD for scale ≥ 0 — 𝟙𝟙ᵀ is a rank-one Gram (PSD), ΦΦᵀ is a Gram (posSemidef_self_mul_conjTranspose), scale ≥ 0 keeps it PSD (PosSemidef.smul), PosSemidef.add closes the sum. - linearKernelFn_symm: symmetry, a corollary of PosSemidef.isHermitian. - linearKernelSpec_posSemidef: the tensor-level statement the solve theorems consume — so solveRidgeSpec (linearKernelSpec X w scale) γ b is now an unconditional exact solve for γ > 0, no PSD hypothesis left to assume. Examples (new NN/Examples/Factorization/LinearKernel.lean): 6 green #eval checks — K = Kᵀ, matches the CHD LinearMode formula, all Jacobi eigenvalues ≥ 0 (feature masking preserved), the PSD kernel feeds an exact ridge solve (residual 0); negative control: scale = -1 makes 𝟙𝟙ᵀ − ΦΦᵀ indefinite (two negative eigenvalues). Blueprint: new "Building the kernel" section; flags the follow-ons — quadratic via the Schur product theorem (PosSemidef.hadamard, in Mathlib) and Gaussian via Schoenberg (Bochner not in Mathlib v4.30.0, the new research-grade item). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Discharge the standing PosSemidef hypothesis for the second of CHD's non-interpolatory kernels. The quadratic mode K[i,j] = scale·(alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale) expands algebraically to K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ), a sum of three PSD pieces: the all-ones Gram, a nonnegative multiple of the data Gram Φ·Φᵀ, and a nonnegative multiple of its Hadamard square — PSD by the Schur product theorem (PosSemidef.hadamard). So K is PSD for scale ≥ 0 and alpha ≥ 0. - Spec: quadraticKernelFn / quadraticKernelSpec (NN/Spec/.../Factorizations.lean), the exact CHD QuadraticMode.vectorized_kernel. - Proof: quadraticKernelFn_posSemidef + _symm + tensor-level quadraticKernelSpec_posSemidef (FactorizationsKernels), sorry/omega-free. - Example: NN/Examples/Factorization/QuadraticKernel.lean — 8 green checks (symmetry, CHD-formula match, all eigenvalues ≥ 0 with masking preserved, exact ridge solve), two genuine negative controls (alpha = −1 → 2 negative eigenvalues, scale = −1 → 3). - Blueprint: dedicated quadratic-mode section (Ch4 Factorizations). Gaussian mode (Schoenberg/Bochner, absent from Mathlib v4.30.0) remains the research-grade follow-on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Discharge the third and last CHD kernel mode's PosSemidef hypothesis, without Bochner/Schoenberg (absent from Mathlib v4.30.0), via an elementary Hadamard-exponential route that reuses the Schur product theorem already used for the quadratic mode. Spec (Factorizations.lean): gaussianKernelFn/gaussianKernelSpec, K[i,j] = scale·∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²)), matching CHD GaussianMode (foldl product + squared-diff-as-product to stay polymorphic over the law-free Context). Proof (FactorizationsKernels.lean), sorry/admit/omega-free: - posSemidef_of_tendsto — the PSD cone is closed under entrywise limits (the one genuinely new, upstreamable lemma: the quadratic form is continuous in the entries, {≥0} is closed). - posSemidef_map_exp — entrywise exp of a PSD matrix is PSD via the Hadamard-power series exp∘G = Σ G^∘k/k!. - posSemidef_gaussianCol — a single Gaussian exp(−c(yᵢ−yⱼ)²) is PSD by diagonal congruence of the entrywise exponential of the rank-one Gram. - gaussianKernelFn_posSemidef — each feature factor 𝟙𝟙ᵀ + w·Gaussian is PSD, product over features via the Schur product theorem; PSD for scale ≥ 0 and a nonnegative mask w ≥ 0. Plus _symm and the tensor-level gaussianKernelSpec_posSemidef. Example (GaussianKernel.lean): 7/7 #eval checks — symmetric, matches the CHD GaussianMode formula, PSD (no negative eigenvalue), masked PSD, exact ridge solve; negative controls scale=−1 and a negative mask weight w=[−2,0] both correctly rejected. Blueprint Ch4: replaced the "research-grade" Gaussian paragraph with a section documenting the discharge; all three modes now PSD-verified. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CHD's outer hypergraph-discovery loop (decision.py, _GraphDiscoveryMain.py) turns the verified `noise` statistic into graph-structure decisions. This proves each of those decisions correct, over the same executable specs that run on Float. Spec (NN/Spec/Core/Tensor/Factorizations.lean), mirroring the Python verbatim: - argMinFn / argMaxFn — the np.argmin / np.argmax folds - kernelChooserFn — MinNoiseKernelChooser (valid = noise < Z_low; the `2` sentinel written `1+1` to stay Context-polymorphic) - modeIncrementFn / modeChooserFn — MaxIncrementModeChooser - allPrunedFn — the np.all(active_modes == 0) stopping rule Proofs (NN/Proofs/Tensor/Basic/FactorizationsDecision.lean, new, sorry-free). A single generic fold-selection lemma (foldl_select) underwrites both argmin and argmax; the comparison proofs bridge the Context order test to the real `<` (gtBool_eq_decide): - argMinFn_le / argMaxFn_le — the prune step removes a least-activated ancestor - kernelChooserFn_eq_some / _eq_none — MinNoiseKernelChooser is sound AND complete: returns `some s` with s valid and of least noise among all valid kernels exactly when a valid kernel exists, else `none`. Its `2`-sentinel correctness rests directly on the verified varNoiseFn_le_one (hypothesis `hbound`), so the decision is a proved selection over a statistic whose [0,1] range was itself proved. - modeChooserFn_ge — MaxIncrementModeChooser reports the largest noise-jump iteration - allPrunedFn_iff — the stopping test holds iff every ancestor is pruned Examples (NN/Examples/Factorization/Discovery.lean, new): 13 #eval checks with negative controls. The end-to-end block eigendecomposes an SPD kernel and runs a find_gamma sweep feeding the verified varNoiseSpec at several gamma straight into argMinFn (noises [0.004, 0.040, 0.287], all in [0,1], argmin = 0). Blueprint Ch4 gains a discovery-decision-layer section and an updated closing summary. Registrations: Basic.lean (proofs), Factorization.lean (examples). Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic green (2705 jobs); banned-tactic sweep clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The cyclic-Jacobi eigensolver was migrated to strict `Array (Array α)`, but `choleskyColsFn` and the triangular solves were left on the functional `Fin n → Fin n → α` representation. There each Cholesky column is a closure that re-evaluates every previous column, so reading the full factor `L` is exponential — and the spec is compiled without `precompileModules`, so any `#eval` of `solveRidgeSpec`/`choleskySpec` runs that closure in the interpreter. A single 4×4 ridge solve cost ~310 s; the QuadraticKernel example took ~645 s to build, and every ridge/Cholesky-using example was similarly slow. Add two strict, array-backed runtime implementations and register them with `@[implemented_by]`: * `choleskyColsImpl` → `choleskyColsFn` — materializes each column as an `Array α`, so a back-reference `L[i,k]` is an O(1) lookup. * `solveRidgeImpl` → `solveRidgeFn` — factors `K + γ·I = L·Lᵀ` and runs both triangular substitutions entirely over `Array`s, building no `Fin n → α` accumulator closures. `@[implemented_by]` swaps only the compiled/interpreted runtime code; the functional definitions — and every correctness proof that reasons about them (`FactorizationsSolve`, `FactorizationsReconstruction`, …) — are untouched. The examples' residual checks `(K+γ·I)·x − b ≈ 0` / `A = L·Lᵀ` numerically validate that the array path agrees with the proven definition. Result: QuadraticKernel 645 s → 6.8 s; a full clean rebuild of `NN.Examples.Factorization` + `NN.Proofs.Tensor.Basic` is 18.8 s (2705 jobs). No proof changes; sorry/admit/omega-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CHD's Z_test (interpolatory.py) decides edge significance by comparing the observed `noise` against the null distribution of the same statistic under random data: draw N samples, score each one's `noise`, sort, and read off the 5th/95th percentiles as Z_low/Z_high; an edge is significant when `noise < Z_low`. Spec layer (Factorizations.lean): `sampleNoisesFn` (each draw scored by the *same* `varNoiseFn`), `leBool`, the order statistic `kthSmallestFn` (mergeSort + index), `zLowIdx`/`zHighIdx` (⌊0.05·N⌋ / ⌊0.95·N⌋), `zLowFn`/`zHighFn`, `zSignificantFn`, and tensor wrappers `zLowSpec`/`zHighSpec` — mirroring Z_test verbatim and running on Float. Proofs (FactorizationsDecision, sorry-free): the order-statistic toolkit (`kthSmallestFn_mem`/`_nonneg`/`_le_one`/`_mono`, bridged to Mathlib's `sortedLE_mergeSort` via `leBool_eq_le`) and the Z_test guarantees — `sampleNoisesFn_nonneg`/`_le_one` (every null sample inherits the verified `noise ∈ [0,1]` bound), `zLowFn`/`zHighFn ∈ [0,1]`, `zLowFn_le_zHighFn` (Z_low ≤ Z_high by order-statistic monotonicity), and `zTest_admits_edge` (`noise < Z_low` ⟹ `MinNoiseKernelChooser` returns `some 0`), the chooser's `noise ≤ 1` precondition again being `varNoiseFn_le_one`. The keystone: the same [0,1] bound governs both the data noise and the whole null distribution, so the test is well-posed. Examples (Discovery): builds the null distribution from a real eigendecomposition, checks 0 ≤ Z_low ≤ Z_high ≤ 1, shows dominant-eigenvector data clears the lower tail (significant), and rejects a high noise and a noise at the upper tail (negative controls). Blueprint: new "The Z_test: a null-distribution significance threshold" section and updated scope summary — only the distributional half (Gaussian draws + calibrated percentile, needs Mathlib.Probability) remains. Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic 2705 jobs green; blueprint 4949 jobs; sorry/admit/omega-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Close the provable half of the Z_test's distributional content (the part left open after the well-posed thresholds), in a new sorry-free module NN/Proofs/Tensor/Basic/FactorizationsZTest.lean: * Finite-sample calibration (counting, no probability theory): a sorted list has at most k entries below its k-th element, so via countP permutation- invariance over mergeSort the threshold's own empirical false-positive rate is bounded exactly — at most ⌊N/20⌋ ≈ 5% of the N null draws fall below Z_low (zLow_null_exceedance_le) and ≈ 5% rise above Z_high (zHigh_null_exceedance_le). This is the exact, non-asymptotic guarantee. * The Gaussian null law (measure theory): modelling the draws as i.i.d. standard Gaussian (nullGaussian = Measure.pi (gaussianReal 0 1)), the per-draw noise is measurable (measurable_noiseMap), so its pushforward noiseLaw is a probability measure (IsProbabilityMeasure) concentrated on [0,1] (noiseLaw_Icc_eq_one) — the verified varNoiseFn ∈ [0,1] bound lifted to the law. sampleNoisesFn_eq_noiseMap ties CHD's executable statistic to the model. Scope honesty: the asymptotic frontier — empirical→true quantile convergence (Glivenko–Cantelli/DKW) and the exchangeability rank rate k/(N+1) — needs an empirical-process theory absent from Mathlib v4.30.0, and is flagged rather than stubbed with sorry. Discovery.lean adds positive/negative #eval checks (exactly 1/20 below Z_low, 0 above Z_high, 19/20 below the slack Z_high as a negative control); the blueprint Ch4 chapter gains a "Z_test distributional layer" section and the "What remains" note now scopes only the asymptotic half. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Lift the single null draw (FactorizationsZTest) to the i.i.d. *sequence* `nullSeqGaussian n := Measure.infinitePi (fun _ : ℕ => nullGaussian n)` on `ℕ → (Fin n → ℝ)` — the infinite product, since `Measure.pi` is finite-index only — and `nullNoise Λ V γ i ω := noiseMap Λ V γ (ω i)`, the same measurable statistic read off coordinate `i`. Proven sorry-free in the new `NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean`: - nullNoise_iIndepFun (← iIndepFun_infinitePi), pairwise nullNoise_pairwise_indepFun in the Pairwise (·⟂ᵢ[μ]·) shape SLLN takes; - nullNoise_hasLaw / nullNoise_identDistrib — each draw has the common law noiseLaw (via measurePreserving_eval_infinitePi + HasLaw.comp); - nullNoise_mem_Icc ([0,1]-valued) and integrable_nullNoise. That is exactly the hint/hindep/hident triple `strong_law_ae_real` and the Hoeffding tail consume, so plan steps (b)–(d) become applications of an in-place scaffold. The *uniform* GC / DKW–Massart sharp constant and the exchangeability rank rate k/(N+1) stay research-grade (flagged, not sorry'd). Discovery.lean exercises the computable shadow — the empirical CDF F̂_N(t) = #{i<N : noiseᵢ ≤ t}/N — checked a bona fide CDF (in [0,1], monotone, saturating to 1, 0 below support) with a non-degeneracy negative control; six new #eval checks pass. Aggregate example docstring and the Ch4 blueprint (distributional-layer + "What remains") updated to record the scaffold. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ia the SLLN Build on the step-(a) i.i.d. scaffold (FactorizationsZAsymptotic) to prove the pointwise empirical-CDF consistency theorem, sorry-free over Mathlib v4.30.0: - nullBelow Λ V γ t i ω := (Set.Iic t).indicator 1 (nullNoise Λ V γ i ω), the threshold indicators 1{noiseᵢ ≤ t}, and empCDF, their normalized prefix sum. - nullBelow_pairwise_indepFun / nullBelow_identDistrib / integrable_nullBelow: the indicators inherit the scaffold's i.i.d.-bounded-integrable structure (indicator composed onto each independent, identically-distributed draw). - integral_nullBelow_zero: their common mean is exactly cdf (noiseLaw Λ V γ) t (HasLaw.integral_comp + integral_indicator_one + cdf_eq_real), so empCDF is the Monte-Carlo estimator of the null CDF. - empCDF_tendsto_cdf: strong_law_ae_real on the hint/hindep/hident triple gives ∀ᵐ ω, Tendsto (fun N => empCDF … N t ω) atTop (𝓝 (cdf noiseLaw t)) — pointwise Glivenko–Cantelli at each fixed t. Discovery.lean adds the matching #eval block exercising the computable shadow: the growing-prefix running mean F̂_N settling toward the full-sample estimate of cdf noiseLaw t (each prefix a valid [0,1] CDF value, cdf 1 = 1 attained at every N, with a non-degeneracy negative control). Aggregate docstring and the Ch4 blueprint updated to describe step (b); the uniform GC / DKW–Massart sharp constant and exchangeability rank rate remain flagged research-grade (no sorry). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Hoeffding Prove the finite-sample companion of step (b)'s almost-sure limit: at a fixed threshold t, the empirical CDF of the i.i.d. null draws concentrates around the true null CDF at the sharp Hoeffding rate. FactorizationsZAsymptotic.lean: - nullBelow_subgaussian / nullBelow_neg_subgaussian: the [0,1]-bounded threshold indicators are sub-Gaussian with variance proxy 1/4 (hasSubgaussianMGF_of_mem_Icc), centered at mean cdf noiseLaw t (integral_nullBelow_eq). - hoeffding_avg_ge: Mathlib's measure_sum_ge_le_of_iIndepFun with ε ↦ N·ε turns the proxy sum N/4 into the sharp exponent -2Nε². - empCDF_upper_tail / empCDF_lower_tail: P(±(empCDF − cdf) ≥ ε) ≤ exp(-2Nε²). - empCDF_concentration: two-sided DKW-at-a-point P(|empCDF − cdf| ≥ ε) ≤ 2·exp(-2Nε²) via measureReal_union_le + le_abs. Discovery.lean: matching #eval block exercising the bound's computable shadows — the tail function 2·exp(-2Nε²) (twice one-sided, decreasing in N and ε, non-vacuous < 1 once 2Nε² > ln 2, trivially 2 at ε = 0) and the observed prefix deviation it governs (every prefix of ≥ 3 draws keeps F̂_N within ε = 0.3 of the full sample; negative control: N = 1, 2 deviate by 0.5 > ε). Docs: aggregate Factorization docstring and Ch4 blueprint updated; uniform DKW–Massart and quantile-transfer step (d) remain flagged research-grade, never sorry'd. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…percentile consistency) Inverts steps (b)-(c)'s empirical-CDF convergence into consistency of the empirical percentiles the Z_test chooser thresholds against, sorry-free over Mathlib v4.30.0. NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean: - empCDF_mono: empirical CDF monotone in the threshold. - StraddlesQuantile / IsLowerQuantile: the honest population straddle hypothesis (continuity + strict monotonicity through p at q) and the defining property of a lower empirical p-quantile. - empCDF_eventually_straddle: the transfer engine -- pointwise consistency at q +/- eps makes the empirical CDF eventually straddle p. - empQuantile_tendsto: any lower empirical p-quantile converges a.s. to q (sandwich pinned into [q-eps, q+eps], intersected over eps = 1/(m+1) via ae_all_iff). NN/Examples/Factorization/Discovery.lean: matching #eval block -- lower p-quantile reaches level p, monotone in p, in [0,1], median converges (<= 0.02 for N >= 3); negative controls for non-vacuity and straddle-hypothesis sensitivity. Docs: aggregate Factorization.lean bullet, Ch4 Verso blueprint, and CHDTorch plan updated to mark the four-tier asymptotic-calibration plan complete, leaving only the uniform DKW-Massart constant, the zLowFn/zHighFn triangular-array wiring, and the exchangeability rank rate as research-grade. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

NicolasRouquette · 2026-06-03T04:48:49Z

Closing this in favor of a slimmed-down, refocused contribution.

Since opening it, I realized this branch had grown to mix three separable things:

generic matrix-factorization linear algebra,
the Computational Hypergraph Discovery (CHD) specific layers, and
a KernelFlows port.

Following the question of whether this belongs in TorchLean or in a repo of its own, I've split them:

For TorchLean — only the generic, domain-independent factorizations (Cholesky, QR,
symmetric eigendecomposition, SVD, generalized eigenproblem). Pure linear algebra, no CHD
anywhere, useful to anyone doing GP / kernel / PCA / least-squares work. Proposed in a new,
much smaller PR Add verified matrix factorizations: Cholesky and QR (exact) #2 for the exact Cholesky/QR part and a subsequent PR for the iterative
eigensolver/SVD part. This will help keep the PR reviews to a reasonable, coherent scope.
Separate repository (CHDTorch) — all the CHD- and KernelFlows-specific work (discovery,
the Z-test, kernel losses / gradients / optimizers) now lives in its own Lean package that
takes TorchLean as a library dependency, with zero changes to TorchLean's source.

So this PR is superseded; continuing in #2

NicolasRouquette and others added 22 commits June 1, 2026 08:31

NicolasRouquette mentioned this pull request Jun 3, 2026

Add verified matrix factorizations: Cholesky and QR (exact) #2

Open

NicolasRouquette closed this Jun 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery#1

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery#1
NicolasRouquette wants to merge 22 commits into
lean-dojo:mainfrom
NicolasRouquette:add-matrix-factorizations

NicolasRouquette commented Jun 1, 2026

Uh oh!

NicolasRouquette commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

NicolasRouquette commented Jun 1, 2026

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery

Motivation

What this PR adds

1. Executable reference factorizations (spec layer)

2. Correctness theorems (proof layer), organized by what can honestly be proved

3. CHD-specific layers built on the foundation

4. Blueprint + examples

Scope honesty — what is not claimed

Verification

Stats

Uh oh!

NicolasRouquette commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant