Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery#1
Conversation
Motivation: formalizing Computational Hypergraph Discovery (CHD) in TorchLean. CHD (prototype: https://github.com/TheoBourdais/ComputationalHypergraphDiscovery) is a Gaussian-process / kernel method: it fits a kernel ridge regression per node and prunes "ancestors" by a noise/activation criterion. Essentially every statistically meaningful quantity it computes — the regression solution, the gamma (noise) selection, and the Z-test — is derived from the *full* symmetric eigendecomposition of the kernel matrix K (all eigenvalues AND eigenvectors). TorchLean's spec layer lacked this. The only eigendecomposition available (`Spec.eigendecompSpec`) is a power-iteration stub that recovers just the largest eigenpair, and there were no Cholesky / QR / SVD routines at all. That makes CHD inexpressible. This commit adds real reference implementations so the kernel linear algebra CHD depends on can be written in Lean. New: NN/Spec/Core/Tensor/Factorizations.lean (namespace Spec), executable over Float / ℝ, shape-indexed like the rest of the spec layer: - choleskySpec : A = L · Lᵀ for SPD A (lower-triangular L) - qrSpec/qrQSpec/qrRSpec : A = Q · R via modified Gram–Schmidt - symEigJacobiSpec : FULL symmetric eigendecomposition via cyclic Jacobi (all n eigenpairs) — the replacement for the largest-only stub, which is left untouched so PCA keeps building - svdSpec : A = U · diag(σ) · Vᵀ via the eig of Aᵀ·A The iterative Jacobi loop runs over a strict `Array (Array α)` representation (converted to/from Spec.Tensor only at the boundary): threading the functional `Fin n → Fin n → α` representation through one matrix product per rotation builds deep closure chains that blow up under evaluation, whereas arrays are strict values and keep execution cheap. Examples: NN/Examples/Factorization/{Common,Cholesky,QR,SymEig,SVD}.lean (+ the NN.Examples.Factorization umbrella). Each reconstructs the input from its factors and asserts (compiled `assertLt` via #eval, which fails the build) that the maximum reconstruction error is below tolerance. All reconstruct to 0.000000; SymEig recovers eigenvalues {1.3249, 2.4608, 5.2143} and SVD singular values {5, 3, 0} as expected. These are executable reference defs (matching the bar of the existing determinantSpec / inverseSpec / eigendecompSpec); formal correctness theorems are a planned follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Formal verification companion to the spec-layer factorizations added in e0d08ac. The motivation is Computational Hypergraph Discovery (https://github.com/TheoBourdais/ComputationalHypergraphDiscovery), a Gaussian- process / kernel-ridge method whose numerical core is the full symmetric eigendecomposition of a kernel matrix K (solve_variationnal, find_gamma, Z_test). This commit gives that core a verified linear-algebra foundation. New: NN/Proofs/Tensor/Basic/Factorizations.lean (sorry-free, over ℝ via Mathlib). A refinement architecture: specification predicates on Mathlib matrices, with the CHD consequences proved from the spec independent of the float algorithm. - Specifications: IsCholesky, IsQR, IsSymEig, IsSVD. - CHD foundation (consumed by solve_variationnal / find_gamma / Z_test): * IsSymEig.add_smul_inv — the regularized inverse (K + γI)⁻¹ = V·diag(1/(λ+γ))·Vᵀ, proved from orthogonality of V. * IsSymEig.trace_eq / det_eq (trace K = Σλ, det K = Πλ), isHermitian. * IsSVD.gram_isSymEig — an SVD of A is an eigendecomposition of the Gram matrix AᵀA with eigenvalues σ², the form CHD actually builds. - Exact algorithm invariants: trace_orthogonal_conj / det_orthogonal_conj (every Jacobi sweep is a spectrum-preserving orthogonal similarity), givens_normSq (c²+s²=1), choleskyFn_lower_triangular (via a reusable List.foldl indexing lemma getD_foldl_finRange). - Residual certificate (Tier D): symEig_reconstruction_residual and symEig_frobenius_residual prove the reconstruction error equals the off-diagonal mass of the rotated matrix exactly; isSymEig_of_diagonal closes the loop in the zero-residual limit. This replaces an impossible a-priori convergence proof (Mathlib v4.30.0 has no Jacobi convergence theory, and Float never diagonalizes exactly), matching the runtime assertLt checks. Scope honesty: the exact algebraic reconstruction of the finite float folds (A=L·Lᵀ, A=QR) is documented as the remaining increment (it needs a prefix-fold induction plus per-pivot positivity); the spec-level facts CHD relies on do not depend on it. Blueprint: new chapter Ch4_Verification/Factorizations.lean ("Matrix Factorizations for Kernel Methods"), registered in Guide.lean and cross-linked from ScientificMLVerification. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prove the exact algebraic reconstruction of the finite executable Cholesky factorization over ℝ, the increment documented in Basic/Factorizations.lean. `isCholesky_of_pos`: for symmetric A with positive executable pivots (0 < choleskyFn A j j — the success condition over ℝ), L = choleskyFn A is a genuine Cholesky factor (lower-triangular, A = L·Lᵀ), satisfying the `IsCholesky` spec. Tensor-level corollary `choleskySpec_reconstruction`. Method: a general snoc-fold read lemma (`getD_foldl_snoc_read`) reads the j-th built column as the step function on the length-j prefix; `prefix_eq_map` identifies that prefix with the first j columns of L; `take_map_sum_eq` rewrites the code's List.foldl sums as masked Finset partial sums. Positive pivots discharge the √-radicand and divisor side conditions; symmetry of A lifts the lower-triangular reconstruction to the full matrix. Blueprint: new "Exact Cholesky reconstruction" section; "What remains" narrowed to QR (dual-list GSState structure-fold + the orthonormality invariant). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prove the exact algebraic reconstruction of the finite executable QR (modified Gram–Schmidt) factorization over ℝ, extending the Cholesky work. `qr_mul_eq`: for A : Fin m → Fin n → ℝ whose executable R-pivots are positive (0 < Rmat A j j — full column rank, the success condition), the factors satisfy A = Q·R with R upper-triangular (`Rmat_upper_triangular`). `qrSpec_reconstruction` is the tensor-level corollary. Method: gramSchmidtFn threads a GSState that snocs onto both the Q-list and the R-list at once. The appended values depend only on the Q-history, so the Q-list is a single-list snoc-fold (`gs_proj_qs`, read by `getD_foldl_snoc_read`) and the R-list is the Q-prefix tail `rTail` (read by `gs_fold_split` + `rTail_getD`). The orthogonalization sum v = a − Σ rₖⱼqₖ (a List.zip fold) collapses to a map-fold (`cross_fold_eq`) and then a masked Finset partial sum (`take_map_sum_eq`); the positive pivot cancels the v/rⱼⱼ normalization exactly. Not done (documented, not sorry): orthonormality Qᵀ Q = 1, which rests on the Gram–Schmidt orthogonality invariant Mathlib only has for its own gramSchmidt. Blueprint: new "Exact QR reconstruction" section; "What remains" narrowed to the orthonormality invariant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the last open finite-fold property: orthonormality of the executable Gram–Schmidt Q factor. Rather than re-derive the orthogonality induction, the new file NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean unifies the executable variant with Mathlib's gramSchmidt. Reading columns of A as EuclideanSpace ℝ (Fin m) vectors (gsCol), Qcol_bridge proves by strong induction that the j-th executable Q column equals gramSchmidtNormed ℝ (gsCol A) j; orthonormality then follows from Mathlib's gramSchmidtNormed_orthonormal'. Yields Q_orthonormal (qₐ·q_b = δₐᵦ), QT_mul_Q_eq_one, the full IsQR predicate isQR_of_pos, and the tensor-level qrSpec_orthonormal. Three reusable connectors over ℝ — dotFn_eq_inner, normFn_eq_norm, proj_normalize — are stated generally enough to lift into a future Mathlib matrix-level QR contribution. Blueprint chapter and reconstruction docstring updated: only the iterative Jacobi/SVD convergence now remains (residual certificate only). Sorry-free; NN.Examples.Factorization still reconstructs at err 0.000000, with the empirical Qᵀ·Q = I check now formally backed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discharge the two hypotheses the symmetric-eigendecomposition residual
certificate assumed (`Vᵀ V = 1` and `A = V·Af·Vᵀ`) for the real
`symEigJacobiSpec` output, so the certificate holds outright.
NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean (sorry-free, warning-free):
- toM bridge from `Array (Array ℝ)` to Mathlib `Matrix`; toM_matMul/tr/id
show the array ops realise the matrix ops (unconditionally).
- givens_orthogonal: each Givens rotation with c²+s²=1 is orthogonal
(Jᵀ J = 1), via the three column shapes and a 9-case dot-product split.
- JacInv loop invariant preserved by jacInv_rotate/_sweep/_run
(List.foldlRecOn over jacobiPairs; base case (A, I)).
- jacobi_orthogonal, jacobi_similarity (no hypotheses) ⟹ unconditional
symEigJacobi_{reconstruction,frobenius}_residual and
symEigJacobi_isSymEig_of_diagonal, with worked examples.
Blueprint Ch4: new "Faithfulness of the Jacobi run" section; "What remains"
narrowed from the iterative properties to just the convergence rate.
Examples (positive + negative controls, compiled #eval assertions):
- Cholesky: indefinite A correctly fails (NaN; uses summed Frobenius error).
- QR: rank-deficient A reconstructs but Qᵀ Q ≠ I (full rank needed).
- SVD: Vᵀ V = I; permuted σ fails to reconstruct.
- SymEig: orthogonality exact at 1 sweep; off-diagonal residual asymptotic;
exact residual certificate verified numerically (lhs = rhs, |Δ| = 0).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xamples Add the classical Jacobi progress identity as an exact, finite theorem over ℝ: conjugating a symmetric A by the Givens rotation that annihilates pivot (p,q) drops the squared off-diagonal mass by exactly 2·A[p,q]² (`jacobi_off_decrease`). New: NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean - frobSq/diagSq/offSq mass machinery + frobSq_eq_diagSq_add_offSq. - frobSq_orthogonal_conj: orthogonal similarity preserves total Frobenius mass, so driving the off-diagonal down ≡ driving the diagonal up. - givens_conj_pp/qq/pq/other: explicit conjugation entries via bilinear support lemmas; the 2×2 block-Frobenius identity is frobSq_orthogonal_conj specialised to Fin 2 (no hand-tuned linear_combination coefficients). - jacobi_off_decrease: the per-rotation decrease, under symmetry + the pivot annihilation (the defining equation the Golub–Van Loan angle solves; the explicit pivot is givens_conj_pq). Examples: NN/Examples/Factorization/JacobiDecrease.lean (+ Common helpers) - Positive: one rotation takes off-diagonal mass 6 → 4 = 6 − 2·1², pivot annihilated to 0, total mass conserved at 35 — all to |Δ| = 0. - Negative controls: a wrong-angle (orthogonal) rotation misses the decrease; a non-orthogonal conjugation breaks Frobenius-mass invariance. Blueprint: new "Per-rotation progress" section; "What remains" narrowed to the aggregate cyclic rate (Forsythe–Henrici/Schönhage), since per-rotation progress is now exact. Sorry-free; builds green (NN.Proofs.Tensor.Basic, NN.Examples.Factorization, blueprint); repo_lint clean on all new/changed files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…amples Add the aggregate-rate development for the largest-pivot Jacobi strategy, building on the per-rotation decrease (Tier 2): - offSq_le_count_mul_max: the largest off-diagonal pivot carries at least the average share of the mass, ‖offDiag A‖² ≤ (n²−n)·A[p,q]². - jacobi_off_decrease_classical: substituting that into the exact per-rotation decrease yields a genuine linear contraction by 1 − 2/(n²−n) < 1. - geom_bound_of_contraction / tendsto_zero_of_contraction: a fixed-factor contraction iterates to ρᵏ·a₀ and (with offSq_nonneg, factor < 1) tends to 0, so the classical algorithm provably converges geometrically. Stated for an arbitrary per-step factor, so a future cyclic per-sweep bound plugs in directly. Honest scope: the cyclic ordering the solver uses does not satisfy the largest-pivot hypothesis (the research-grade Forsythe–Henrici/Schönhage rate); that gap stays captured by the exact a-posteriori residual certificate, never by sorry. Reviewer examples (NN/Examples/Factorization/JacobiRate.lean): largest pivot meets the rate (mass 50.04 → 0.04, far under the guaranteed 33.36); negative control — a tiny non-largest pivot misses the guaranteed factor. Blueprint gains an "Aggregate rate" section and a precise restatement of what remains. sorry-free, warning-free; repo_lint shows no new violations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ditional
Add the deferred keystone `choleskyFn_diag_pos_of_posDef`: for a
positive-definite A, every executable Cholesky pivot is strictly positive
(equivalently the radicand A[j,j] − Σ_{k<j} L[j,k]² > 0 at every step).
Proof avoids matrix inverses entirely. Strong induction on the pivot index:
- `choleskyFn_dot_eq_local` — localized reconstruction needing only the
smaller pivot's positivity (which is all the original `choleskyFn_dot_eq`
ever uses), powering the induction.
- The Schur-complement witness z is built by reusing the already-proven
back-substitution `triSolveUpperFn_mulVec` on the leading block (z m = 1,
z annihilates the leading columns of L).
- `double_sum_gram` collapses the Gram part of zᵀAz to L[m,m]²; the residual
part reduces to the single (m,m) entry, giving zᵀAz = radicand exactly.
- `Matrix.PosDef.dotProduct_mulVec_pos` forces zᵀAz > 0, hence radicand > 0,
hence the pivot √radicand > 0.
Unconditional corollaries (no pivot hypothesis):
- `solveRidgeFn_mulVec_of_posSemidef` — for PSD K and γ > 0, solveRidgeFn
solves (K+γ·I)·x = b exactly. The fully discharged verified core of CHD
`solve_variationnal`.
- `solveRidgeSpec_mulVec_of_posSemidef` — tensor-level form.
Also lands the spec-layer solve defs (triSolveLowerFn/triSolveUpperFn/
cholSolveFn/addScaledIdFn/solveRidgeFn/solveRidgeSpec), the registration
import, positive/negative `#eval` examples exhibiting the keystone dichotomy
(SPD K+γ·I → all pivots > 0; singular K → a zero pivot), and the blueprint
update (the direct kernel-ridge solve route now has nothing left to prove).
sorry/omega/admit-free across all new and changed source.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two capstones on top of the positive-pivot keystone, making the verified solve_variationnal match the form CHD actually specifies. Proofs (NN/Proofs/Tensor/Basic/FactorizationsSolve.lean): * cholesky_posDef — for any PosDef A, the executable choleskyFn IS a genuine Cholesky factor (lower-triangular, A = L·Lᵀ, strictly positive diagonal), with no pivot/symmetry/success hypothesis. Combines the keystone choleskyFn_diag_pos_of_posDef with isCholesky_of_pos. * solveRidgeFn_eq_inv_mulVec (+ tensor-level solveRidgeSpec_eq_inv_mulVec) — solveRidgeFn K γ b = (K + γ·I)⁻¹ b, the closed form CHD solve_variationnal specifies. Invertibility from Matrix.PosDef.isUnit; the solve identity (K + γ·I)·x = b then pins x uniquely to the inverse. No inverse is ever formed by the algorithm. Examples (NN/Examples/Factorization/RidgeSolve.lean), 8 #eval checks green: * Capstone reconstruction: SPD K + γ·I gives L·Lᵀ = K + γ·I to machine precision; negative control an indefinite matrix hits √(negative) = NaN and fails — PosDef (not mere symmetry) is necessary. (Documented subtlety: the singular PSD K still reconstructs with a zero pivot; the zero pivot breaks only the solve, the dichotomy the keystone isolates — so the reconstruction negative control uses a genuinely indefinite matrix.) * Inverse form: columns built by solveRidgeSpec K γ eⱼ assemble into (K + γ·I)⁻¹, and (K + γ·I)·(K + γ·I)⁻¹ = I; negative control γ = 0 on the singular K diverges (NaN). Docs: blueprint Ch4 Factorizations chapter gains a capstone paragraph and a closed-loop note in "What remains"; the two example module docstrings updated. sorry/admit/omega-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_test Tier A's predicate foundation (IsSymEig, add_smul_inv, trace_eq, det_eq) already existed; this adds the three concrete CHD routines built on it, mirroring interpolatory.py's eigendecomposition route, with their exact algebra proved over ℝ from the IsSymEig specification (no appeal to Jacobi convergence). Spec (NN/Spec/Core/Tensor/Factorizations.lean): executable eig-form mirrors — projFn (Pga = Vᵀga), ridgeCoeffFn (rᵢ = γ/(λᵢ+γ)), variationalSolveFn, varNoiseFn, plus tensor wrappers variationalSolveSpec / varNoiseSpec. Proofs (new NN/Proofs/Tensor/Basic/FactorizationsVariational.lean, sorry/omega-free): - variationalSolveFn_eq_neg_inv_mulVec: the eig-form solve_variationnal IS -(K+γI)⁻¹·ga (from add_smul_inv). - variationalSolveFn_eq_neg_solveRidgeFn: eig route = Cholesky route — two independent implementations of solve_variationnal agree on the one closed form. - IsSymEig.eigenvalues_nonneg: PSD ⟹ λ ≥ 0 (via VᵀAV PSD-congruence), discharging λᵢ+γ ≠ 0 from γ > 0. - varNoiseFn_eq_ratio: the noise / find_gamma loss / Z_test statistic is the spectral ratio Σ(Pgaᵢ·rᵢ)² / Σ Pgaᵢ²·rᵢ. - ridgeCoeffFn_pos/le_one, varNoiseFn_nonneg/le_one: for a PSD spectrum and γ > 0 the noise is a genuine fraction in [0,1]. - projFn_mulVec_self, varNoiseFn_projFn_mulVec: Z_test spectral invariance — feeding ga = V·z drops V, so the statistic depends on the kernel only through its spectrum. Examples (new NN/Examples/Factorization/Variational.lean): 8 green #eval checks on an SPD kernel — (K+γI)·yb = -ga and yb = -solveRidgeSpec to machine precision, noise ∈ [0,1], spectral invariance noise(V·z)=noise(z); negative controls: wrong eigenvectors break the solve (residual 3.72), γ = -0.7 pushes noise to -7.19. Blueprint: new "CHD routines" section + updated "What remains". Scope honesty: only the deterministic algebra is proved; Z_test's Gaussian sampling/percentiles are statistical, not algebraic, and are exercised numerically rather than proved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…semidefinite Every verified CHD solve/find_gamma/Z_test theorem assumes hK : (Matrix.of K).PosSemidef, but CHD builds K from data (Modes/kernels.py) — that hypothesis was never discharged. This proves it for the linear mode, the same move as the positive-pivot keystone: turn an assumed precondition into a theorem. Spec (NN/Spec/Core/Tensor/Factorizations.lean): maskColsFn (which_dim column masking), linearKernelFn (K[i,j] = 1 + scale·⟨Φi,Φj⟩ = 𝟙𝟙ᵀ + scale·ΦΦᵀ, mirroring LinearMode.vectorized_kernel), and the tensor wrapper linearKernelSpec. Proofs (new NN/Proofs/Tensor/Basic/FactorizationsKernels.lean, sorry/omega-free): - linearKernelFn_posSemidef: K is PSD for scale ≥ 0 — 𝟙𝟙ᵀ is a rank-one Gram (PSD), ΦΦᵀ is a Gram (posSemidef_self_mul_conjTranspose), scale ≥ 0 keeps it PSD (PosSemidef.smul), PosSemidef.add closes the sum. - linearKernelFn_symm: symmetry, a corollary of PosSemidef.isHermitian. - linearKernelSpec_posSemidef: the tensor-level statement the solve theorems consume — so solveRidgeSpec (linearKernelSpec X w scale) γ b is now an unconditional exact solve for γ > 0, no PSD hypothesis left to assume. Examples (new NN/Examples/Factorization/LinearKernel.lean): 6 green #eval checks — K = Kᵀ, matches the CHD LinearMode formula, all Jacobi eigenvalues ≥ 0 (feature masking preserved), the PSD kernel feeds an exact ridge solve (residual 0); negative control: scale = -1 makes 𝟙𝟙ᵀ − ΦΦᵀ indefinite (two negative eigenvalues). Blueprint: new "Building the kernel" section; flags the follow-ons — quadratic via the Schur product theorem (PosSemidef.hadamard, in Mathlib) and Gaussian via Schoenberg (Bochner not in Mathlib v4.30.0, the new research-grade item). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discharge the standing PosSemidef hypothesis for the second of CHD's non-interpolatory kernels. The quadratic mode K[i,j] = scale·(alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale) expands algebraically to K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ), a sum of three PSD pieces: the all-ones Gram, a nonnegative multiple of the data Gram Φ·Φᵀ, and a nonnegative multiple of its Hadamard square — PSD by the Schur product theorem (PosSemidef.hadamard). So K is PSD for scale ≥ 0 and alpha ≥ 0. - Spec: quadraticKernelFn / quadraticKernelSpec (NN/Spec/.../Factorizations.lean), the exact CHD QuadraticMode.vectorized_kernel. - Proof: quadraticKernelFn_posSemidef + _symm + tensor-level quadraticKernelSpec_posSemidef (FactorizationsKernels), sorry/omega-free. - Example: NN/Examples/Factorization/QuadraticKernel.lean — 8 green checks (symmetry, CHD-formula match, all eigenvalues ≥ 0 with masking preserved, exact ridge solve), two genuine negative controls (alpha = −1 → 2 negative eigenvalues, scale = −1 → 3). - Blueprint: dedicated quadratic-mode section (Ch4 Factorizations). Gaussian mode (Schoenberg/Bochner, absent from Mathlib v4.30.0) remains the research-grade follow-on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discharge the third and last CHD kernel mode's PosSemidef hypothesis,
without Bochner/Schoenberg (absent from Mathlib v4.30.0), via an
elementary Hadamard-exponential route that reuses the Schur product
theorem already used for the quadratic mode.
Spec (Factorizations.lean): gaussianKernelFn/gaussianKernelSpec,
K[i,j] = scale·∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²)),
matching CHD GaussianMode (foldl product + squared-diff-as-product to
stay polymorphic over the law-free Context).
Proof (FactorizationsKernels.lean), sorry/admit/omega-free:
- posSemidef_of_tendsto — the PSD cone is closed under entrywise limits
(the one genuinely new, upstreamable lemma: the quadratic form is
continuous in the entries, {≥0} is closed).
- posSemidef_map_exp — entrywise exp of a PSD matrix is PSD via the
Hadamard-power series exp∘G = Σ G^∘k/k!.
- posSemidef_gaussianCol — a single Gaussian exp(−c(yᵢ−yⱼ)²) is PSD by
diagonal congruence of the entrywise exponential of the rank-one Gram.
- gaussianKernelFn_posSemidef — each feature factor 𝟙𝟙ᵀ + w·Gaussian is
PSD, product over features via the Schur product theorem; PSD for
scale ≥ 0 and a nonnegative mask w ≥ 0. Plus _symm and the
tensor-level gaussianKernelSpec_posSemidef.
Example (GaussianKernel.lean): 7/7 #eval checks — symmetric, matches
the CHD GaussianMode formula, PSD (no negative eigenvalue), masked PSD,
exact ridge solve; negative controls scale=−1 and a negative mask
weight w=[−2,0] both correctly rejected.
Blueprint Ch4: replaced the "research-grade" Gaussian paragraph with a
section documenting the discharge; all three modes now PSD-verified.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CHD's outer hypergraph-discovery loop (decision.py, _GraphDiscoveryMain.py)
turns the verified `noise` statistic into graph-structure decisions. This
proves each of those decisions correct, over the same executable specs that
run on Float.
Spec (NN/Spec/Core/Tensor/Factorizations.lean), mirroring the Python verbatim:
- argMinFn / argMaxFn — the np.argmin / np.argmax folds
- kernelChooserFn — MinNoiseKernelChooser (valid = noise < Z_low; the `2`
sentinel written `1+1` to stay Context-polymorphic)
- modeIncrementFn / modeChooserFn — MaxIncrementModeChooser
- allPrunedFn — the np.all(active_modes == 0) stopping rule
Proofs (NN/Proofs/Tensor/Basic/FactorizationsDecision.lean, new, sorry-free).
A single generic fold-selection lemma (foldl_select) underwrites both argmin
and argmax; the comparison proofs bridge the Context order test to the real `<`
(gtBool_eq_decide):
- argMinFn_le / argMaxFn_le — the prune step removes a least-activated ancestor
- kernelChooserFn_eq_some / _eq_none — MinNoiseKernelChooser is sound AND
complete: returns `some s` with s valid and of least noise among all valid
kernels exactly when a valid kernel exists, else `none`. Its `2`-sentinel
correctness rests directly on the verified varNoiseFn_le_one (hypothesis
`hbound`), so the decision is a proved selection over a statistic whose
[0,1] range was itself proved.
- modeChooserFn_ge — MaxIncrementModeChooser reports the largest noise-jump
iteration
- allPrunedFn_iff — the stopping test holds iff every ancestor is pruned
Examples (NN/Examples/Factorization/Discovery.lean, new): 13 #eval checks with
negative controls. The end-to-end block eigendecomposes an SPD kernel and runs
a find_gamma sweep feeding the verified varNoiseSpec at several gamma straight
into argMinFn (noises [0.004, 0.040, 0.287], all in [0,1], argmin = 0).
Blueprint Ch4 gains a discovery-decision-layer section and an updated closing
summary. Registrations: Basic.lean (proofs), Factorization.lean (examples).
Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic green (2705 jobs);
banned-tactic sweep clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The cyclic-Jacobi eigensolver was migrated to strict `Array (Array α)`, but `choleskyColsFn` and the triangular solves were left on the functional `Fin n → Fin n → α` representation. There each Cholesky column is a closure that re-evaluates every previous column, so reading the full factor `L` is exponential — and the spec is compiled without `precompileModules`, so any `#eval` of `solveRidgeSpec`/`choleskySpec` runs that closure in the interpreter. A single 4×4 ridge solve cost ~310 s; the QuadraticKernel example took ~645 s to build, and every ridge/Cholesky-using example was similarly slow. Add two strict, array-backed runtime implementations and register them with `@[implemented_by]`: * `choleskyColsImpl` → `choleskyColsFn` — materializes each column as an `Array α`, so a back-reference `L[i,k]` is an O(1) lookup. * `solveRidgeImpl` → `solveRidgeFn` — factors `K + γ·I = L·Lᵀ` and runs both triangular substitutions entirely over `Array`s, building no `Fin n → α` accumulator closures. `@[implemented_by]` swaps only the compiled/interpreted runtime code; the functional definitions — and every correctness proof that reasons about them (`FactorizationsSolve`, `FactorizationsReconstruction`, …) — are untouched. The examples' residual checks `(K+γ·I)·x − b ≈ 0` / `A = L·Lᵀ` numerically validate that the array path agrees with the proven definition. Result: QuadraticKernel 645 s → 6.8 s; a full clean rebuild of `NN.Examples.Factorization` + `NN.Proofs.Tensor.Basic` is 18.8 s (2705 jobs). No proof changes; sorry/admit/omega-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CHD's Z_test (interpolatory.py) decides edge significance by comparing the observed `noise` against the null distribution of the same statistic under random data: draw N samples, score each one's `noise`, sort, and read off the 5th/95th percentiles as Z_low/Z_high; an edge is significant when `noise < Z_low`. Spec layer (Factorizations.lean): `sampleNoisesFn` (each draw scored by the *same* `varNoiseFn`), `leBool`, the order statistic `kthSmallestFn` (mergeSort + index), `zLowIdx`/`zHighIdx` (⌊0.05·N⌋ / ⌊0.95·N⌋), `zLowFn`/`zHighFn`, `zSignificantFn`, and tensor wrappers `zLowSpec`/`zHighSpec` — mirroring Z_test verbatim and running on Float. Proofs (FactorizationsDecision, sorry-free): the order-statistic toolkit (`kthSmallestFn_mem`/`_nonneg`/`_le_one`/`_mono`, bridged to Mathlib's `sortedLE_mergeSort` via `leBool_eq_le`) and the Z_test guarantees — `sampleNoisesFn_nonneg`/`_le_one` (every null sample inherits the verified `noise ∈ [0,1]` bound), `zLowFn`/`zHighFn ∈ [0,1]`, `zLowFn_le_zHighFn` (Z_low ≤ Z_high by order-statistic monotonicity), and `zTest_admits_edge` (`noise < Z_low` ⟹ `MinNoiseKernelChooser` returns `some 0`), the chooser's `noise ≤ 1` precondition again being `varNoiseFn_le_one`. The keystone: the same [0,1] bound governs both the data noise and the whole null distribution, so the test is well-posed. Examples (Discovery): builds the null distribution from a real eigendecomposition, checks 0 ≤ Z_low ≤ Z_high ≤ 1, shows dominant-eigenvector data clears the lower tail (significant), and rejects a high noise and a noise at the upper tail (negative controls). Blueprint: new "The Z_test: a null-distribution significance threshold" section and updated scope summary — only the distributional half (Gaussian draws + calibrated percentile, needs Mathlib.Probability) remains. Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic 2705 jobs green; blueprint 4949 jobs; sorry/admit/omega-free. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the provable half of the Z_test's distributional content (the part left open after the well-posed thresholds), in a new sorry-free module NN/Proofs/Tensor/Basic/FactorizationsZTest.lean: * Finite-sample calibration (counting, no probability theory): a sorted list has at most k entries below its k-th element, so via countP permutation- invariance over mergeSort the threshold's own empirical false-positive rate is bounded exactly — at most ⌊N/20⌋ ≈ 5% of the N null draws fall below Z_low (zLow_null_exceedance_le) and ≈ 5% rise above Z_high (zHigh_null_exceedance_le). This is the exact, non-asymptotic guarantee. * The Gaussian null law (measure theory): modelling the draws as i.i.d. standard Gaussian (nullGaussian = Measure.pi (gaussianReal 0 1)), the per-draw noise is measurable (measurable_noiseMap), so its pushforward noiseLaw is a probability measure (IsProbabilityMeasure) concentrated on [0,1] (noiseLaw_Icc_eq_one) — the verified varNoiseFn ∈ [0,1] bound lifted to the law. sampleNoisesFn_eq_noiseMap ties CHD's executable statistic to the model. Scope honesty: the asymptotic frontier — empirical→true quantile convergence (Glivenko–Cantelli/DKW) and the exchangeability rank rate k/(N+1) — needs an empirical-process theory absent from Mathlib v4.30.0, and is flagged rather than stubbed with sorry. Discovery.lean adds positive/negative #eval checks (exactly 1/20 below Z_low, 0 above Z_high, 19/20 below the slack Z_high as a negative control); the blueprint Ch4 chapter gains a "Z_test distributional layer" section and the "What remains" note now scopes only the asymptotic half. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lift the single null draw (FactorizationsZTest) to the i.i.d. *sequence*
`nullSeqGaussian n := Measure.infinitePi (fun _ : ℕ => nullGaussian n)` on
`ℕ → (Fin n → ℝ)` — the infinite product, since `Measure.pi` is finite-index
only — and `nullNoise Λ V γ i ω := noiseMap Λ V γ (ω i)`, the same measurable
statistic read off coordinate `i`. Proven sorry-free in the new
`NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean`:
- nullNoise_iIndepFun (← iIndepFun_infinitePi), pairwise
nullNoise_pairwise_indepFun in the Pairwise (·⟂ᵢ[μ]·) shape SLLN takes;
- nullNoise_hasLaw / nullNoise_identDistrib — each draw has the common law
noiseLaw (via measurePreserving_eval_infinitePi + HasLaw.comp);
- nullNoise_mem_Icc ([0,1]-valued) and integrable_nullNoise.
That is exactly the hint/hindep/hident triple `strong_law_ae_real` and the
Hoeffding tail consume, so plan steps (b)–(d) become applications of an
in-place scaffold. The *uniform* GC / DKW–Massart sharp constant and the
exchangeability rank rate k/(N+1) stay research-grade (flagged, not sorry'd).
Discovery.lean exercises the computable shadow — the empirical CDF
F̂_N(t) = #{i<N : noiseᵢ ≤ t}/N — checked a bona fide CDF (in [0,1], monotone,
saturating to 1, 0 below support) with a non-degeneracy negative control; six
new #eval checks pass. Aggregate example docstring and the Ch4 blueprint
(distributional-layer + "What remains") updated to record the scaffold.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ia the SLLN
Build on the step-(a) i.i.d. scaffold (FactorizationsZAsymptotic) to prove the
pointwise empirical-CDF consistency theorem, sorry-free over Mathlib v4.30.0:
- nullBelow Λ V γ t i ω := (Set.Iic t).indicator 1 (nullNoise Λ V γ i ω), the
threshold indicators 1{noiseᵢ ≤ t}, and empCDF, their normalized prefix sum.
- nullBelow_pairwise_indepFun / nullBelow_identDistrib / integrable_nullBelow:
the indicators inherit the scaffold's i.i.d.-bounded-integrable structure
(indicator composed onto each independent, identically-distributed draw).
- integral_nullBelow_zero: their common mean is exactly cdf (noiseLaw Λ V γ) t
(HasLaw.integral_comp + integral_indicator_one + cdf_eq_real), so empCDF is the
Monte-Carlo estimator of the null CDF.
- empCDF_tendsto_cdf: strong_law_ae_real on the hint/hindep/hident triple gives
∀ᵐ ω, Tendsto (fun N => empCDF … N t ω) atTop (𝓝 (cdf noiseLaw t)) — pointwise
Glivenko–Cantelli at each fixed t.
Discovery.lean adds the matching #eval block exercising the computable shadow:
the growing-prefix running mean F̂_N settling toward the full-sample estimate of
cdf noiseLaw t (each prefix a valid [0,1] CDF value, cdf 1 = 1 attained at every
N, with a non-degeneracy negative control). Aggregate docstring and the Ch4
blueprint updated to describe step (b); the uniform GC / DKW–Massart sharp
constant and exchangeability rank rate remain flagged research-grade (no sorry).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Hoeffding Prove the finite-sample companion of step (b)'s almost-sure limit: at a fixed threshold t, the empirical CDF of the i.i.d. null draws concentrates around the true null CDF at the sharp Hoeffding rate. FactorizationsZAsymptotic.lean: - nullBelow_subgaussian / nullBelow_neg_subgaussian: the [0,1]-bounded threshold indicators are sub-Gaussian with variance proxy 1/4 (hasSubgaussianMGF_of_mem_Icc), centered at mean cdf noiseLaw t (integral_nullBelow_eq). - hoeffding_avg_ge: Mathlib's measure_sum_ge_le_of_iIndepFun with ε ↦ N·ε turns the proxy sum N/4 into the sharp exponent -2Nε². - empCDF_upper_tail / empCDF_lower_tail: P(±(empCDF − cdf) ≥ ε) ≤ exp(-2Nε²). - empCDF_concentration: two-sided DKW-at-a-point P(|empCDF − cdf| ≥ ε) ≤ 2·exp(-2Nε²) via measureReal_union_le + le_abs. Discovery.lean: matching #eval block exercising the bound's computable shadows — the tail function 2·exp(-2Nε²) (twice one-sided, decreasing in N and ε, non-vacuous < 1 once 2Nε² > ln 2, trivially 2 at ε = 0) and the observed prefix deviation it governs (every prefix of ≥ 3 draws keeps F̂_N within ε = 0.3 of the full sample; negative control: N = 1, 2 deviate by 0.5 > ε). Docs: aggregate Factorization docstring and Ch4 blueprint updated; uniform DKW–Massart and quantile-transfer step (d) remain flagged research-grade, never sorry'd. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…percentile consistency) Inverts steps (b)-(c)'s empirical-CDF convergence into consistency of the empirical percentiles the Z_test chooser thresholds against, sorry-free over Mathlib v4.30.0. NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean: - empCDF_mono: empirical CDF monotone in the threshold. - StraddlesQuantile / IsLowerQuantile: the honest population straddle hypothesis (continuity + strict monotonicity through p at q) and the defining property of a lower empirical p-quantile. - empCDF_eventually_straddle: the transfer engine -- pointwise consistency at q +/- eps makes the empirical CDF eventually straddle p. - empQuantile_tendsto: any lower empirical p-quantile converges a.s. to q (sandwich pinned into [q-eps, q+eps], intersected over eps = 1/(m+1) via ae_all_iff). NN/Examples/Factorization/Discovery.lean: matching #eval block -- lower p-quantile reaches level p, monotone in p, in [0,1], median converges (<= 0.02 for N >= 3); negative controls for non-vacuity and straddle-hypothesis sensitivity. Docs: aggregate Factorization.lean bullet, Ch4 Verso blueprint, and CHDTorch plan updated to mark the four-tier asymptotic-calibration plan complete, leaving only the uniform DKW-Massart constant, the zLowFn/zHighFn triangular-array wiring, and the exchangeability rank rate as research-grade. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Closing this in favor of a slimmed-down, refocused contribution. Since opening it, I realized this branch had grown to mix three separable things:
Following the question of whether this belongs in TorchLean or in a repo of its own, I've split them:
So this PR is superseded; continuing in #2 |
Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery
Motivation
The end goal of this line of work is to formalize
Computational Hypergraph Discovery (CHD)
(paper)
on top of TorchLean. CHD is a Gaussian-process / kernel-ridge method that recovers the dependency
structure of a system by repeatedly solving regularized kernel systems and testing the resulting
variances. Crucially, every quantity CHD inspects is a function of the full symmetric
eigendecomposition of a kernel matrix
K:solve_variationnalforms the regularized solve(K + γI)⁻¹ b;find_gammaselects the ridge/noise level from spectral summaries (trace/determinant);Z_testthresholds variances against a calibrated null distribution.Before this PR, TorchLean's only eigen-routine was a power-iteration stub that recovers just the
largest eigenpair — nowhere near enough to stand CHD on. You cannot formalize CHD without first
having a verified linear-algebra foundation centered on the full symmetric eigendecomposition. This
PR builds exactly that foundation, and then carries it forward through the CHD-specific consequences
(regularized inverse, PSD kernels, the discovery decision layer, and the
Z_testcalibration) thatsit directly on top of it.
The guiding principle throughout is honest scope: finite constructions get exact theorems,
iterative routines get exact invariants plus an a-posteriori residual certificate, and genuinely
research-grade gaps (e.g. Jacobi convergence, which Mathlib v4.30.0 does not have) are documented as
open rather than papered over. There is no
sorry, noadmit, and no new axiom anywhere inNN/.What this PR adds
1. Executable reference factorizations (spec layer)
NN/Spec/Core/Tensor/Factorizations.lean— shape-indexed,Context-polymorphic reference implementations that run over nativeFloat:choleskySpec(Cholesky),qrSpec(QR via modified Gram–Schmidt),symEigJacobiSpec(full symmetric eigendecomposition via cyclic Jacobi),svdSpec(SVD built on the symmetric eig).Each has runtime
#eval assertLt …reconstruction checks. A strict-array@[implemented_by]pathmakes the Cholesky / ridge-solve
#evals run in seconds (the earlier functional representationOOM'd / ran for hours).
2. Correctness theorems (proof layer), organized by what can honestly be proved
IsSymEig A Λ V(orthogonal
V,A = V·diag(Λ)·Vᵀ), proved independently of any algorithm:IsSymEig.add_smul_inv:(K+γI)⁻¹ = V·diag(1/(λᵢ+γ))·Vᵀ— the coresolve_variationnalidentity, from orthogonality of
Valone (holds for any eigendecomposition the solver returns);IsSymEig.trace_eq/IsSymEig.det_eq/IsSymEig.isHermitian— the scalar summaries behindfind_gammaand the evidence terms;IsSVD.gram_isSymEig: an SVD ofAis an eigendecomposition of the Gram matrixK = AᵀA(eigenvalues
σᵢ²), connecting the SVD spec to the eig foundation.givens_normSq(c²+s²=1) ⟹ each Jacobi sweep is an orthogonal similarity, sotrace_orthogonal_conj/det_orthogonal_conjpreserve the spectrum exactly at every step;choleskyFn_lower_triangular,Rmat_upper_triangular— structural triangularity by construction.A = L·Lᵀ(Cholesky) andA = Q·R,QᵀQ = 1(QR, bridged to MathlibgramSchmidt).cannot yet support): an exact identity bounding reconstruction error by off-diagonal mass, with
the numeric
assertLtexamples as concrete instances. Plus the classical per-rotation off-diagonaldecrease and linear-rate lemmas.
3. CHD-specific layers built on the foundation
semidefinite (discharging the hypothesis the spectral facts need).
unconditional; the inverse form closes the
solve_variationnalloop.Z_testdistributional layer + asymptotic calibration (steps a–d):(a) i.i.d.-draws scaffold, (b) pointwise Glivenko–Cantelli via the SLLN,
(c) pointwise DKW-at-a-point via Hoeffding, (d) quantile transfer (empirical-percentile
consistency:
empQuantile_tendsto). Each step ships with positive and negative executableexamples.
4. Blueprint + examples
Ch4_Verification/Factorizations.leandocumenting the CHD motivation and the exact-vs-iterative-vs-certificate scope split, registered in
Guide.leanand cross-linked fromScientificMLVerification.lean.NN/Examples/Factorization/suite (Cholesky, QR, SymEig, SVD, the three kernels,ridge-solve, variational, Jacobi decrease/rate, discovery) with positive and negative
#evalchecks.
Scope honesty — what is not claimed
A = V·diag(λ)·Vᵀis not asserted a-priori for the floating-point Jacobi/SVD output: those areiterative, and Mathlib v4.30.0 has no Jacobi/QR-eig convergence theory. That gap is captured as a
residual certificate, never as
sorry.DKW–Massart constant, the triangular-array wiring of the order-statistic percentiles at the moving
level
p_N → 1/20, and the exchangeability rank ratek/(N+1).Verification
lake buildissorry/admit-free acrossNN/; no new axioms. (omegaappears only forpure-ℕ index/length arithmetic — a sound decision procedure, not a hole.)
NN/Examples/Factorization/*#eval assertLt/assertTruechecks pass, confirming theresidual-certificate hypotheses hold on the concrete instances.
cd blueprint && lake build) and renders the new chapter.Stats
~22 commits; ~4.7k lines of new proofs across
NN/Proofs/Tensor/Basic/Factorizations*.lean, plus thespec layer, examples, and blueprint chapter.