Skip to content

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery#1

Closed
NicolasRouquette wants to merge 22 commits into
lean-dojo:mainfrom
NicolasRouquette:add-matrix-factorizations
Closed

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery#1
NicolasRouquette wants to merge 22 commits into
lean-dojo:mainfrom
NicolasRouquette:add-matrix-factorizations

Conversation

@NicolasRouquette

Copy link
Copy Markdown

Verified matrix factorizations: a foundation for formalizing Computational Hypergraph Discovery

Motivation

The end goal of this line of work is to formalize
Computational Hypergraph Discovery (CHD)
(paper)
on top of TorchLean. CHD is a Gaussian-process / kernel-ridge method that recovers the dependency
structure of a system by repeatedly solving regularized kernel systems and testing the resulting
variances. Crucially, every quantity CHD inspects is a function of the full symmetric
eigendecomposition of a kernel matrix K
:

  • solve_variationnal forms the regularized solve (K + γI)⁻¹ b;
  • find_gamma selects the ridge/noise level from spectral summaries (trace/determinant);
  • the Z_test thresholds variances against a calibrated null distribution.

Before this PR, TorchLean's only eigen-routine was a power-iteration stub that recovers just the
largest eigenpair — nowhere near enough to stand CHD on. You cannot formalize CHD without first
having a verified linear-algebra foundation centered on the full symmetric eigendecomposition.
This
PR builds exactly that foundation, and then carries it forward through the CHD-specific consequences
(regularized inverse, PSD kernels, the discovery decision layer, and the Z_test calibration) that
sit directly on top of it.

The guiding principle throughout is honest scope: finite constructions get exact theorems,
iterative routines get exact invariants plus an a-posteriori residual certificate, and genuinely
research-grade gaps (e.g. Jacobi convergence, which Mathlib v4.30.0 does not have) are documented as
open rather than papered over. There is no sorry, no admit, and no new axiom anywhere in
NN/.

What this PR adds

1. Executable reference factorizations (spec layer)

NN/Spec/Core/Tensor/Factorizations.lean — shape-indexed,
Context-polymorphic reference implementations that run over native Float:

  • choleskySpec (Cholesky), qrSpec (QR via modified Gram–Schmidt),
  • symEigJacobiSpec (full symmetric eigendecomposition via cyclic Jacobi),
  • svdSpec (SVD built on the symmetric eig).

Each has runtime #eval assertLt … reconstruction checks. A strict-array @[implemented_by] path
makes the Cholesky / ridge-solve #evals run in seconds (the earlier functional representation
OOM'd / ran for hours).

2. Correctness theorems (proof layer), organized by what can honestly be proved

  • Specification consequences — the CHD foundation. From the predicate IsSymEig A Λ V
    (orthogonal V, A = V·diag(Λ)·Vᵀ), proved independently of any algorithm:
    • IsSymEig.add_smul_inv: (K+γI)⁻¹ = V·diag(1/(λᵢ+γ))·Vᵀ — the core solve_variationnal
      identity, from orthogonality of V alone (holds for any eigendecomposition the solver returns);
    • IsSymEig.trace_eq / IsSymEig.det_eq / IsSymEig.isHermitian — the scalar summaries behind
      find_gamma and the evidence terms;
    • IsSVD.gram_isSymEig: an SVD of A is an eigendecomposition of the Gram matrix K = AᵀA
      (eigenvalues σᵢ²), connecting the SVD spec to the eig foundation.
  • Exact invariants of the algorithms (no convergence/rounding caveat):
    • givens_normSq (c²+s²=1) ⟹ each Jacobi sweep is an orthogonal similarity, so
      trace_orthogonal_conj / det_orthogonal_conj preserve the spectrum exactly at every step;
    • choleskyFn_lower_triangular, Rmat_upper_triangular — structural triangularity by construction.
  • Exact reconstruction for the finite constructions (over ℝ, under success hypotheses):
    • A = L·Lᵀ (Cholesky) and A = Q·R, QᵀQ = 1 (QR, bridged to Mathlib gramSchmidt).
  • Residual certificate for the iterative routines (in place of a convergence proof Mathlib
    cannot yet support): an exact identity bounding reconstruction error by off-diagonal mass, with
    the numeric assertLt examples as concrete instances. Plus the classical per-rotation off-diagonal
    decrease and linear-rate lemmas.

3. CHD-specific layers built on the foundation

  • Kernel PSD-ness: the linear, quadratic, and Gaussian-mode CHD kernels are proved positive
    semidefinite (discharging the hypothesis the spectral facts need).
  • Kernel-ridge solve: the SPD Cholesky positive-pivot keystone makes the ridge solve
    unconditional; the inverse form closes the solve_variationnal loop.
  • Discovery decision layer: the CHD ancestor/edge decision is proved sound and complete.
  • Z_test distributional layer + asymptotic calibration (steps a–d):
    (a) i.i.d.-draws scaffold, (b) pointwise Glivenko–Cantelli via the SLLN,
    (c) pointwise DKW-at-a-point via Hoeffding, (d) quantile transfer (empirical-percentile
    consistency: empQuantile_tendsto). Each step ships with positive and negative executable
    examples.

4. Blueprint + examples

  • New Verso chapter
    Ch4_Verification/Factorizations.lean
    documenting the CHD motivation and the exact-vs-iterative-vs-certificate scope split, registered in
    Guide.lean and cross-linked from ScientificMLVerification.lean.
  • A full NN/Examples/Factorization/ suite (Cholesky, QR, SymEig, SVD, the three kernels,
    ridge-solve, variational, Jacobi decrease/rate, discovery) with positive and negative #eval
    checks.

Scope honesty — what is not claimed

  • A = V·diag(λ)·Vᵀ is not asserted a-priori for the floating-point Jacobi/SVD output: those are
    iterative, and Mathlib v4.30.0 has no Jacobi/QR-eig convergence theory. That gap is captured as a
    residual certificate, never as sorry.
  • The remaining open items are documented in the blueprint: uniform Glivenko–Cantelli / sharp
    DKW–Massart constant, the triangular-array wiring of the order-statistic percentiles at the moving
    level p_N → 1/20, and the exchangeability rank rate k/(N+1).

Verification

  • lake build is sorry/admit-free across NN/; no new axioms. (omega appears only for
    pure-ℕ index/length arithmetic — a sound decision procedure, not a hole.)
  • All NN/Examples/Factorization/* #eval assertLt / assertTrue checks pass, confirming the
    residual-certificate hypotheses hold on the concrete instances.
  • The Verso blueprint builds (cd blueprint && lake build) and renders the new chapter.

Stats

~22 commits; ~4.7k lines of new proofs across NN/Proofs/Tensor/Basic/Factorizations*.lean, plus the
spec layer, examples, and blueprint chapter.

NicolasRouquette and others added 22 commits June 1, 2026 08:31
Motivation: formalizing Computational Hypergraph Discovery (CHD) in TorchLean.

CHD (prototype: https://github.com/TheoBourdais/ComputationalHypergraphDiscovery)
is a Gaussian-process / kernel method: it fits a kernel ridge regression per node
and prunes "ancestors" by a noise/activation criterion. Essentially every
statistically meaningful quantity it computes — the regression solution, the
gamma (noise) selection, and the Z-test — is derived from the *full* symmetric
eigendecomposition of the kernel matrix K (all eigenvalues AND eigenvectors).

TorchLean's spec layer lacked this. The only eigendecomposition available
(`Spec.eigendecompSpec`) is a power-iteration stub that recovers just the largest
eigenpair, and there were no Cholesky / QR / SVD routines at all. That makes CHD
inexpressible. This commit adds real reference implementations so the kernel
linear algebra CHD depends on can be written in Lean.

New: NN/Spec/Core/Tensor/Factorizations.lean (namespace Spec), executable over
Float / ℝ, shape-indexed like the rest of the spec layer:
- choleskySpec      : A = L · Lᵀ for SPD A (lower-triangular L)
- qrSpec/qrQSpec/qrRSpec : A = Q · R via modified Gram–Schmidt
- symEigJacobiSpec  : FULL symmetric eigendecomposition via cyclic Jacobi
                      (all n eigenpairs) — the replacement for the largest-only
                      stub, which is left untouched so PCA keeps building
- svdSpec           : A = U · diag(σ) · Vᵀ via the eig of Aᵀ·A

The iterative Jacobi loop runs over a strict `Array (Array α)` representation
(converted to/from Spec.Tensor only at the boundary): threading the functional
`Fin n → Fin n → α` representation through one matrix product per rotation builds
deep closure chains that blow up under evaluation, whereas arrays are strict
values and keep execution cheap.

Examples: NN/Examples/Factorization/{Common,Cholesky,QR,SymEig,SVD}.lean (+ the
NN.Examples.Factorization umbrella). Each reconstructs the input from its factors
and asserts (compiled `assertLt` via #eval, which fails the build) that the
maximum reconstruction error is below tolerance. All reconstruct to 0.000000;
SymEig recovers eigenvalues {1.3249, 2.4608, 5.2143} and SVD singular values
{5, 3, 0} as expected.

These are executable reference defs (matching the bar of the existing
determinantSpec / inverseSpec / eigendecompSpec); formal correctness theorems
are a planned follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Formal verification companion to the spec-layer factorizations added in e0d08ac.
The motivation is Computational Hypergraph Discovery
(https://github.com/TheoBourdais/ComputationalHypergraphDiscovery), a Gaussian-
process / kernel-ridge method whose numerical core is the full symmetric
eigendecomposition of a kernel matrix K (solve_variationnal, find_gamma, Z_test).
This commit gives that core a verified linear-algebra foundation.

New: NN/Proofs/Tensor/Basic/Factorizations.lean (sorry-free, over ℝ via Mathlib).
A refinement architecture: specification predicates on Mathlib matrices, with the
CHD consequences proved from the spec independent of the float algorithm.

- Specifications: IsCholesky, IsQR, IsSymEig, IsSVD.
- CHD foundation (consumed by solve_variationnal / find_gamma / Z_test):
  * IsSymEig.add_smul_inv — the regularized inverse
    (K + γI)⁻¹ = V·diag(1/(λ+γ))·Vᵀ, proved from orthogonality of V.
  * IsSymEig.trace_eq / det_eq (trace K = Σλ, det K = Πλ), isHermitian.
  * IsSVD.gram_isSymEig — an SVD of A is an eigendecomposition of the Gram
    matrix AᵀA with eigenvalues σ², the form CHD actually builds.
- Exact algorithm invariants: trace_orthogonal_conj / det_orthogonal_conj
  (every Jacobi sweep is a spectrum-preserving orthogonal similarity),
  givens_normSq (c²+s²=1), choleskyFn_lower_triangular (via a reusable
  List.foldl indexing lemma getD_foldl_finRange).
- Residual certificate (Tier D): symEig_reconstruction_residual and
  symEig_frobenius_residual prove the reconstruction error equals the
  off-diagonal mass of the rotated matrix exactly; isSymEig_of_diagonal closes
  the loop in the zero-residual limit. This replaces an impossible a-priori
  convergence proof (Mathlib v4.30.0 has no Jacobi convergence theory, and Float
  never diagonalizes exactly), matching the runtime assertLt checks.

Scope honesty: the exact algebraic reconstruction of the finite float folds
(A=L·Lᵀ, A=QR) is documented as the remaining increment (it needs a prefix-fold
induction plus per-pivot positivity); the spec-level facts CHD relies on do not
depend on it.

Blueprint: new chapter Ch4_Verification/Factorizations.lean ("Matrix
Factorizations for Kernel Methods"), registered in Guide.lean and cross-linked
from ScientificMLVerification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prove the exact algebraic reconstruction of the finite executable Cholesky
factorization over ℝ, the increment documented in Basic/Factorizations.lean.

`isCholesky_of_pos`: for symmetric A with positive executable pivots
(0 < choleskyFn A j j — the success condition over ℝ), L = choleskyFn A is a
genuine Cholesky factor (lower-triangular, A = L·Lᵀ), satisfying the
`IsCholesky` spec. Tensor-level corollary `choleskySpec_reconstruction`.

Method: a general snoc-fold read lemma (`getD_foldl_snoc_read`) reads the j-th
built column as the step function on the length-j prefix; `prefix_eq_map`
identifies that prefix with the first j columns of L; `take_map_sum_eq` rewrites
the code's List.foldl sums as masked Finset partial sums. Positive pivots
discharge the √-radicand and divisor side conditions; symmetry of A lifts the
lower-triangular reconstruction to the full matrix.

Blueprint: new "Exact Cholesky reconstruction" section; "What remains" narrowed
to QR (dual-list GSState structure-fold + the orthonormality invariant).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prove the exact algebraic reconstruction of the finite executable QR
(modified Gram–Schmidt) factorization over ℝ, extending the Cholesky work.

`qr_mul_eq`: for A : Fin m → Fin n → ℝ whose executable R-pivots are positive
(0 < Rmat A j j — full column rank, the success condition), the factors satisfy
A = Q·R with R upper-triangular (`Rmat_upper_triangular`). `qrSpec_reconstruction`
is the tensor-level corollary.

Method: gramSchmidtFn threads a GSState that snocs onto both the Q-list and the
R-list at once. The appended values depend only on the Q-history, so the Q-list
is a single-list snoc-fold (`gs_proj_qs`, read by `getD_foldl_snoc_read`) and the
R-list is the Q-prefix tail `rTail` (read by `gs_fold_split` + `rTail_getD`). The
orthogonalization sum v = a − Σ rₖⱼqₖ (a List.zip fold) collapses to a map-fold
(`cross_fold_eq`) and then a masked Finset partial sum (`take_map_sum_eq`); the
positive pivot cancels the v/rⱼⱼ normalization exactly.

Not done (documented, not sorry): orthonormality Qᵀ Q = 1, which rests on the
Gram–Schmidt orthogonality invariant Mathlib only has for its own gramSchmidt.

Blueprint: new "Exact QR reconstruction" section; "What remains" narrowed to the
orthonormality invariant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the last open finite-fold property: orthonormality of the executable
Gram–Schmidt Q factor. Rather than re-derive the orthogonality induction, the
new file NN/Proofs/Tensor/Basic/FactorizationsOrthonormal.lean unifies the
executable variant with Mathlib's gramSchmidt.

Reading columns of A as EuclideanSpace ℝ (Fin m) vectors (gsCol), Qcol_bridge
proves by strong induction that the j-th executable Q column equals
gramSchmidtNormed ℝ (gsCol A) j; orthonormality then follows from Mathlib's
gramSchmidtNormed_orthonormal'. Yields Q_orthonormal (qₐ·q_b = δₐᵦ),
QT_mul_Q_eq_one, the full IsQR predicate isQR_of_pos, and the tensor-level
qrSpec_orthonormal.

Three reusable connectors over ℝ — dotFn_eq_inner, normFn_eq_norm,
proj_normalize — are stated generally enough to lift into a future Mathlib
matrix-level QR contribution.

Blueprint chapter and reconstruction docstring updated: only the iterative
Jacobi/SVD convergence now remains (residual certificate only). Sorry-free;
NN.Examples.Factorization still reconstructs at err 0.000000, with the
empirical Qᵀ·Q = I check now formally backed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discharge the two hypotheses the symmetric-eigendecomposition residual
certificate assumed (`Vᵀ V = 1` and `A = V·Af·Vᵀ`) for the real
`symEigJacobiSpec` output, so the certificate holds outright.

NN/Proofs/Tensor/Basic/FactorizationsJacobi.lean (sorry-free, warning-free):
- toM bridge from `Array (Array ℝ)` to Mathlib `Matrix`; toM_matMul/tr/id
  show the array ops realise the matrix ops (unconditionally).
- givens_orthogonal: each Givens rotation with c²+s²=1 is orthogonal
  (Jᵀ J = 1), via the three column shapes and a 9-case dot-product split.
- JacInv loop invariant preserved by jacInv_rotate/_sweep/_run
  (List.foldlRecOn over jacobiPairs; base case (A, I)).
- jacobi_orthogonal, jacobi_similarity (no hypotheses) ⟹ unconditional
  symEigJacobi_{reconstruction,frobenius}_residual and
  symEigJacobi_isSymEig_of_diagonal, with worked examples.

Blueprint Ch4: new "Faithfulness of the Jacobi run" section; "What remains"
narrowed from the iterative properties to just the convergence rate.

Examples (positive + negative controls, compiled #eval assertions):
- Cholesky: indefinite A correctly fails (NaN; uses summed Frobenius error).
- QR: rank-deficient A reconstructs but Qᵀ Q ≠ I (full rank needed).
- SVD: Vᵀ V = I; permuted σ fails to reconstruct.
- SymEig: orthogonality exact at 1 sweep; off-diagonal residual asymptotic;
  exact residual certificate verified numerically (lhs = rhs, |Δ| = 0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xamples

Add the classical Jacobi progress identity as an exact, finite theorem over ℝ:
conjugating a symmetric A by the Givens rotation that annihilates pivot (p,q)
drops the squared off-diagonal mass by exactly 2·A[p,q]²
(`jacobi_off_decrease`).

New: NN/Proofs/Tensor/Basic/FactorizationsJacobiDecrease.lean
- frobSq/diagSq/offSq mass machinery + frobSq_eq_diagSq_add_offSq.
- frobSq_orthogonal_conj: orthogonal similarity preserves total Frobenius mass,
  so driving the off-diagonal down ≡ driving the diagonal up.
- givens_conj_pp/qq/pq/other: explicit conjugation entries via bilinear support
  lemmas; the 2×2 block-Frobenius identity is frobSq_orthogonal_conj specialised
  to Fin 2 (no hand-tuned linear_combination coefficients).
- jacobi_off_decrease: the per-rotation decrease, under symmetry + the pivot
  annihilation (the defining equation the Golub–Van Loan angle solves; the
  explicit pivot is givens_conj_pq).

Examples: NN/Examples/Factorization/JacobiDecrease.lean (+ Common helpers)
- Positive: one rotation takes off-diagonal mass 6 → 4 = 6 − 2·1², pivot
  annihilated to 0, total mass conserved at 35 — all to |Δ| = 0.
- Negative controls: a wrong-angle (orthogonal) rotation misses the decrease;
  a non-orthogonal conjugation breaks Frobenius-mass invariance.

Blueprint: new "Per-rotation progress" section; "What remains" narrowed to the
aggregate cyclic rate (Forsythe–Henrici/Schönhage), since per-rotation progress
is now exact.

Sorry-free; builds green (NN.Proofs.Tensor.Basic, NN.Examples.Factorization,
blueprint); repo_lint clean on all new/changed files.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…amples

Add the aggregate-rate development for the largest-pivot Jacobi strategy,
building on the per-rotation decrease (Tier 2):

- offSq_le_count_mul_max: the largest off-diagonal pivot carries at least the
  average share of the mass, ‖offDiag A‖² ≤ (n²−n)·A[p,q]².
- jacobi_off_decrease_classical: substituting that into the exact per-rotation
  decrease yields a genuine linear contraction by 1 − 2/(n²−n) < 1.
- geom_bound_of_contraction / tendsto_zero_of_contraction: a fixed-factor
  contraction iterates to ρᵏ·a₀ and (with offSq_nonneg, factor < 1) tends to 0,
  so the classical algorithm provably converges geometrically. Stated for an
  arbitrary per-step factor, so a future cyclic per-sweep bound plugs in directly.

Honest scope: the cyclic ordering the solver uses does not satisfy the
largest-pivot hypothesis (the research-grade Forsythe–Henrici/Schönhage rate);
that gap stays captured by the exact a-posteriori residual certificate, never
by sorry.

Reviewer examples (NN/Examples/Factorization/JacobiRate.lean): largest pivot
meets the rate (mass 50.04 → 0.04, far under the guaranteed 33.36); negative
control — a tiny non-largest pivot misses the guaranteed factor. Blueprint
gains an "Aggregate rate" section and a precise restatement of what remains.

sorry-free, warning-free; repo_lint shows no new violations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ditional

Add the deferred keystone `choleskyFn_diag_pos_of_posDef`: for a
positive-definite A, every executable Cholesky pivot is strictly positive
(equivalently the radicand A[j,j] − Σ_{k<j} L[j,k]² > 0 at every step).

Proof avoids matrix inverses entirely. Strong induction on the pivot index:
- `choleskyFn_dot_eq_local` — localized reconstruction needing only the
  smaller pivot's positivity (which is all the original `choleskyFn_dot_eq`
  ever uses), powering the induction.
- The Schur-complement witness z is built by reusing the already-proven
  back-substitution `triSolveUpperFn_mulVec` on the leading block (z m = 1,
  z annihilates the leading columns of L).
- `double_sum_gram` collapses the Gram part of zᵀAz to L[m,m]²; the residual
  part reduces to the single (m,m) entry, giving zᵀAz = radicand exactly.
- `Matrix.PosDef.dotProduct_mulVec_pos` forces zᵀAz > 0, hence radicand > 0,
  hence the pivot √radicand > 0.

Unconditional corollaries (no pivot hypothesis):
- `solveRidgeFn_mulVec_of_posSemidef` — for PSD K and γ > 0, solveRidgeFn
  solves (K+γ·I)·x = b exactly. The fully discharged verified core of CHD
  `solve_variationnal`.
- `solveRidgeSpec_mulVec_of_posSemidef` — tensor-level form.

Also lands the spec-layer solve defs (triSolveLowerFn/triSolveUpperFn/
cholSolveFn/addScaledIdFn/solveRidgeFn/solveRidgeSpec), the registration
import, positive/negative `#eval` examples exhibiting the keystone dichotomy
(SPD K+γ·I → all pivots > 0; singular K → a zero pivot), and the blueprint
update (the direct kernel-ridge solve route now has nothing left to prove).

sorry/omega/admit-free across all new and changed source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two capstones on top of the positive-pivot keystone, making the verified
solve_variationnal match the form CHD actually specifies.

Proofs (NN/Proofs/Tensor/Basic/FactorizationsSolve.lean):

* cholesky_posDef — for any PosDef A, the executable choleskyFn IS a genuine
  Cholesky factor (lower-triangular, A = L·Lᵀ, strictly positive diagonal),
  with no pivot/symmetry/success hypothesis. Combines the keystone
  choleskyFn_diag_pos_of_posDef with isCholesky_of_pos.
* solveRidgeFn_eq_inv_mulVec (+ tensor-level solveRidgeSpec_eq_inv_mulVec) —
  solveRidgeFn K γ b = (K + γ·I)⁻¹ b, the closed form CHD solve_variationnal
  specifies. Invertibility from Matrix.PosDef.isUnit; the solve identity
  (K + γ·I)·x = b then pins x uniquely to the inverse. No inverse is ever
  formed by the algorithm.

Examples (NN/Examples/Factorization/RidgeSolve.lean), 8 #eval checks green:

* Capstone reconstruction: SPD K + γ·I gives L·Lᵀ = K + γ·I to machine
  precision; negative control an indefinite matrix hits √(negative) = NaN and
  fails — PosDef (not mere symmetry) is necessary. (Documented subtlety: the
  singular PSD K still reconstructs with a zero pivot; the zero pivot breaks
  only the solve, the dichotomy the keystone isolates — so the reconstruction
  negative control uses a genuinely indefinite matrix.)
* Inverse form: columns built by solveRidgeSpec K γ eⱼ assemble into
  (K + γ·I)⁻¹, and (K + γ·I)·(K + γ·I)⁻¹ = I; negative control γ = 0 on the
  singular K diverges (NaN).

Docs: blueprint Ch4 Factorizations chapter gains a capstone paragraph and a
closed-loop note in "What remains"; the two example module docstrings updated.

sorry/admit/omega-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_test

Tier A's predicate foundation (IsSymEig, add_smul_inv, trace_eq, det_eq) already
existed; this adds the three concrete CHD routines built on it, mirroring
interpolatory.py's eigendecomposition route, with their exact algebra proved over
ℝ from the IsSymEig specification (no appeal to Jacobi convergence).

Spec (NN/Spec/Core/Tensor/Factorizations.lean): executable eig-form mirrors —
projFn (Pga = Vᵀga), ridgeCoeffFn (rᵢ = γ/(λᵢ+γ)), variationalSolveFn, varNoiseFn,
plus tensor wrappers variationalSolveSpec / varNoiseSpec.

Proofs (new NN/Proofs/Tensor/Basic/FactorizationsVariational.lean, sorry/omega-free):
- variationalSolveFn_eq_neg_inv_mulVec: the eig-form solve_variationnal IS -(K+γI)⁻¹·ga
  (from add_smul_inv).
- variationalSolveFn_eq_neg_solveRidgeFn: eig route = Cholesky route — two independent
  implementations of solve_variationnal agree on the one closed form.
- IsSymEig.eigenvalues_nonneg: PSD ⟹ λ ≥ 0 (via VᵀAV PSD-congruence), discharging
  λᵢ+γ ≠ 0 from γ > 0.
- varNoiseFn_eq_ratio: the noise / find_gamma loss / Z_test statistic is the spectral
  ratio Σ(Pgaᵢ·rᵢ)² / Σ Pgaᵢ²·rᵢ.
- ridgeCoeffFn_pos/le_one, varNoiseFn_nonneg/le_one: for a PSD spectrum and γ > 0 the
  noise is a genuine fraction in [0,1].
- projFn_mulVec_self, varNoiseFn_projFn_mulVec: Z_test spectral invariance — feeding
  ga = V·z drops V, so the statistic depends on the kernel only through its spectrum.

Examples (new NN/Examples/Factorization/Variational.lean): 8 green #eval checks on an
SPD kernel — (K+γI)·yb = -ga and yb = -solveRidgeSpec to machine precision, noise ∈
[0,1], spectral invariance noise(V·z)=noise(z); negative controls: wrong eigenvectors
break the solve (residual 3.72), γ = -0.7 pushes noise to -7.19.

Blueprint: new "CHD routines" section + updated "What remains". Scope honesty: only the
deterministic algebra is proved; Z_test's Gaussian sampling/percentiles are statistical,
not algebraic, and are exercised numerically rather than proved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…semidefinite

Every verified CHD solve/find_gamma/Z_test theorem assumes hK : (Matrix.of K).PosSemidef,
but CHD builds K from data (Modes/kernels.py) — that hypothesis was never discharged.
This proves it for the linear mode, the same move as the positive-pivot keystone: turn
an assumed precondition into a theorem.

Spec (NN/Spec/Core/Tensor/Factorizations.lean): maskColsFn (which_dim column masking),
linearKernelFn (K[i,j] = 1 + scale·⟨Φi,Φj⟩ = 𝟙𝟙ᵀ + scale·ΦΦᵀ, mirroring
LinearMode.vectorized_kernel), and the tensor wrapper linearKernelSpec.

Proofs (new NN/Proofs/Tensor/Basic/FactorizationsKernels.lean, sorry/omega-free):
- linearKernelFn_posSemidef: K is PSD for scale ≥ 0 — 𝟙𝟙ᵀ is a rank-one Gram (PSD),
  ΦΦᵀ is a Gram (posSemidef_self_mul_conjTranspose), scale ≥ 0 keeps it PSD
  (PosSemidef.smul), PosSemidef.add closes the sum.
- linearKernelFn_symm: symmetry, a corollary of PosSemidef.isHermitian.
- linearKernelSpec_posSemidef: the tensor-level statement the solve theorems consume —
  so solveRidgeSpec (linearKernelSpec X w scale) γ b is now an unconditional exact solve
  for γ > 0, no PSD hypothesis left to assume.

Examples (new NN/Examples/Factorization/LinearKernel.lean): 6 green #eval checks —
K = Kᵀ, matches the CHD LinearMode formula, all Jacobi eigenvalues ≥ 0 (feature masking
preserved), the PSD kernel feeds an exact ridge solve (residual 0); negative control:
scale = -1 makes 𝟙𝟙ᵀ − ΦΦᵀ indefinite (two negative eigenvalues).

Blueprint: new "Building the kernel" section; flags the follow-ons — quadratic via the
Schur product theorem (PosSemidef.hadamard, in Mathlib) and Gaussian via Schoenberg
(Bochner not in Mathlib v4.30.0, the new research-grade item).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discharge the standing PosSemidef hypothesis for the second of CHD's
non-interpolatory kernels. The quadratic mode

  K[i,j] = scale·(alpha + ⟨Φ i, Φ j⟩)² + (1 − alpha²·scale)

expands algebraically to

  K = 𝟙𝟙ᵀ + (2·scale·alpha)·Φ·Φᵀ + scale·(Φ·Φᵀ ⊙ Φ·Φᵀ),

a sum of three PSD pieces: the all-ones Gram, a nonnegative multiple of
the data Gram Φ·Φᵀ, and a nonnegative multiple of its Hadamard square —
PSD by the Schur product theorem (PosSemidef.hadamard). So K is PSD for
scale ≥ 0 and alpha ≥ 0.

- Spec: quadraticKernelFn / quadraticKernelSpec (NN/Spec/.../Factorizations.lean),
  the exact CHD QuadraticMode.vectorized_kernel.
- Proof: quadraticKernelFn_posSemidef + _symm + tensor-level
  quadraticKernelSpec_posSemidef (FactorizationsKernels), sorry/omega-free.
- Example: NN/Examples/Factorization/QuadraticKernel.lean — 8 green checks
  (symmetry, CHD-formula match, all eigenvalues ≥ 0 with masking preserved,
  exact ridge solve), two genuine negative controls (alpha = −1 → 2 negative
  eigenvalues, scale = −1 → 3).
- Blueprint: dedicated quadratic-mode section (Ch4 Factorizations).

Gaussian mode (Schoenberg/Bochner, absent from Mathlib v4.30.0) remains
the research-grade follow-on.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Discharge the third and last CHD kernel mode's PosSemidef hypothesis,
without Bochner/Schoenberg (absent from Mathlib v4.30.0), via an
elementary Hadamard-exponential route that reuses the Schur product
theorem already used for the quadratic mode.

Spec (Factorizations.lean): gaussianKernelFn/gaussianKernelSpec,
K[i,j] = scale·∏_dim (1 + w[dim]·exp(−(X[i,dim]−X[j,dim])²/2l²)),
matching CHD GaussianMode (foldl product + squared-diff-as-product to
stay polymorphic over the law-free Context).

Proof (FactorizationsKernels.lean), sorry/admit/omega-free:
- posSemidef_of_tendsto — the PSD cone is closed under entrywise limits
  (the one genuinely new, upstreamable lemma: the quadratic form is
  continuous in the entries, {≥0} is closed).
- posSemidef_map_exp — entrywise exp of a PSD matrix is PSD via the
  Hadamard-power series exp∘G = Σ G^∘k/k!.
- posSemidef_gaussianCol — a single Gaussian exp(−c(yᵢ−yⱼ)²) is PSD by
  diagonal congruence of the entrywise exponential of the rank-one Gram.
- gaussianKernelFn_posSemidef — each feature factor 𝟙𝟙ᵀ + w·Gaussian is
  PSD, product over features via the Schur product theorem; PSD for
  scale ≥ 0 and a nonnegative mask w ≥ 0. Plus _symm and the
  tensor-level gaussianKernelSpec_posSemidef.

Example (GaussianKernel.lean): 7/7 #eval checks — symmetric, matches
the CHD GaussianMode formula, PSD (no negative eigenvalue), masked PSD,
exact ridge solve; negative controls scale=−1 and a negative mask
weight w=[−2,0] both correctly rejected.

Blueprint Ch4: replaced the "research-grade" Gaussian paragraph with a
section documenting the discharge; all three modes now PSD-verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CHD's outer hypergraph-discovery loop (decision.py, _GraphDiscoveryMain.py)
turns the verified `noise` statistic into graph-structure decisions. This
proves each of those decisions correct, over the same executable specs that
run on Float.

Spec (NN/Spec/Core/Tensor/Factorizations.lean), mirroring the Python verbatim:
- argMinFn / argMaxFn  — the np.argmin / np.argmax folds
- kernelChooserFn      — MinNoiseKernelChooser (valid = noise < Z_low; the `2`
                         sentinel written `1+1` to stay Context-polymorphic)
- modeIncrementFn / modeChooserFn — MaxIncrementModeChooser
- allPrunedFn          — the np.all(active_modes == 0) stopping rule

Proofs (NN/Proofs/Tensor/Basic/FactorizationsDecision.lean, new, sorry-free).
A single generic fold-selection lemma (foldl_select) underwrites both argmin
and argmax; the comparison proofs bridge the Context order test to the real `<`
(gtBool_eq_decide):
- argMinFn_le / argMaxFn_le — the prune step removes a least-activated ancestor
- kernelChooserFn_eq_some / _eq_none — MinNoiseKernelChooser is sound AND
  complete: returns `some s` with s valid and of least noise among all valid
  kernels exactly when a valid kernel exists, else `none`. Its `2`-sentinel
  correctness rests directly on the verified varNoiseFn_le_one (hypothesis
  `hbound`), so the decision is a proved selection over a statistic whose
  [0,1] range was itself proved.
- modeChooserFn_ge — MaxIncrementModeChooser reports the largest noise-jump
  iteration
- allPrunedFn_iff — the stopping test holds iff every ancestor is pruned

Examples (NN/Examples/Factorization/Discovery.lean, new): 13 #eval checks with
negative controls. The end-to-end block eigendecomposes an SPD kernel and runs
a find_gamma sweep feeding the verified varNoiseSpec at several gamma straight
into argMinFn (noises [0.004, 0.040, 0.287], all in [0,1], argmin = 0).

Blueprint Ch4 gains a discovery-decision-layer section and an updated closing
summary. Registrations: Basic.lean (proofs), Factorization.lean (examples).

Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic green (2705 jobs);
banned-tactic sweep clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The cyclic-Jacobi eigensolver was migrated to strict `Array (Array α)`,
but `choleskyColsFn` and the triangular solves were left on the functional
`Fin n → Fin n → α` representation. There each Cholesky column is a closure
that re-evaluates every previous column, so reading the full factor `L` is
exponential — and the spec is compiled without `precompileModules`, so any
`#eval` of `solveRidgeSpec`/`choleskySpec` runs that closure in the
interpreter. A single 4×4 ridge solve cost ~310 s; the QuadraticKernel
example took ~645 s to build, and every ridge/Cholesky-using example was
similarly slow.

Add two strict, array-backed runtime implementations and register them with
`@[implemented_by]`:

* `choleskyColsImpl` → `choleskyColsFn` — materializes each column as an
  `Array α`, so a back-reference `L[i,k]` is an O(1) lookup.
* `solveRidgeImpl` → `solveRidgeFn` — factors `K + γ·I = L·Lᵀ` and runs both
  triangular substitutions entirely over `Array`s, building no `Fin n → α`
  accumulator closures.

`@[implemented_by]` swaps only the compiled/interpreted runtime code; the
functional definitions — and every correctness proof that reasons about them
(`FactorizationsSolve`, `FactorizationsReconstruction`, …) — are untouched.
The examples' residual checks `(K+γ·I)·x − b ≈ 0` / `A = L·Lᵀ` numerically
validate that the array path agrees with the proven definition.

Result: QuadraticKernel 645 s → 6.8 s; a full clean rebuild of
`NN.Examples.Factorization` + `NN.Proofs.Tensor.Basic` is 18.8 s (2705 jobs).
No proof changes; sorry/admit/omega-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CHD's Z_test (interpolatory.py) decides edge significance by comparing the
observed `noise` against the null distribution of the same statistic under
random data: draw N samples, score each one's `noise`, sort, and read off the
5th/95th percentiles as Z_low/Z_high; an edge is significant when
`noise < Z_low`.

Spec layer (Factorizations.lean): `sampleNoisesFn` (each draw scored by the
*same* `varNoiseFn`), `leBool`, the order statistic `kthSmallestFn`
(mergeSort + index), `zLowIdx`/`zHighIdx` (⌊0.05·N⌋ / ⌊0.95·N⌋),
`zLowFn`/`zHighFn`, `zSignificantFn`, and tensor wrappers `zLowSpec`/`zHighSpec`
— mirroring Z_test verbatim and running on Float.

Proofs (FactorizationsDecision, sorry-free): the order-statistic toolkit
(`kthSmallestFn_mem`/`_nonneg`/`_le_one`/`_mono`, bridged to Mathlib's
`sortedLE_mergeSort` via `leBool_eq_le`) and the Z_test guarantees —
`sampleNoisesFn_nonneg`/`_le_one` (every null sample inherits the verified
`noise ∈ [0,1]` bound), `zLowFn`/`zHighFn ∈ [0,1]`, `zLowFn_le_zHighFn`
(Z_low ≤ Z_high by order-statistic monotonicity), and `zTest_admits_edge`
(`noise < Z_low` ⟹ `MinNoiseKernelChooser` returns `some 0`), the chooser's
`noise ≤ 1` precondition again being `varNoiseFn_le_one`. The keystone: the
same [0,1] bound governs both the data noise and the whole null distribution,
so the test is well-posed.

Examples (Discovery): builds the null distribution from a real
eigendecomposition, checks 0 ≤ Z_low ≤ Z_high ≤ 1, shows dominant-eigenvector
data clears the lower tail (significant), and rejects a high noise and a noise
at the upper tail (negative controls).

Blueprint: new "The Z_test: a null-distribution significance threshold"
section and updated scope summary — only the distributional half (Gaussian
draws + calibrated percentile, needs Mathlib.Probability) remains.

Verified: NN.Examples.Factorization + NN.Proofs.Tensor.Basic 2705 jobs green;
blueprint 4949 jobs; sorry/admit/omega-free.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the provable half of the Z_test's distributional content (the part
left open after the well-posed thresholds), in a new sorry-free module
NN/Proofs/Tensor/Basic/FactorizationsZTest.lean:

* Finite-sample calibration (counting, no probability theory): a sorted list
  has at most k entries below its k-th element, so via countP permutation-
  invariance over mergeSort the threshold's own empirical false-positive rate
  is bounded exactly — at most ⌊N/20⌋ ≈ 5% of the N null draws fall below
  Z_low (zLow_null_exceedance_le) and ≈ 5% rise above Z_high
  (zHigh_null_exceedance_le). This is the exact, non-asymptotic guarantee.

* The Gaussian null law (measure theory): modelling the draws as i.i.d.
  standard Gaussian (nullGaussian = Measure.pi (gaussianReal 0 1)), the
  per-draw noise is measurable (measurable_noiseMap), so its pushforward
  noiseLaw is a probability measure (IsProbabilityMeasure) concentrated on
  [0,1] (noiseLaw_Icc_eq_one) — the verified varNoiseFn ∈ [0,1] bound lifted
  to the law. sampleNoisesFn_eq_noiseMap ties CHD's executable statistic to
  the model.

Scope honesty: the asymptotic frontier — empirical→true quantile convergence
(Glivenko–Cantelli/DKW) and the exchangeability rank rate k/(N+1) — needs an
empirical-process theory absent from Mathlib v4.30.0, and is flagged rather
than stubbed with sorry.

Discovery.lean adds positive/negative #eval checks (exactly 1/20 below Z_low,
0 above Z_high, 19/20 below the slack Z_high as a negative control); the
blueprint Ch4 chapter gains a "Z_test distributional layer" section and the
"What remains" note now scopes only the asymptotic half.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lift the single null draw (FactorizationsZTest) to the i.i.d. *sequence*
`nullSeqGaussian n := Measure.infinitePi (fun _ : ℕ => nullGaussian n)` on
`ℕ → (Fin n → ℝ)` — the infinite product, since `Measure.pi` is finite-index
only — and `nullNoise Λ V γ i ω := noiseMap Λ V γ (ω i)`, the same measurable
statistic read off coordinate `i`. Proven sorry-free in the new
`NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean`:

  - nullNoise_iIndepFun (← iIndepFun_infinitePi), pairwise
    nullNoise_pairwise_indepFun in the Pairwise (·⟂ᵢ[μ]·) shape SLLN takes;
  - nullNoise_hasLaw / nullNoise_identDistrib — each draw has the common law
    noiseLaw (via measurePreserving_eval_infinitePi + HasLaw.comp);
  - nullNoise_mem_Icc ([0,1]-valued) and integrable_nullNoise.

That is exactly the hint/hindep/hident triple `strong_law_ae_real` and the
Hoeffding tail consume, so plan steps (b)–(d) become applications of an
in-place scaffold. The *uniform* GC / DKW–Massart sharp constant and the
exchangeability rank rate k/(N+1) stay research-grade (flagged, not sorry'd).

Discovery.lean exercises the computable shadow — the empirical CDF
F̂_N(t) = #{i<N : noiseᵢ ≤ t}/N — checked a bona fide CDF (in [0,1], monotone,
saturating to 1, 0 below support) with a non-degeneracy negative control; six
new #eval checks pass. Aggregate example docstring and the Ch4 blueprint
(distributional-layer + "What remains") updated to record the scaffold.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ia the SLLN

Build on the step-(a) i.i.d. scaffold (FactorizationsZAsymptotic) to prove the
pointwise empirical-CDF consistency theorem, sorry-free over Mathlib v4.30.0:

- nullBelow Λ V γ t i ω := (Set.Iic t).indicator 1 (nullNoise Λ V γ i ω), the
  threshold indicators 1{noiseᵢ ≤ t}, and empCDF, their normalized prefix sum.
- nullBelow_pairwise_indepFun / nullBelow_identDistrib / integrable_nullBelow:
  the indicators inherit the scaffold's i.i.d.-bounded-integrable structure
  (indicator composed onto each independent, identically-distributed draw).
- integral_nullBelow_zero: their common mean is exactly cdf (noiseLaw Λ V γ) t
  (HasLaw.integral_comp + integral_indicator_one + cdf_eq_real), so empCDF is the
  Monte-Carlo estimator of the null CDF.
- empCDF_tendsto_cdf: strong_law_ae_real on the hint/hindep/hident triple gives
  ∀ᵐ ω, Tendsto (fun N => empCDF … N t ω) atTop (𝓝 (cdf noiseLaw t)) — pointwise
  Glivenko–Cantelli at each fixed t.

Discovery.lean adds the matching #eval block exercising the computable shadow:
the growing-prefix running mean F̂_N settling toward the full-sample estimate of
cdf noiseLaw t (each prefix a valid [0,1] CDF value, cdf 1 = 1 attained at every
N, with a non-degeneracy negative control). Aggregate docstring and the Ch4
blueprint updated to describe step (b); the uniform GC / DKW–Massart sharp
constant and exchangeability rank rate remain flagged research-grade (no sorry).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Hoeffding

Prove the finite-sample companion of step (b)'s almost-sure limit: at a fixed
threshold t, the empirical CDF of the i.i.d. null draws concentrates around the
true null CDF at the sharp Hoeffding rate.

FactorizationsZAsymptotic.lean:
- nullBelow_subgaussian / nullBelow_neg_subgaussian: the [0,1]-bounded threshold
  indicators are sub-Gaussian with variance proxy 1/4 (hasSubgaussianMGF_of_mem_Icc),
  centered at mean cdf noiseLaw t (integral_nullBelow_eq).
- hoeffding_avg_ge: Mathlib's measure_sum_ge_le_of_iIndepFun with ε ↦ N·ε turns the
  proxy sum N/4 into the sharp exponent -2Nε².
- empCDF_upper_tail / empCDF_lower_tail: P(±(empCDF − cdf) ≥ ε) ≤ exp(-2Nε²).
- empCDF_concentration: two-sided DKW-at-a-point P(|empCDF − cdf| ≥ ε) ≤ 2·exp(-2Nε²)
  via measureReal_union_le + le_abs.

Discovery.lean: matching #eval block exercising the bound's computable shadows — the
tail function 2·exp(-2Nε²) (twice one-sided, decreasing in N and ε, non-vacuous < 1
once 2Nε² > ln 2, trivially 2 at ε = 0) and the observed prefix deviation it governs
(every prefix of ≥ 3 draws keeps F̂_N within ε = 0.3 of the full sample; negative
control: N = 1, 2 deviate by 0.5 > ε).

Docs: aggregate Factorization docstring and Ch4 blueprint updated; uniform DKW–Massart
and quantile-transfer step (d) remain flagged research-grade, never sorry'd.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…percentile consistency)

Inverts steps (b)-(c)'s empirical-CDF convergence into consistency of the
empirical percentiles the Z_test chooser thresholds against, sorry-free over
Mathlib v4.30.0.

NN/Proofs/Tensor/Basic/FactorizationsZAsymptotic.lean:
- empCDF_mono: empirical CDF monotone in the threshold.
- StraddlesQuantile / IsLowerQuantile: the honest population straddle hypothesis
  (continuity + strict monotonicity through p at q) and the defining property of
  a lower empirical p-quantile.
- empCDF_eventually_straddle: the transfer engine -- pointwise consistency at
  q +/- eps makes the empirical CDF eventually straddle p.
- empQuantile_tendsto: any lower empirical p-quantile converges a.s. to q (sandwich
  pinned into [q-eps, q+eps], intersected over eps = 1/(m+1) via ae_all_iff).

NN/Examples/Factorization/Discovery.lean: matching #eval block -- lower p-quantile
reaches level p, monotone in p, in [0,1], median converges (<= 0.02 for N >= 3);
negative controls for non-vacuity and straddle-hypothesis sensitivity.

Docs: aggregate Factorization.lean bullet, Ch4 Verso blueprint, and CHDTorch plan
updated to mark the four-tier asymptotic-calibration plan complete, leaving only
the uniform DKW-Massart constant, the zLowFn/zHighFn triangular-array wiring, and
the exchangeability rank rate as research-grade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@NicolasRouquette

Copy link
Copy Markdown
Author

Closing this in favor of a slimmed-down, refocused contribution.

Since opening it, I realized this branch had grown to mix three separable things:

  • generic matrix-factorization linear algebra,
  • the Computational Hypergraph Discovery (CHD) specific layers, and
  • a KernelFlows port.

Following the question of whether this belongs in TorchLean or in a repo of its own, I've split them:

  • For TorchLean — only the generic, domain-independent factorizations (Cholesky, QR,
    symmetric eigendecomposition, SVD, generalized eigenproblem). Pure linear algebra, no CHD
    anywhere, useful to anyone doing GP / kernel / PCA / least-squares work. Proposed in a new,
    much smaller PR Add verified matrix factorizations: Cholesky and QR (exact) #2 for the exact Cholesky/QR part and a subsequent PR for the iterative
    eigensolver/SVD part. This will help keep the PR reviews to a reasonable, coherent scope.

  • Separate repository (CHDTorch) — all the CHD- and KernelFlows-specific work (discovery,
    the Z-test, kernel losses / gradients / optimizers) now lives in its own Lean package that
    takes TorchLean as a library dependency, with zero changes to TorchLean's source.

So this PR is superseded; continuing in #2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant