
Add variance-budget and scale-invariant fit training diagnostics #40

Open: bwengals wants to merge 4 commits into main from gp-variance-diagnostics


bwengals (Collaborator) commented Jun 23, 2026

Adds training diagnostics that describe how a GP splits the response variance across the mean function, the GP signal, and the observation noise, plus a scale-invariant likelihood-based "excess fit" metric for both VFE and the exact GP.

Motivation

Previously, excess_fit_per_n was sensitive to the mean and scale of y. The GP defaults to a zero mean function, so y enters the fit uncentered, and the metric was referenced to sigma**2, so the log-determinant's dependence on the scale of y was never cancelled. Its documented behavior, "goes to 0 at the noise floor", was also wrong: it sat at -0.5 there.
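A short worked derivation shows where the -0.5 comes from. It assumes, since the PR does not spell it out, that the old metric was fit_per_n + 0.5*log(2*pi*sigma**2), and that at the noise floor the per-point fit behaves like an i.i.d. Gaussian log-density whose mean squared residual matches sigma**2. Here \mu_i is notation introduced for the model's fitted mean at x_i:

$$
\begin{aligned}
\text{fit\_per\_n} &\approx -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2N\sigma^2}\sum_{i=1}^{N}\bigl(y_i - \mu_i\bigr)^2
\approx -\tfrac{1}{2}\log(2\pi\sigma^2) - \tfrac{1}{2}, \\
\text{fit\_per\_n} + \tfrac{1}{2}\log(2\pi\sigma^2) &\approx -\tfrac{1}{2}.
\end{aligned}
$$

The constant +0.5 in the redefined metric is what removes this offset.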

What's added

  • variance_budget(gp, X, y) (model-agnostic): decomposes the response variance into mean, GP-signal, and noise contributions via the law of total variance, giving the model-implied total Var(m(X)) + mean(diag(K)) + mean(sigma(X)**2), which should match Var(y) when the model is calibrated. Returns the three contributions, their fractions (which sum to 1), and var_ratio = total / Var(y) as a calibration check. It is invariant to the mean and scale of y and works for any mean function, composed (sum) kernels, and scalar or heteroskedastic sigma.
  • vfe_diagnostics: gains four variance-budget fields, and excess_fit_per_n is redefined as fit_per_n + 0.5*log(2*pi*Var(y - m(X))) + 0.5. Referencing the residual variance instead of sigma**2 cancels the log-determinant's scale term, so the metric is scale-invariant, and it equals 0 for a model that fits no better than a constant-mean Gaussian.
  • unapproximated_diagnostics(gp, X, y) (new): the exact-GP analogue, built on marginal_log_likelihood. Reports mll, fit, and logdet, the per-point fit and complexity, the same scale-invariant excess_fit_per_n, and the variance budget.
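A minimal NumPy sketch of the variance-budget bookkeeping follows. It is not the PR's implementation: the hypothetical variance_budget_sketch takes already-evaluated arrays (the mean-function values m(X), the kernel diagonal diag(K), and sigma) instead of a gp object, and the returned key names are illustrative.

```python
import numpy as np


def variance_budget_sketch(y, mean_vals, kernel_diag, sigma):
    """Hypothetical stand-in for variance_budget(gp, X, y).

    mean_vals: m(X), scalar or length N; kernel_diag: diag(K), scalar or
    length N; sigma: noise std, scalar or length N (heteroskedastic).
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    # Broadcast scalars to length N so every input takes the same code path.
    mean_vals = np.asarray(mean_vals, dtype=float) * np.ones(n)
    kernel_diag = np.asarray(kernel_diag, dtype=float) * np.ones(n)
    sigma = np.asarray(sigma, dtype=float) * np.ones(n)

    var_mean = np.var(mean_vals)        # Var(m(X))
    var_signal = np.mean(kernel_diag)   # mean(diag(K))
    var_noise = np.mean(sigma**2)       # mean(sigma(X)**2)
    total = var_mean + var_signal + var_noise

    return {
        "var_mean": var_mean,
        "var_signal": var_signal,
        "var_noise": var_noise,
        "frac_mean": var_mean / total,
        "frac_signal": var_signal / total,
        "frac_noise": var_noise / total,
        # Model-implied total over empirical variance; near 1 when calibrated.
        "var_ratio": total / np.var(y),
    }
```

Scaling y and m(X) by c, the kernel by c**2, and sigma by c leaves the fractions and var_ratio unchanged, which is the sense in which the budget does not depend on the scale of y.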

All new fields flow through compile_scipy_diagnostics, tracked_minimize, and to_idata automatically via the namedtuple plumbing.
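As a rough sketch of why namedtuple plumbing can pick up new fields without extra wiring, the code below assumes a tracker that iterates over the record type's _fields. The Diagnostics record and collect_history helper are hypothetical; this does not claim to reflect how compile_scipy_diagnostics, tracked_minimize, or to_idata are actually implemented.

```python
from collections import namedtuple

import numpy as np

# Hypothetical diagnostics record; the PR's real field names may differ.
Diagnostics = namedtuple(
    "Diagnostics", ["mll", "excess_fit_per_n", "frac_signal", "frac_noise"]
)


def collect_history(records):
    """Stack a list of namedtuple records into one array per field.

    The field list is read from the record type itself, so a field added
    to the namedtuple appears in the output with no change here.
    """
    fields = type(records[0])._fields
    return {name: np.array([getattr(r, name) for r in records]) for name in fields}
```

Each key in the returned dict is then a ready-to-plot series over optimizer iterations.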

Heteroskedastic sigma

Every metric accepts sigma either as a scalar or as an X-dependent length-N vector: sigma is broadcast with sigma * ones(N), and the noise contribution is then taken as mean(sigma**2).
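The scalar-or-vector handling can be sketched for the exact-GP case, which also illustrates the redefined excess_fit_per_n. This is a hypothetical NumPy stand-in for unapproximated_diagnostics, not the PR's code. It takes a precomputed kernel matrix K, the mean values m(X), and sigma, and it assumes fit_per_n is the per-point log marginal likelihood mll / N with the log-determinant included, since that is the quantity whose scale term the residual-variance reference cancels.

```python
import numpy as np


def exact_gp_excess_fit_sketch(K, y, mean_vals, sigma):
    """Hypothetical exact-GP diagnostics with scalar or per-point sigma.

    K: (N, N) prior kernel matrix; mean_vals: m(X), scalar or length N;
    sigma: noise std, scalar or length N.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    r = y - np.asarray(mean_vals, dtype=float) * np.ones(n)  # residual y - m(X)
    sigma = np.asarray(sigma, dtype=float) * np.ones(n)      # scalar -> length N

    # Marginal covariance with (possibly heteroskedastic) noise on the diagonal.
    C = K + np.diag(sigma**2)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L, r)                            # L^{-1} r
    quad = -0.5 * alpha @ alpha                              # -0.5 r^T C^{-1} r
    logdet = 2.0 * np.sum(np.log(np.diag(L)))                # log|C|
    mll = quad - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)

    # Assumption: fit_per_n is mll / N, log-determinant included.
    fit_per_n = mll / n
    excess = fit_per_n + 0.5 * np.log(2.0 * np.pi * np.var(r)) + 0.5
    return {"mll": mll, "quad": quad, "logdet": logdet, "excess_fit_per_n": excess}
```

With K = 0, centered y, and sigma equal to the standard deviation of y, the model is exactly the constant-mean Gaussian baseline and excess_fit_per_n comes out at 0 up to rounding. Rescaling y, m(X), and sigma by c and K by c**2 shifts log|C| by N*log(c**2) and Var(y - m(X)) by c**2, and the two shifts cancel.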

Related

Relates to #7 (live training monitor dashboard): each new namedtuple field is a ready-to-plot series. Does not close any open issue.

Test plan

  • pytest tests/: 275 passed
  • pre-commit run --all-files: clean (runs on every commit)
  • scripts/run_mypy.py: 37/37 pass

📚 Documentation preview 📚: https://ptgp--40.org.readthedocs.build/en/40/
