# statistical-evaluation

Here are 2 public repositories matching this topic...


frontier-evals-harness is a lightweight framework for benchmarking frontier language models. It provides deterministic suite versioning, modular adapters, standardized scoring, and paired statistical comparisons with confidence intervals. Built for regression tracking and analysis, it enables reproducible evaluation without dedicated infrastructure.

  • Updated Feb 19, 2026
  • Python
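
One way to realize the paired statistical comparisons with confidence intervals that this description mentions is a percentile bootstrap over per-example score differences. The sketch below is a minimal illustration of that technique, not the harness's actual API; the function name and the score arrays are hypothetical.

```python
import numpy as np

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-example score difference
    between two models evaluated on the same examples (paired design)."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = diffs.size
    # Resample example indices with replacement; pairing is preserved
    # because each draw keeps model A's and model B's scores together.
    idx = rng.integers(0, n, size=(n_resamples, n))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), (lo, hi)

# Hypothetical per-example accuracy scores (1 = correct) for two models.
scores_a = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
scores_b = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
mean_diff, (lo, hi) = paired_bootstrap_ci(scores_a, scores_b)
print(f"mean difference {mean_diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Pairing matters here: because both models are scored on the same examples, resampling whole pairs removes per-example difficulty as a source of variance, which typically yields a much tighter interval than comparing two independent samples.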

Classification models for detecting fake reviews and predicting software bugs. Includes implementations of decision trees, bagging, random forests, logistic regression, and Naive Bayes, with statistical evaluation using McNemar's test.

  • Updated Jun 28, 2025
  • Jupyter Notebook
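
McNemar's test, named in the description above, compares two classifiers evaluated on the same test set using only the examples where their correctness differs. Below is a minimal sketch of the continuity-corrected form of the test, not the repository's code; the label and prediction arrays are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test with continuity correction: compares two classifiers
    on the same examples using only the cases where they disagree."""
    correct_a = np.asarray(pred_a) == np.asarray(y_true)
    correct_b = np.asarray(pred_b) == np.asarray(y_true)
    b = int(np.sum(correct_a & ~correct_b))   # A right, B wrong
    c = int(np.sum(~correct_a & correct_b))   # A wrong, B right
    stat = (abs(b - c) - 1) ** 2 / (b + c)    # continuity-corrected statistic
    p_value = chi2.sf(stat, df=1)             # chi-squared with 1 df
    return stat, p_value

# Hypothetical labels and predictions from two classifiers.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
pred_a = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1])
pred_b = np.array([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1])
stat, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"McNemar statistic {stat:.3f}, p-value {p:.3f}")
```

Examples that both classifiers get right, or both get wrong, carry no information about which one is better, which is why the statistic depends only on the discordant counts b and c.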
