You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
frontier-evals-harness is a lightweight framework for benchmarking frontier language models. It provides deterministic suite versioning, modular adapters, standardized scoring, and paired statistical comparisons with confidence intervals. Built for regression tracking and analysis, it enables reproducible evaluation without infrastructure.
Classification models for detecting fake reviews and predicting software bugs. Includes implementations of decision trees, bagging, random forests, logistic regression, and Naive Bayes, with statistical evaluation using McNemar's test.