Quantitative performance benchmarking of machine learning classifiers for phishing detection, utilizing precision, recall, and F1-score optimization on high-dimensional labeled datasets.
Updated May 28, 2026 - Python
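For readers unfamiliar with the metrics named in the phishing-detection description, here is a minimal sketch of how precision, recall, and F1 can be computed from binary labels. It is not taken from the repository; the function name `precision_recall_f1` and the convention that label `1` means "phishing" are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class.

    Hypothetical helper for illustration; `positive=1` (phishing) is an
    assumed labeling convention, not something the repository specifies.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(precision_recall_f1(y_true, y_pred))
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would likely be used instead; the hand-written version only makes the definitions explicit.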
frontier-evals-harness is a lightweight framework for benchmarking frontier language models. It provides deterministic suite versioning, modular adapters, standardized scoring, and paired statistical comparisons with confidence intervals. Built for regression tracking and analysis, it enables reproducible evaluation without infrastructure.
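One common way to produce a paired comparison with a confidence interval is a percentile bootstrap over per-item score differences. The sketch below illustrates that general technique only; it is not frontier-evals-harness's API, and the function name `paired_bootstrap_ci` and its parameters are hypothetical.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean paired difference (A minus B).

    Hypothetical helper: assumes both models were scored on the same items,
    in the same order. A fixed seed keeps the result reproducible.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired comparison needs equal-length score lists")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)

    # Resample the per-item differences with replacement and record each mean.
    boot_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )

    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(diffs), boot_means[lo_idx], boot_means[hi_idx]


if __name__ == "__main__":
    a = [1, 1, 0, 1, 1, 0, 1, 1]  # per-item correctness for model A (made-up data)
    b = [1, 0, 0, 1, 0, 0, 1, 1]  # per-item correctness for model B (made-up data)
    print(paired_bootstrap_ci(a, b))
```

Resampling the differences rather than each model's scores separately preserves the pairing, which usually tightens the interval when both models find the same items hard.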
Classification models for detecting fake reviews and predicting software bugs. Includes implementations of decision trees, bagging, random forests, logistic regression, and Naive Bayes, with statistical evaluation using McNemar's test.
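McNemar's test compares two classifiers evaluated on the same test set by looking only at the items where they disagree. The following is a minimal sketch of the exact (binomial) form of the test, not the repository's implementation; the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same items.

    Hypothetical helper. Returns (b, c, p_value), where
      b = items classifier A got right and classifier B got wrong,
      c = items classifier A got wrong and classifier B got right.
    """
    b = sum(1 for t, pa, pb in zip(y_true, pred_a, pred_b) if pa == t and pb != t)
    c = sum(1 for t, pa, pb in zip(y_true, pred_a, pred_b) if pa != t and pb == t)

    n = b + c
    if n == 0:
        # The classifiers never disagree, so there is no evidence of a difference.
        return b, c, 1.0

    # Under the null hypothesis each disagreement favors A or B with
    # probability 0.5, so min(b, c) follows a Binomial(n, 0.5) tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return b, c, p_value


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 1]
    pred_a = [1, 1, 1, 1, 1]
    pred_b = [0, 0, 0, 0, 0]
    print(mcnemar_exact(y_true, pred_a, pred_b))  # (5, 0, 0.0625)
```

The exact form is generally preferred when the number of disagreements is small; with many disagreements, the chi-squared approximation gives similar results.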