Does increasing GPT-5.2 reasoning effort improve diagnosis accuracy enough to justify the token/latency cost? Ablation study on 897 paired medical cases.
Updated Mar 25, 2026 - Python
Classification models for detecting fake reviews and predicting software bugs. Includes implementations of decision trees, bagging, random forests, logistic regression, and Naive Bayes, with statistical evaluation using McNemar's test.
A hybrid anomaly detection pipeline combining ensemble machine learning models and deep learning techniques for credit card fraud detection. Evaluates 25 model combinations across multiple datasets and validates performance using McNemar and Friedman statistical tests.
Pre-registered cross-validated voter committees for honest evaluation on dental panoramic VQA: 75.36% on MMOral-OPG-Bench (370/491), McNemar p=1.1e-7
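The repositories listed here all use McNemar's test to compare two classifiers on the same paired cases. As a rough illustration only, and not code taken from any of these projects, the exact (binomial) form of the test can be sketched with the standard library. The helper names `discordant_counts` and `mcnemar_exact` are hypothetical and chosen for this example.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count the paired cases on which the two models disagree.

    correct_a / correct_b are equal-length sequences of booleans saying
    whether model A / model B got each case right (hypothetical inputs).
    """
    pairs = list(zip(correct_a, correct_b))
    b = sum(1 for x, y in pairs if x and not y)  # A right, B wrong
    c = sum(1 for x, y in pairs if not x and y)  # A wrong, B right
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    Under the null hypothesis, each discordant pair is equally likely
    to favour either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the lower tail, cap at 1


if __name__ == "__main__":
    a = [True, True, True, False, True, True]
    b_ = [True, False, False, False, False, True]
    print(mcnemar_exact(*discordant_counts(a, b_)))
```

Only the discordant pairs enter the test; cases where both models are right or both are wrong carry no information about which one is better. For large discordant counts, the chi-squared approximation is a common alternative to the exact form.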