Add aime2026 dataset to load_example_dataset by vedthebear · Pull Request #1399 · PrimeIntellect-ai/verifiers

vedthebear · 2026-05-17T04:33:31Z

What

Registers MathArena/aime_2026 (30 problems from the 2026 AIME I + II) in verifiers/utils/data_utils.py alongside the existing aime2024 / aime2025 entries.

Why

AIME 2026 was released after most current models' training cutoffs, which makes it a useful held-out math benchmark for evaluating top-tier models. Including it here means anyone running a math env via verifiers can grab it with the standard one-liner:

from verifiers.utils.data_utils import load_example_dataset
ds = load_example_dataset("aime2026")

How

Two small additions matching the surrounding pattern:

get_preprocess_fn: new aime2026 branch returning {"question": x["problem"], "temp_answer": str(x["answer"])}.
load_example_dataset: new aime2026 branch calling load_dataset("MathArena/aime_2026")["train"].
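As a rough illustration of the first addition, the branch can be sketched in plain Python. This is a minimal sketch, not the code in the PR: `get_preprocess_fn_sketch`, `preprocess_aime2026`, and the placeholder row are hypothetical names introduced here, and only the returned mapping is taken from the description above.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def get_preprocess_fn_sketch(name: str) -> Callable[[Row], Row]:
    """Hypothetical stand-in covering only the aime2026 branch."""
    if name == "aime2026":

        def preprocess_aime2026(x: Row) -> Row:
            return {
                "question": x["problem"],
                # Cast the integer answer to a string and store it under
                # temp_answer rather than answer.
                "temp_answer": str(x["answer"]),
            }

        return preprocess_aime2026
    raise ValueError(f"Dataset {name!r} is not covered by this sketch.")


# Illustrative row: placeholder problem text, integer-typed answer.
example_row = {"problem": "<problem text>", "answer": 277}
processed = get_preprocess_fn_sketch("aime2026")(example_row)
print(processed)
```

The key detail is that the preprocessor never writes to `answer` directly; it only emits `question` and `temp_answer`.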

Implementation note

The source dataset's answer column is typed int64. The new preprocessor writes the string-cast value to a temporary key, temp_answer, which the existing hook at the end of load_example_dataset renames back to answer, so the int-to-string cast survives .map's type inference. This is the same workaround the existing mmlu preprocessor already uses for its int-typed answer column, so no new mechanism is introduced.
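The map-then-rename flow described in this note can be sketched as follows. This is a simplified model under stated assumptions: plain lists of dicts stand in for a `datasets.Dataset`, so it shows the order of operations but does not reproduce `.map`'s type inference. `map_and_rename` and `preprocess` are hypothetical names, not functions from verifiers.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def preprocess(x: Row) -> Row:
    # Same mapping as described for the aime2026 preprocessor.
    return {"question": x["problem"], "temp_answer": str(x["answer"])}


def map_and_rename(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """Hypothetical model: map each row, then rename temp_answer to answer."""
    # Step 1: map. Only the keys returned by fn are kept, so the original
    # integer answer column is gone after this step.
    mapped = [fn(row) for row in rows]
    # Step 2: rename temp_answer back to answer once mapping is finished.
    return [
        {("answer" if key == "temp_answer" else key): value for key, value in row.items()}
        for row in mapped
    ]


# Illustrative input: placeholder problem text, integer-typed answer.
source_rows = [{"problem": "<problem text>", "answer": 277}]
result = map_and_rename(source_rows, preprocess)
print(result)
```

Because the string value only takes the name `answer` after the map step, it never shares a column name with the integer-typed source column during mapping.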

Verification

python -c "
from verifiers.utils.data_utils import load_example_dataset
ds = load_example_dataset('aime2026')
print(ds.features)  # {'question': Value('string'), 'answer': Value('string')}
print(ds[0])        # {'question': '...', 'answer': '277'}
print(len(ds))      # 30
"

Ruff passes on the modified file (uv run ruff check verifiers/utils/data_utils.py).

Note

Low Risk
Low risk: small, additive dataset registration and preprocessing changes with no impact to core evaluation logic beyond enabling a new dataset name.

Overview
Adds a new aime2026 option to load_example_dataset that loads MathArena/aime_2026 (default split: train) and wires it into get_preprocess_fn.

The new preprocessor maps problem to question and stringifies the int-typed answer via a temp_answer field that is later renamed back to answer to preserve the cast through .map.

Reviewed by Cursor Bugbot for commit 663e714. Bugbot is set up for automated code reviews on this repo.

Note

Add `aime2026` dataset support to `load_example_dataset`

Adds the MathArena/aime_2026 dataset as a loadable option in data_utils.py. The preprocessor maps problem to question and casts the integer answer to a string via a temp_answer intermediate column, which is then renamed to answer post-map to avoid type conflicts during dataset mapping.

Macroscope summarized 663e714.

Registers MathArena/aime_2026 (30 problems from the 2026 AIME I + II). The set was released after most current models' training cutoffs, which makes it useful as a held-out math benchmark when evaluating top-of-the-line models. Uses the existing ``temp_answer`` rename hook in ``load_example_dataset`` so the int->str cast on the source dataset's ``answer`` column (typed int64) survives ``.map``'s type inference. This is the same workaround the mmlu preprocessor already uses for its int-typed answer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vedthebear wants to merge 1 commit into PrimeIntellect-ai:main from vedthebear:feat/aime2026-dataset
