Add aime2026 dataset to load_example_dataset #1399

Open
vedthebear wants to merge 1 commit into
PrimeIntellect-ai:main from
vedthebear:feat/aime2026-dataset

vedthebear commented May 17, 2026

What

Registers MathArena/aime_2026 (30 problems from the 2026 AIME I + II) in verifiers/utils/data_utils.py alongside the existing aime2024 / aime2025 entries.

Why

AIME 2026 was released after most current models' training cutoffs, which makes it a useful held-out math benchmark for evaluating top-tier models. Including it here means anyone running a math env via verifiers can grab it with the standard one-liner:

from verifiers.utils.data_utils import load_example_dataset
ds = load_example_dataset("aime2026")

How

Two small additions matching the surrounding pattern:

  1. get_preprocess_fn: new aime2026 branch returning {"question": x["problem"], "temp_answer": str(x["answer"])}.
  2. load_example_dataset: new aime2026 branch calling load_dataset("MathArena/aime_2026")["train"].
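As a rough illustration of the first addition, the sketch below shows what the aime2026 branch of the preprocessor factory might look like. The function name get_preprocess_fn_sketch, the error message, and the placeholder problem text are assumptions for illustration; only the returned mapping is taken from the description above, and the real get_preprocess_fn handles many more dataset names.

```python
from typing import Any, Callable


def get_preprocess_fn_sketch(name: str) -> Callable[[dict[str, Any]], dict[str, str]]:
    """Hypothetical stand-in for get_preprocess_fn; only the aime2026 branch is shown."""
    if name == "aime2026":

        def preprocess_aime2026(x: dict[str, Any]) -> dict[str, str]:
            # Rename problem -> question and stringify the int64 answer.
            # The value is written to temp_answer rather than answer.
            return {"question": x["problem"], "temp_answer": str(x["answer"])}

        return preprocess_aime2026
    # Assumed error handling; the real function may behave differently.
    raise ValueError(f"Dataset {name} not supported in this sketch.")


# Placeholder row shaped like the source dataset (problem text elided).
row = {"problem": "<problem text>", "answer": 277}
processed = get_preprocess_fn_sketch("aime2026")(row)
print(processed)
```

The second addition is the load_dataset("MathArena/aime_2026")["train"] call quoted in the list; it is not repeated here because it needs network access.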

Implementation note

The source dataset's answer column is typed int64. The new preprocessor therefore writes the stringified value to a separate temp_answer key, which the existing hook at the end of load_example_dataset renames back to answer, so that the int-to-string cast survives .map's type inference. This is the same workaround the existing mmlu preprocessor uses for its int-typed answer column; no new mechanism is introduced.
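The column flow in this note can be emulated on plain Python dicts. The sketch below is a stand-in, not the datasets API: it does not reproduce .map's type inference, and the helper name apply_with_rename is hypothetical. It only shows the order of operations: preprocess each row, drop the original columns, then rename temp_answer back to answer.

```python
from typing import Any, Callable

Row = dict[str, Any]


def apply_with_rename(rows: list[Row], preprocess_fn: Callable[[Row], Row]) -> list[Row]:
    """Emulate 'map, drop original columns, rename temp_answer' on a list of dicts."""
    out = []
    for row in rows:
        # Only the keys returned by the preprocessor are kept,
        # so the original int-typed answer column is gone at this point.
        new_row = dict(preprocess_fn(row))
        # Stand-in for the rename hook described in the note.
        if "temp_answer" in new_row:
            new_row["answer"] = new_row.pop("temp_answer")
        out.append(new_row)
    return out


def preprocess(x: Row) -> Row:
    # Same mapping as the aime2026 preprocessor described in this PR.
    return {"question": x["problem"], "temp_answer": str(x["answer"])}


source_rows = [{"problem": "<problem text>", "answer": 277}]
result = apply_with_rename(source_rows, preprocess)
print(result)
```

Because the string value never shares a column name with the int64 source column during the map step, there is no existing column type for it to collide with.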

Verification

python -c "
from verifiers.utils.data_utils import load_example_dataset
ds = load_example_dataset('aime2026')
print(ds.features)  # {'question': Value('string'), 'answer': Value('string')}
print(ds[0])        # {'question': '...', 'answer': '277'}
print(len(ds))      # 30
"
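The manual checks above could also be expressed as a small reusable helper. This is a sketch under assumptions: find_schema_problems is a hypothetical name, it operates on a list of row dicts rather than on a datasets object, and the 0 to 999 range reflects the standard AIME answer format rather than anything enforced by the PR.

```python
from typing import Any


def find_schema_problems(rows: list[dict[str, Any]], expected_len: int = 30) -> list[str]:
    """Return human-readable problems; an empty list means the rows look as expected."""
    problems = []
    if len(rows) != expected_len:
        problems.append(f"expected {expected_len} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if set(row) != {"question", "answer"}:
            problems.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        if not isinstance(row["question"], str) or not isinstance(row["answer"], str):
            problems.append(f"row {i}: question and answer must both be strings")
            continue
        # AIME answers are integers from 0 to 999 (assumption about the contest format).
        if not (row["answer"].isdecimal() and 0 <= int(row["answer"]) <= 999):
            problems.append(f"row {i}: answer {row['answer']!r} is not an integer in 0..999")
    return problems


# Synthetic rows shaped like the verified output (question text elided).
good_rows = [{"question": "<problem text>", "answer": "277"} for _ in range(30)]
print(find_schema_problems(good_rows))
```

With the real dataset, something like find_schema_problems(list(ds)) should work, assuming iterating ds yields one dict per row.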

Ruff passes on the modified file (uv run ruff check verifiers/utils/data_utils.py).


Note

Low Risk
Low risk: small, additive dataset registration and preprocessing changes with no impact to core evaluation logic beyond enabling a new dataset name.

Overview
Adds a new aime2026 option to load_example_dataset that loads the train split of MathArena/aime_2026 and wires a matching preprocessor into get_preprocess_fn.

The new preprocessor maps problem to question and stringifies the int-typed answer via a temp_answer field that is later renamed back to answer to preserve the cast through .map.

Reviewed by Cursor Bugbot for commit 663e714.

Note

Add aime2026 dataset support to load_example_dataset

Adds the MathArena/aime_2026 dataset as a loadable option in data_utils.py. The preprocessor maps problem to question and casts the integer answer to a string via a temp_answer intermediate column, which is then renamed to answer post-map to avoid type conflicts during dataset mapping.

Macroscope summarized 663e714.

Registers MathArena/aime_2026 (30 problems, 2026 AIME I+II released after most
current model training cutoffs, making it useful as a held-out math benchmark
when running evaluations on top-of-the-line models).

Uses the existing ``temp_answer`` rename hook in ``load_example_dataset`` so
the int->str cast on the source dataset's ``answer`` column (typed int64)
survives ``.map``'s type inference. Same workaround the mmlu preprocessor
already uses for its int-typed answer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>