
Some configs missing dataset.vocab_size in MockIterableDataset #1286

@torsli

Description


Describe the bug

Some benchmarking configs, such as https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/benchmark/configs/qwen3_moe_30b_te_deepep.yaml, use MockIterableDataset but do not set dataset.vocab_size. Because vocab_size is a required argument of MockIterableDataset.__init__(), building the dataset fails with:

TypeError: MockIterableDataset.__init__() missing 1 required positional argument: 'vocab_size'
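A minimal sketch of the failure mode, assuming the recipe forwards the keys under `dataset:` as keyword arguments to the class. The `MockIterableDataset` below is a hypothetical stand-in (not the real Automodel class), and `seq_len`, `build_dataset` and `missing_required_args` are illustrative names. It shows why omitting `vocab_size` raises the TypeError, and how a required-argument check could catch this before construction.

```python
import inspect


class MockIterableDataset:
    """Hypothetical stand-in: only models the required vocab_size argument."""

    def __init__(self, vocab_size, seq_len=128):  # seq_len is illustrative
        self.vocab_size = vocab_size
        self.seq_len = seq_len


def build_dataset(dataset_cfg: dict) -> MockIterableDataset:
    # Assumed loader behaviour: every key except _target_ becomes a kwarg.
    kwargs = {k: v for k, v in dataset_cfg.items() if k != "_target_"}
    return MockIterableDataset(**kwargs)


def missing_required_args(cls, cfg: dict) -> list:
    # List __init__ parameters that have no default and are absent from cfg.
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name, p in params.items()
        if name != "self"
        and p.default is inspect.Parameter.empty
        and p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                       inspect.Parameter.KEYWORD_ONLY)
        and name not in cfg
    ]


if __name__ == "__main__":
    broken_cfg = {"seq_len": 4096}  # like a config without dataset.vocab_size
    print(missing_required_args(MockIterableDataset, broken_cfg))
    try:
        build_dataset(broken_cfg)
    except TypeError as err:
        print(f"TypeError: {err}")

    # Illustrative value only; use the model's actual vocabulary size.
    fixed_cfg = {"vocab_size": 32000, "seq_len": 4096}
    print(build_dataset(fixed_cfg).vocab_size)
```

In this sketch the fix is simply adding `vocab_size` next to the other dataset keys; the value should match the model's embedding/tokenizer vocabulary rather than the placeholder used here.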
Steps/Code to reproduce bug

Run a recipe with qwen3_moe_30b_te_deepep.yaml

Additional context

Multiple configs are missing dataset.vocab_size; I haven't found them all yet.
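To help locate the remaining configs, here is a rough sketch of a checker that walks an already-parsed config and reports every node targeting MockIterableDataset without a `vocab_size` key. It assumes the configs select classes through a Hydra-style `_target_` key ending in the class name; the function name is hypothetical.

```python
from typing import Any, Iterator


def find_mock_datasets_missing_vocab_size(cfg: Any, path: str = "") -> Iterator[str]:
    """Yield dotted paths of nodes that target MockIterableDataset but lack vocab_size.

    Assumption: classes are selected via a `_target_` string whose last
    component is the class name.
    """
    if isinstance(cfg, dict):
        target = cfg.get("_target_")
        if (isinstance(target, str)
                and target.endswith("MockIterableDataset")
                and "vocab_size" not in cfg):
            yield path or "<root>"
        # Recurse into children so nested dataset sections are also checked.
        for key, value in cfg.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_mock_datasets_missing_vocab_size(value, child)
    elif isinstance(cfg, list):
        for i, item in enumerate(cfg):
            yield from find_mock_datasets_missing_vocab_size(item, f"{path}[{i}]")
```

With PyYAML installed, it could be run over each file in examples/benchmark/configs by loading it with `yaml.safe_load` and printing the file name alongside every path the generator yields.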
