Add configurable train_on_eos for conversation data preparation by jlamypoirier · Pull Request #535 · ServiceNow/Fast-LLM

jlamypoirier · 2026-06-05T19:00:24Z

Authored by Claude Opus 4.8 (with @jlamypoirier).

Splits out the loss-masking change that was bundled into #473 as a standalone, opt-in flag.

What

Add train_on_eos: bool = False to ConversationSourceConfig. It controls whether the end-of-sequence token that tokenize_chat appends after the final message is included in the training loss:

false (default): the appended EOS is masked from the loss — unchanged behavior.
true: the appended EOS becomes a training target.

Threaded through Tokenizer.tokenize_chat as a train_on_eos parameter. Per-turn terminators emitted by the chat template (e.g. ChatML <|im_end|>) are unaffected — they remain governed by the template's {% generation %} markers. This flag only touches the single sequence-terminating EOS that the tokenizer appends when no EOS already appears in the conversation.
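For illustration, the masking rule described above can be sketched as follows. This is a toy model of the behavior, not the Fast-LLM implementation: the helper `append_eos_with_mask`, the token ids, and `EOS_ID` are hypothetical, and the real `tokenize_chat` works on chat-template output rather than plain lists.

```python
from typing import List, Tuple

EOS_ID = 2  # hypothetical EOS token id, for illustration only


def append_eos_with_mask(
    token_ids: List[int],
    loss_mask: List[bool],
    train_on_eos: bool = False,
    eos_id: int = EOS_ID,
) -> Tuple[List[int], List[bool]]:
    """Append a sequence-terminating EOS unless one is already present.

    Mask entries for existing tokens (including any per-turn terminators
    chosen by the chat template) are copied through unchanged; only the
    appended EOS's mask entry depends on train_on_eos.
    """
    if len(token_ids) != len(loss_mask):
        raise ValueError("token_ids and loss_mask must have the same length")
    if eos_id in token_ids:
        # An EOS already appears in the conversation: nothing is appended.
        return list(token_ids), list(loss_mask)
    return token_ids + [eos_id], loss_mask + [train_on_eos]


# Toy conversation: 3 prompt tokens (masked), then 2 assistant tokens (trained).
tokens = [10, 11, 12, 20, 21]
mask = [False, False, False, True, True]

default_tokens, default_mask = append_eos_with_mask(tokens, mask)
trained_tokens, trained_mask = append_eos_with_mask(tokens, mask, train_on_eos=True)
```

In this sketch the two calls return the same token ids; they differ only in the last mask entry, which is `False` by default and `True` with `train_on_eos=True`.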

Why

Masking the terminal EOS means the model never gets a loss signal to emit end-of-sequence — a well-known cause of models that don't stop generating at inference. Training on the final/assistant EOS is the common recommendation, and frameworks expose it as a knob (Axolotl train_on_eos: turn|all|last; TRL assistant_only_loss, which includes the assistant turn's EOS). It's also an asymmetry within Fast-LLM today: the document path already trains on its appended EOS (unmasked tokens all contribute to the loss), while the conversation path masks it. This flag lets conversation prep opt into the same behavior.
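A small numeric sketch of why the mask matters, assuming a plain mean negative log-likelihood over unmasked positions (Fast-LLM's actual loss reduction may differ). The probabilities and the helper `masked_nll` are made up for illustration.

```python
import math
from typing import List


def masked_nll(target_probs: List[float], loss_mask: List[bool]) -> float:
    """Mean negative log-likelihood over the unmasked positions only."""
    terms = [-math.log(p) for p, keep in zip(target_probs, loss_mask) if keep]
    return sum(terms) / len(terms) if terms else 0.0


# Hypothetical probabilities a model assigns to each target token.
# The last position is the appended EOS.
probs_good_eos = [0.9, 0.8, 0.9]   # model predicts EOS well
probs_bad_eos = [0.9, 0.8, 0.01]   # model almost never predicts EOS

eos_masked = [True, True, False]   # train_on_eos = False
eos_trained = [True, True, True]   # train_on_eos = True

# With the EOS masked, the loss cannot tell the two models apart.
loss_masked_good = masked_nll(probs_good_eos, eos_masked)
loss_masked_bad = masked_nll(probs_bad_eos, eos_masked)

# With the EOS trained, the poor EOS prediction is penalized.
loss_trained_good = masked_nll(probs_good_eos, eos_trained)
loss_trained_bad = masked_nll(probs_bad_eos, eos_trained)
```

Under the masked setting both models score the same loss, so nothing pushes the second one toward emitting EOS; under the trained setting its loss is strictly higher.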

Kept off by default so existing datasets' loss masking is unchanged.

Testing

tests/data/test_tokenizer.py (including the new test_tokenize_chat_train_on_eos) and test_preparator.py pass (41 tests). The new test asserts that train_on_eos changes only the loss mask of the appended EOS, not the tokens.
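The property the new test checks can be sketched roughly as below. This is not the code of test_tokenize_chat_train_on_eos: `fake_tokenize_chat` is a hypothetical stand-in that returns a fixed `(tokens, loss mask)` pair, since the real test goes through the Fast-LLM tokenizer.

```python
from typing import List, Tuple


def fake_tokenize_chat(train_on_eos: bool = False) -> Tuple[List[int], List[bool]]:
    """Hypothetical stand-in returning (token ids, loss mask) for one fixed chat."""
    tokens = [10, 11, 20, 21]
    mask = [False, False, True, True]
    eos_id = 2  # hypothetical EOS id
    return tokens + [eos_id], mask + [train_on_eos]


def check_train_on_eos_only_changes_final_mask() -> None:
    tokens_off, mask_off = fake_tokenize_chat(train_on_eos=False)
    tokens_on, mask_on = fake_tokenize_chat(train_on_eos=True)

    # Same tokens regardless of the flag.
    assert tokens_off == tokens_on
    # Mask identical everywhere except the appended EOS.
    assert mask_off[:-1] == mask_on[:-1]
    assert mask_off[-1] is False
    assert mask_on[-1] is True


check_train_on_eos_only_changes_final_mask()
```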

Note

Touches the same tokenize_chat call site as #534 (no-BOS prep); whichever merges first, the other needs a one-line rebase.

🤖 Generated with Claude Code

Add a `train_on_eos` flag (default `False`) to `ConversationSourceConfig` controlling whether the end-of-sequence token appended after the final message is included in the training loss. When disabled (the default, unchanged behavior) that token is masked from the loss; when enabled it becomes a training target. Threaded through `tokenize_chat` as a `train_on_eos` parameter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jlamypoirier · 2026-06-05T22:30:09Z

Claude Opus 4.8 note: Folded into #534 along with the no-BOS data-prep changes (one PR instead of two, per maintainer preference). The train_on_eos flag is unchanged. Closing in favor of #534.

jlamypoirier mentioned this pull request Jun 5, 2026

Data preparation: no-BOS tokenizer support and configurable train_on_eos #534

Merged

jlamypoirier closed this Jun 5, 2026

jlamypoirier deleted the jlp_train-on-eos branch June 5, 2026 22:30

jlamypoirier wants to merge 1 commit into main from jlp_train-on-eos
