Add batched streaming beam search for RNN-T (mALSD+mAES) and TDT (mALSD) by lilithgrigoryan · Pull Request #15753 · NVIDIA-NeMo/NeMo

lilithgrigoryan · 2026-06-04T15:57:47Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adds streaming-aware variants of the batched beam-search decoders: mALSD (malsd_batch, batched with CUDA graphs) for both RNN-T and TDT, and mAES (maes_batch, torch eager) for RNN-T.

Word boosting and n-gram LM fusion are both supported.
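
The description does not spell out how the fusion scores are combined. As a rough illustration only, assuming standard shallow fusion (ASR log-probability plus α times an external log-score, with α playing the role of ngram_lm_alpha or boosting_tree_alpha), one beam-expansion step could be sketched as follows. The function name and data layout are hypothetical; the actual decoders are batched tensor code and also handle blank (and, for TDT, duration) outputs, which this toy omits.

```python
import math

def expand_with_fusion(hyps, asr_logp, ext_logp, alpha, beam_size):
    """Toy shallow-fusion step (hypothetical helper, not NeMo API).

    hyps:     list of (tokens, score) pairs, where tokens is a tuple
    asr_logp: per-hypothesis dict token -> ASR log-probability
    ext_logp: per-hypothesis dict token -> external (LM or boosting) log-score
    """
    candidates = []
    for (tokens, score), asr, ext in zip(hyps, asr_logp, ext_logp):
        for tok, lp in asr.items():
            # Tokens unknown to the external model get no bonus or penalty here.
            fused = score + lp + alpha * ext.get(tok, 0.0)
            candidates.append((tokens + (tok,), fused))
    # Keep the beam_size best candidates by fused score.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]

hyps = [((), 0.0)]
asr = [{"a": math.log(0.6), "b": math.log(0.4)}]
lm = [{"a": math.log(0.1), "b": math.log(0.9)}]
best_no_fusion = expand_with_fusion(hyps, asr, lm, alpha=0.0, beam_size=1)
best_fused = expand_with_fusion(hyps, asr, lm, alpha=0.4, beam_size=1)
print(best_no_fusion[0][0], best_fused[0][0])
```

In this toy, α=0 keeps the acoustically preferred token "a", while α=0.4 lets the external score flip the choice to "b".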

Follow-up PRs (planned, separate):

Wire the new decoders into the cache-aware streaming inference scripts in examples/asr/asr_chunked_inference/rnnt/.
Add buffered RNN-T support to the same streaming inference scripts.

Internal WB test set (word boosting) — α=1.3 for both greedy and beam.
Model: nvidia/stt_en_fastconformer_transducer_large

strategy	beam	fusion	WER	WERR vs greedy
greedy	–	–	31.91%	—
greedy	–	WB	27.75%	+13.03%
malsd	4	WB	25.01%	+21.62%
malsd	8	WB	24.08%	+24.54%

SLURP test set (n-gram LM fusion) — α=0.8 greedy, α=0.4 beam.
Model: nvidia/stt_en_fastconformer_tdt_large

strategy	beam	fusion	WER	WERR vs greedy
greedy	–	–	23.68%	—
greedy	–	LM	22.46%	+5.15%
malsd	4	LM	19.69%	+16.85%
malsd	8	LM	19.26%	+18.66%

Headline: streaming MALSD + fusion gives a 17–25% relative WER reduction vs streaming greedy without fusion across the two datasets (beam 4: +16.85% and +21.62%; beam 8: +18.66% and +24.54%), and the gain is larger at beam 8 than at beam 4 on both.
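
The WERR column appears to be the relative WER reduction against the no-fusion greedy row, (WER_greedy - WER) / WER_greedy. That reading is inferred from the numbers rather than stated in the PR. A minimal sketch to recompute it from the reported WERs:

```python
def relative_wer_reduction(baseline_wer: float, wer: float) -> float:
    """Relative WER reduction in percent; positive means fewer errors."""
    return 100.0 * (baseline_wer - wer) / baseline_wer

# (baseline WER, WER) pairs in percent, copied from the two tables above.
results = {
    "WB, greedy": (31.91, 27.75),
    "WB, malsd beam 4": (31.91, 25.01),
    "WB, malsd beam 8": (31.91, 24.08),
    "LM, greedy": (23.68, 22.46),
    "LM, malsd beam 4": (23.68, 19.69),
    "LM, malsd beam 8": (23.68, 19.26),
}
for name, (baseline, wer) in results.items():
    print(f"{name}: {relative_wer_reduction(baseline, wer):+.2f}%")
```

Because the inputs here are already rounded to two decimals, the recomputed values can differ from the table in the last digit.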

Collection: ASR

Changelog

Add specific line by line info of high level changes in this PR.

Usage

Example

Run streaming beam search on your own data:

# For TDT, use pretrained_name=nvidia/stt_en_fastconformer_tdt_large.
# For RNN-T, decoding.strategy=maes_batch is also available.
python examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py \
    pretrained_name=nvidia/stt_en_fastconformer_transducer_large \
    dataset_manifest=/path/to/your_test_manifest.json \
    output_filename=preds.jsonl \
    left_context_secs=10 chunk_secs=2 right_context_secs=2 \
    batch_size=256 \
    decoding.strategy=malsd_batch \
    decoding.beam.beam_size=4 \
    decoding.beam.allow_cuda_graphs=true

Usage with word-boosting:

# For TDT, use pretrained_name=nvidia/stt_en_fastconformer_tdt_large.
# For RNN-T, decoding.strategy=maes_batch is also available.
python examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py \
    pretrained_name=nvidia/stt_en_fastconformer_transducer_large \
    dataset_manifest=/path/to/your_test_manifest.json \
    output_filename=preds.jsonl \
    left_context_secs=10 chunk_secs=2 right_context_secs=2 \
    batch_size=256 \
    decoding.strategy=malsd_batch \
    decoding.beam.beam_size=4 \
    decoding.beam.allow_cuda_graphs=true \
    decoding.beam.boosting_tree.key_phrases_file=/path/to/key_phrases.txt \
    decoding.beam.boosting_tree_alpha=1.3

Usage with LM fusion:

# For TDT, use pretrained_name=nvidia/stt_en_fastconformer_tdt_large.
# For RNN-T, decoding.strategy=maes_batch is also available.
python examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py \
    pretrained_name=nvidia/stt_en_fastconformer_transducer_large \
    dataset_manifest=/path/to/your_test_manifest.json \
    output_filename=preds.jsonl \
    left_context_secs=10 chunk_secs=2 right_context_secs=2 \
    batch_size=256 \
    decoding.strategy=malsd_batch \
    decoding.beam.beam_size=4 \
    decoding.beam.allow_cuda_graphs=true \
    decoding.beam.ngram_lm_model=/path/to/your.kenlm.nemo \
    decoding.beam.ngram_lm_alpha=0.4
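
The left_context_secs, chunk_secs and right_context_secs overrides in the commands above control how the audio is windowed. The sketch below shows one plausible reading, assuming each step emits output for one chunk while the encoder also sees up to left_context_secs of past audio and right_context_secs of lookahead. Under that assumption the algorithmic latency would be chunk_secs + right_context_secs (4 s with the values used here). The helper is hypothetical and does not reproduce the script's actual buffering logic.

```python
def chunk_windows(duration, left, chunk, right):
    """Yield (chunk_start, chunk_end, window_start, window_end) in seconds.

    Hypothetical illustration of left/chunk/right context windowing.
    """
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        # The window is clipped to the audio boundaries on both sides.
        yield start, end, max(0.0, start - left), min(duration, end + right)
        start = end

windows = list(chunk_windows(duration=5.0, left=10.0, chunk=2.0, right=2.0))
for chunk_start, chunk_end, win_start, win_end in windows:
    print(f"decode [{chunk_start}, {chunk_end}) using audio [{win_start}, {win_end})")
```

For a 5-second utterance this yields three steps; the last chunk is shorter and has no lookahead left to use.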

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

github-actions · 2026-06-04T19:27:13Z

[🤖]: Hi @lilithgrigoryan 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

artbataev · 2026-06-05T11:05:28Z

/claude review

github-actions · 2026-06-05T18:50:27Z

[🤖]: Hi @lilithgrigoryan 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

lilithgrigoryan added 4 commits June 4, 2026 15:32

add sreaming beam searched with tests

3c15e01

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

clean up

6380ada

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

fix kenlm tests

1ba5286

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

clean up

0362829

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

github-actions Bot added the core (Changes to NeMo Core) and ASR labels Jun 4, 2026

copy-pr-bot Bot temporarily deployed to public June 4, 2026 15:58 Inactive

copy-pr-bot Bot temporarily deployed to test June 4, 2026 15:59 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 16:02 Inactive

lilithgrigoryan added 3 commits June 4, 2026 20:18

clean up

9cce90e

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

Merge branch 'main' of https://github.com/NVIDIA/NeMo into lgrigoryan…

aa34fb3

…/streaming-beam-search

clean up refactor cudagraphs, parity with greedy

12ff20a

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

github-actions Bot removed the core (Changes to NeMo Core) label Jun 4, 2026

copy-pr-bot Bot temporarily deployed to public June 4, 2026 17:47 Inactive

copy-pr-bot Bot had a problem deploying to test June 4, 2026 17:48 Error

github-advanced-security AI found potential problems Jun 4, 2026

Comment thread examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py Fixed

copy-pr-bot Bot temporarily deployed to public June 4, 2026 17:51 Inactive

copy-pr-bot Bot had a problem deploying to public June 4, 2026 17:51 Error

clean up tests

273b2eb

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 4, 2026 17:53 Inactive

copy-pr-bot Bot had a problem deploying to test June 4, 2026 17:54 Error

Merge branch 'main' of https://github.com/NVIDIA/NeMo into lgrigoryan…

b6f96fe

…/streaming-beam-search

copy-pr-bot Bot temporarily deployed to public June 4, 2026 17:56 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 17:57 Inactive

clean up

a5871b0

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 4, 2026 18:01 Inactive

copy-pr-bot Bot had a problem deploying to test June 4, 2026 18:03 Error

copy-pr-bot Bot temporarily deployed to test June 4, 2026 18:16 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 18:17 Inactive

lilithgrigoryan added the Run CICD label Jun 4, 2026

claude Bot reviewed Jun 5, 2026

Comment thread examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py Outdated

claude Bot reviewed Jun 5, 2026

Comment thread nemo/collections/asr/parts/submodules/tdt_malsd_batched_computer.py Outdated

lilithgrigoryan added 2 commits June 5, 2026 19:04

minor fix

c0eece3

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

minor fix in comments

26718b4

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 5, 2026 15:20 Inactive

copy-pr-bot Bot temporarily deployed to test June 5, 2026 15:22 Inactive

copy-pr-bot Bot temporarily deployed to public June 5, 2026 15:24 Inactive

artbataev reviewed Jun 5, 2026

Comment thread nemo/collections/asr/parts/utils/streaming_utils.py Outdated

artbataev reviewed Jun 5, 2026

Comment thread nemo/collections/asr/parts/submodules/rnnt_maes_batched_computer.py Outdated

lilithgrigoryan added 2 commits June 6, 2026 15:35

revert contextsize changes

c30b56a

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

rm alignments from returns

5cccfcb

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 6, 2026 15:18 Inactive

copy-pr-bot Bot had a problem deploying to test June 6, 2026 15:19 Error

reverted completely

0dc4e93

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 6, 2026 15:23 Inactive

fix circular import

9fda2a5

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 6, 2026 15:29 Inactive

github-advanced-security AI found potential problems Jun 6, 2026

Comment thread examples/asr/asr_chunked_inference/rnnt/speech_to_text_streaming_infer_rnnt.py Fixed

Comment thread nemo/collections/asr/parts/utils/batched_beam_decoding_utils.py Fixed

copy-pr-bot Bot temporarily deployed to public June 6, 2026 15:33 Inactive

clean up

1e86cc3

Signed-off-by: lilithgrigoryan <lgrigoryan@nvidia.com>

copy-pr-bot Bot temporarily deployed to public June 6, 2026 15:37 Inactive

copy-pr-bot Bot temporarily deployed to test June 6, 2026 15:39 Inactive

copy-pr-bot Bot temporarily deployed to public June 6, 2026 15:41 Inactive

lilithgrigoryan requested a review from artbataev June 6, 2026 15:57

lilithgrigoryan wants to merge 18 commits into main from lgrigoryan/streaming-beam-search
