fix: add missing docstrings to data pipeline utility functions by stanley1208 · Pull Request #15560 · NVIDIA-NeMo/NeMo

stanley1208 · 2026-03-29T05:36:33Z

What does this PR do?

Adds the missing docstrings to four data pipeline utility functions in audio_to_text_dataset.py, per the Contributor guidelines, which state:

"Include docstrings for every class and method exposed to the user"

These functions are critical parts of the ASR data pipeline (dataset creation, bucketing, code-switching) and were missing documentation.

Functions documented:

get_code_switched_dataset — Creates multilingual code-switched datasets
convert_to_config_list — Normalizes manifest paths into ListConfig format for bucketing
get_chain_dataset — Chains bucketed datasets with configurable bucketing strategy
calc_bucketing_batch_sizes — Calculates adaptive batch sizes for bucketed training
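
For illustration, the kind of docstring this PR adds might look like the sketch below. It assumes a Google-style layout (Args / Returns / Raises), which the PR description does not specify. The function linear_bucket_batch_sizes, its parameters, and its linear scaling rule are hypothetical stand-ins invented for this example; they are not the signature or behavior of calc_bucketing_batch_sizes or of any other function touched here.

```python
from typing import List


def linear_bucket_batch_sizes(base_batch_size: int, num_buckets: int) -> List[int]:
    """Compute one batch size per bucket, largest for the first bucket.

    Hypothetical example only: this is not NeMo code. It assumes buckets are
    ordered from shortest to longest audio, so earlier buckets can hold more
    samples per batch.

    Args:
        base_batch_size: Batch size used for the last (longest-audio) bucket.
        num_buckets: Number of buckets; must be positive.

    Returns:
        A list of ``num_buckets`` batch sizes, where bucket ``i`` gets
        ``base_batch_size * (num_buckets - i)``.

    Raises:
        ValueError: If either argument is not positive.
    """
    if base_batch_size <= 0 or num_buckets <= 0:
        raise ValueError("base_batch_size and num_buckets must be positive")
    return [base_batch_size * (num_buckets - i) for i in range(num_buckets)]


print(linear_bucket_batch_sizes(4, 3))  # [12, 8, 4]
```

The summary line, the per-argument descriptions, and the documented return shape are what a reader of the data pipeline would otherwise have to recover from the function body.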

Collection: ASR

Changelog

Add docstrings to 4 undocumented data pipeline functions in audio_to_text_dataset.py

Before your PR is "Ready for review"

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests? (N/A — docstrings only)
Did you add or update any necessary documentation?

PR Type:

Documentation

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor

stanley1208 · 2026-03-29T05:37:06Z

@nithinraok Ready for review — adds missing docstrings to 4 key data pipeline functions per CONTRIBUTING.md guidelines. Thanks!

fix: add missing docstrings to data pipeline utility functions

bbe1693

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor

github-actions bot added the ASR label Mar 29, 2026

github-actions bot added the community-request label Mar 29, 2026

chtruong814 added the needs-follow-up label Mar 31, 2026

stanley1208 wants to merge 1 commit into NVIDIA-NeMo:main from stanley1208:fix/add-docstrings-audio-dataset-utils
