
fix: add missing docstrings to data pipeline utility functions #15560

Open
stanley1208 wants to merge 1 commit into NVIDIA-NeMo:main from stanley1208:fix/add-docstrings-audio-dataset-utils

@stanley1208
Contributor

What does this PR do?

Add missing docstrings to 4 data pipeline utility functions in audio_to_text_dataset.py, per the Contributor guidelines, which state:

"Include docstrings for every class and method exposed to the user"

These functions are critical parts of the ASR data pipeline (dataset creation, bucketing, code-switching) and were missing documentation.

Functions documented:

  • get_code_switched_dataset — Creates multilingual code-switched datasets
  • convert_to_config_list — Normalizes manifest paths into ListConfig format for bucketing
  • get_chain_dataset — Chains bucketed datasets with configurable bucketing strategy
  • calc_bucketing_batch_sizes — Calculates adaptive batch sizes for bucketed training
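
For reviewers unfamiliar with the docstring layout, here is a minimal sketch of a Google-style docstring (summary line, `Args`, `Returns`, `Raises`). The function `normalize_manifest_paths` is a hypothetical stand-in written for illustration only. It is not NeMo's `convert_to_config_list`, it uses plain Python lists rather than `ListConfig`, and its exact normalization rules are an assumption.

```python
from typing import List, Union


def normalize_manifest_paths(
    paths: Union[str, List[str], List[List[str]]],
) -> List[List[str]]:
    """Normalize manifest paths into a list of path groups.

    Hypothetical helper for illustration; behavior is assumed, not taken
    from the NeMo source.

    Args:
        paths: A comma-separated string of paths, a flat list of paths,
            or a list of lists of paths.

    Returns:
        A list of lists of path strings. A string or flat list becomes a
        single group; a list of lists is returned as a copy.

    Raises:
        ValueError: If no paths are provided.
    """
    # Split a comma-separated string and drop empty entries.
    if isinstance(paths, str):
        paths = [p.strip() for p in paths.split(",") if p.strip()]
    if not paths:
        raise ValueError("No manifest paths were provided.")
    # A flat list of strings is wrapped into one group.
    if isinstance(paths[0], str):
        return [list(paths)]
    # Already grouped: copy each group into a fresh list.
    return [list(group) for group in paths]
```

Calling `help(normalize_manifest_paths)` prints the docstring, which is the same text a user would see for the documented functions in an interactive session.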

Collection: ASR

Changelog

  • Add docstrings to 4 undocumented data pipeline functions in audio_to_text_dataset.py

Before your PR is "Ready for review"

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests? (N/A — docstrings only)
  • Did you add or update any necessary documentation?

PR Type:

  • Documentation

Signed-off-by: stanley1208 <stanley.mei08@gmail.com>
Made-with: Cursor
github-actions bot added the ASR label on Mar 29, 2026
@stanley1208
Contributor Author

@nithinraok Ready for review — adds missing docstrings to 4 key data pipeline functions per CONTRIBUTING.md guidelines. Thanks!

chtruong814 added the needs-follow-up label on Mar 31, 2026
2 participants