Conversation
Signed-off-by: Ssofja <sofiakostandian@gmail.com>
10) Cleanup step. Compute full batch WER and log. Concatenate loss list and pass to PTL to compute the equivalent of the original (full batch) Joint step. Delete ancillary objects necessary for sub-batching.
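The loss-concatenation part of this cleanup step can be sketched in a few lines. This is a hypothetical toy illustration (the names `sub_batch_losses` etc. are invented for the example), not the actual NeMo implementation:

```python
# Hypothetical sketch of the cleanup step described above: per-sub-batch
# losses are concatenated and reduced so the final value matches what a
# single full-batch Joint step would have produced.
sub_batch_losses = [[0.5, 0.5], [1.0, 1.0, 1.0]]  # losses from two sub-batches
all_losses = [loss for chunk in sub_batch_losses for loss in chunk]
full_batch_loss = sum(all_losses) / len(all_losses)  # mean over the full batch
```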
Transducer Decoding
Note to self and other reviewers - decoding docs are now placed in Inference and ASR Language Modeling and Customization
Refer to the :ref:`Audio Augmentors <asr-api-audio-augmentors>` API section for more details.
Tokenizer Configurations
We need to add one more code block: an example of ``AggregateTokenizer``.
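For reference, a hedged sketch of roughly what such an aggregate tokenizer example could look like - paths and language keys are placeholders, and the exact schema should be checked against the current NeMo configs:

```yaml
# Illustrative aggregate-tokenizer config: one monolingual tokenizer per
# language key, combined under type "agg". Paths are placeholders.
model:
  tokenizer:
    type: agg
    langs:
      en:
        dir: /path/to/en_tokenizer_dir
        type: bpe
      es:
        dir: /path/to/es_tokenizer_dir
        type: bpe
```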
.. _asr-configs-augmentation-configurations:
Augmentation Configurations
I feel we should keep the SpecAugment part of this section.
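If it stays, even a minimal SpecAugment block would anchor the section. A hedged sketch with illustrative values (mask counts and widths vary per recipe; verify against the shipped configs):

```yaml
# Illustrative SpecAugment settings for an ASR model config.
model:
  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 2     # number of frequency masks
    time_masks: 10    # number of time masks
    freq_width: 27    # max width of each frequency mask (mel bins)
    time_width: 0.05  # max width of each time mask (fraction of duration)
```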
.. _asr-configs-preprocessor-configuration:
Preprocessor Configuration
I think this should be kept.
Yeah, users are normally confused by this portion, so it would need more documentation - if anything.
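Since this portion confuses users, a hedged sketch of a typical preprocessor block could help - values below are illustrative and should be verified against the shipped configs:

```yaml
# Illustrative mel-spectrogram preprocessor settings.
model:
  preprocessor:
    _target_: nemo.collections.asr.modules.AudioToMelSpectrogramPreprocessor
    sample_rate: 16000
    window_size: 0.025   # seconds
    window_stride: 0.01  # seconds
    features: 80         # mel bins
    n_fft: 512
    normalize: per_feature
```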
use_cer: false
log_prediction: true
BLEU Score
I would revert the compaction of this section - I think it's pretty recent and describes the various config tweaks introduced by @bonham79.
Yeah, this is deleting a lot of things that are hidden in the code, plus some improved user functionality. Without this you're basically just forcing dependence on the torchmetrics documentation - and that ain't pretty.
/claude review
* `CTC Fine-tuning README <https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/asr_finetune>`_
* `Transducer Fine-tuning README <https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/asr_finetune>`_
Both links point to the exact same URL (examples/asr/conf/asr_finetune). The Transducer link should presumably point to a different location (e.g., examples/asr/asr_transducer or examples/asr/conf/asr_finetune with an anchor for transducer-specific instructions). As-is, labeling two identical URLs as "CTC" and "Transducer" is misleading.
Overall this is a clean docs refactor. One issue found:
Minor note:
.. list-table::
   :header-rows: 1
   * - Model
IIRC some of these didn't really prioritize PnC, no?
   * - `nemotron-speech-streaming-en-0.6b <https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b>`__
     - Hybrid
     - ASR, streaming
     - en
It may be more economical to just list the architecture once and provide a list of supported languages per model, or maybe a matrix?
   * - `stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc <https://huggingface.co/nvidia/stt_ka_fastconformer_hybrid_transducer_ctc_large_streaming_80ms_pc>`__
     - Hybrid
     - ASR, PnC, streaming
     - ka
Yeah, on Piotr's point above - few know the Georgian language code offhand.
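One option is to spell language names out in the table. A hypothetical helper (invented for illustration, not part of NeMo) showing the idea:

```python
# Hypothetical helper, not NeMo API: expand ISO 639-1 codes so the docs
# table is readable without an external lookup.
LANGUAGE_NAMES = {"en": "English", "ka": "Georgian"}

def describe_language(code: str) -> str:
    # Fall back to the raw code for anything unmapped.
    return LANGUAGE_NAMES.get(code, code)
```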
.. list-table::
   :header-rows: 1
   * - Model
I'd move all FastConformers underneath Parakeet. This'll just lead to confusion.
I think it's OK - the concept here is that FastConformers are the older models and Parakeets are the newer models.
Ehhh, I think our branding efforts are causing confusion, especially now that Nemotron Speech is a thing. In the technical docs there should be a clear understanding that these are the same architectures. The naming aspect can be left up to marketing, but for devs it should be clear that FastConformer and Parakeet are largely equivalent.
1. **Start with a low learning rate** — fine-tuning with too high a learning rate can destroy pretrained features.
2. **Use Lhotse dataloading** for efficient training with dynamic batching. See :doc:`Lhotse Dataloading </dataloaders>`.
3. **Monitor validation WER** closely — fine-tuning can overfit quickly on small datasets.
4. **Use spec augmentation** during fine-tuning to improve robustness.
5. **For multilingual fine-tuning**, consider using ``AggregateTokenizer`` and the Hybrid model with prompt conditioning.
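For the low-learning-rate tip, a hedged sketch of what an optimizer section might look like - values are illustrative, not prescriptive, and should be checked against the fine-tuning example configs:

```yaml
# Illustrative fine-tuning optimizer settings: LR well below
# from-scratch training, with warmup and cosine decay.
model:
  optim:
    name: adamw
    lr: 1e-4
    weight_decay: 1e-3
    sched:
      name: CosineAnnealing
      warmup_steps: 1000
      min_lr: 1e-6
```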
.. code-block:: python
   config = model.get_transcribe_config()
Give an example transcribe config - this is a more obfuscated aspect of transcription in the codebase.
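Agreed. Since the actual dataclass lives in the codebase, here is only a hypothetical stand-in showing the usage pattern; the field names (`batch_size`, `return_hypotheses`, ...) are assumptions for illustration, not the verified NeMo API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the object returned by
# model.get_transcribe_config(); field names are assumptions.
@dataclass
class TranscribeConfig:
    batch_size: int = 4
    return_hypotheses: bool = False
    num_workers: Optional[int] = None
    verbose: bool = True

config = TranscribeConfig()
config.batch_size = 16            # larger batches for GPU inference
config.return_hypotheses = True   # richer output than plain text
```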
@@ -1,17 +1,9 @@
Models
Move Parakeet before Canary - it's more successful, so people will be hunting for it.
.. _Conformer-HAT_model:
Conformer-HAT
Can we keep these on a legacy models page?
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com> Signed-off-by: Ssofja <78349198+Ssofja@users.noreply.github.com>
What does this PR do
This PR represents a full refactoring of the ASR collection's documentation.
Collection: [docs]
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type: