Commit 7c4b017

Adds descriptions of audio quality metrics (#119)
Signed-off-by: Simon Zuberek <szuberek@nvidia.com>
1 parent bca5d1f commit 7c4b017

1 file changed

Lines changed: 12 additions & 1 deletion

sdp/processors/tts/metrics.py

@@ -31,7 +31,18 @@ class TorchSquimObjectiveQualityMetricsProcessor(BaseProcessor):
     """This processor calculates Squim quality metrics for audio files.
 
     It uses a pre-trained Squim model to calculate audio quality metrics like PESQ, STOI
-    and SI-SDR for each audio segment in the manifest.
+    and SI-SDR for each audio segment in the manifest:
+
+    PESQ (Perceptual Evaluation of Speech Quality)
+        A measure of overall quality for speech (originally designed to detect codec distortions but highly correlated to all kinds of distortion).
+
+    STOI (Short-Time Objective Intelligibility)
+        A measure of speech intelligibility, basically measures speech envelope integrity.
+        A STOI value of 1.0 means 100% of the speech being evaluated is intelligible on average.
+
+    SI-SDR (Scale-Invariant Signal-to-Distortion Ratio)
+        A measure of how strong the speech signal is vs. all the distortion present in the audio, in decibels.
+        0 dB means the energies of speech and distortion are the same. A value between 15-20 dB is what is considered "clean enough" speech in general.
 
     Args:
         device (str, Optional): Device to run the model on. Defaults to "cuda".
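
To make the SI-SDR description in the new docstring concrete, here is a minimal pure-Python sketch of the usual reference-based definition. It is an illustration only: the processor in this commit relies on a pre-trained Squim model rather than this formula, and the function name `si_sdr` and the test signals are hypothetical, not part of the repository.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio, in dB (illustrative sketch)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Remove the mean so a constant offset counts as neither speech nor distortion.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference must not be constant")

    # Project the estimate onto the reference: this is the "speech" part.
    scale = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [scale * r for r in ref]
    # Whatever remains after the projection is treated as distortion.
    distortion = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0.0:
        return math.inf
    if target_energy == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_energy / distortion_energy)


# Hypothetical test signals: a sine as "speech", an orthogonal cosine as "distortion".
N = 64
speech = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
noise = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]

equal_energy = [s + d for s, d in zip(speech, noise)]       # same energy in both parts
mostly_clean = [s + 0.1 * d for s, d in zip(speech, noise)]  # distortion 100x weaker
rescaled = [3.0 * x for x in mostly_clean]                   # same audio, different gain

sdr_equal = si_sdr(equal_energy, speech)
sdr_clean = si_sdr(mostly_clean, speech)
sdr_rescaled = si_sdr(rescaled, speech)
```

With these signals, equal speech and distortion energy gives about 0 dB, matching the docstring. Distortion with one hundredth of the speech energy gives about 20 dB, the top of the "clean enough" range. Changing the gain of the estimate leaves the value unchanged, which is what "scale-invariant" refers to.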
