-
Notifications
You must be signed in to change notification settings - Fork 183
[WIP] feat: Opus decoder (SILK mode) #398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev-0.6
Are you sure you want to change the base?
[WIP] feat: Opus decoder (SILK mode) #398
Conversation
They are annoying in some cases. Fixes pdeljanov#304, pdeljanov#305.
Implement a `duration` method for the `FrameSize` struct to convert its value into a `Duration`. Update the test cases to include checks for this new method.
hcsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all thanks for the work! A pure Rust Opus decoder would be very neat.
I have only done a very light and partial reading of the spec so far, so I can't yet review the code for its compliance yet, but I've had a rough look at the Rust level with mostly some nits for the doc comments. Sorry if those have turned out a bit large in number, but I hope they're at least helpful.
One thing to note is that you'd probably want to include a more explicit statement about the license of the IETF RFC 6176 snippets you've included verbatim in the README.md and perhaps with SPDX snippets (see https://reuse.software/faq/#partial-license) for machine-readability (while a bit tedious, these could also help delineate between what you have authored and what is a direct quote). AFAICT they fall under https://trustee.ietf.org/documents/trust-legal-provisions/tlp-5/ section 3.c "Licenses to IETF Documents and IETF Contributions" > "Licenses For Use Outside the IETF Standards Process". Sections 3.c.iii (x) and (y) require attribution and license information for substantial reproductions.
| /// Contents | ||
| /// | ||
| /// Codebooks: | ||
| /// These are predefined sets of vectors used for quantizing various parameters like LSFs, | ||
| /// pitch contours, and LTP filters. The encoder selects the closest vector from the codebook | ||
| /// to represent the parameter, reducing the amount of data needed to transmit. | ||
| /// | ||
| /// Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs): | ||
| /// PDFs define the probability of each possible symbol (like a codebook index) occurring. | ||
| /// CDFs are used in entropy coding (e.g., range coding) to efficiently encode symbols based on their probabilities. | ||
| /// | ||
| /// LSF (Line Spectral Frequencies): | ||
| /// LSFs are used to represent the spectral envelope of the audio signal. | ||
| /// They are quantized using codebooks to ensure efficient transmission and reconstruction. | ||
| /// | ||
| /// VAD (Voice Activity Detection): | ||
| /// Determines whether a frame contains speech or is silent/inactive, influencing how the frame is encoded. | ||
| /// | ||
| /// LTP (Long-Term Prediction): | ||
| /// Utilizes past audio samples to predict future ones, | ||
| /// improving compression efficiency, especially in voiced speech. | ||
| /// | ||
| /// Pitch Contour: | ||
| /// Represents the pitch variations over time, crucial for natural-sounding speech synthesis. | ||
| /// | ||
| /// Excitation Coding: | ||
| /// Deals with representing the excitation signal, which, when combined with the filter, | ||
| /// recreates the original audio signal. | ||
| /// | ||
| /// Rate Level: | ||
| /// Determines the bit rate used for encoding, balancing audio quality against bandwidth usage. | ||
| /// | ||
| /// | ||
| /// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs nit: Here and in a bunch of other places you have what seem to be module level doc comments attached to a specific item inside the module /// rather than the module //!
| /// Table 17: Codebook Selection for NB/MB Normalized LSF Stage-2 Index | ||
| /// Decoding | ||
| ///``` | ||
| /// https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs nit: For URLs to be linked in the generated docs, you have to surround them with angle brackets (or use the [text](url) syntax). This the case for all links to the RFC in doc comments.
| /// https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2 | |
| /// <https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2> |
| /// +----+---------------------+ | ||
| /// Table 17: Codebook Selection for NB/MB Normalized LSF Stage-2 Index | ||
| /// Decoding | ||
| ///``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs nit: doc comments should consistently have a single space following the ///. The closing backticks for code blocks in a bunch of places are missing this.
| /// Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs): | ||
| /// PDFs define the probability of each possible symbol (like a codebook index) occurring. | ||
| /// CDFs are used in entropy coding (e.g., range coding) to efficiently encode symbols based on their probabilities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs nit: perhaps add explicitly that exclusively inverse CDFs are used for recovering symbols in the decoder (and not PDFs like they are listed in the RFC, tough the CDFs may be derived from them) and that they are encoded as integer values as a fraction of the total, with the total always being the first item in your arrays. Linking to decode_symbol_with_icdf and/or Section 4.1.3.3 would also be nice.
This was something I stumbled over a bit while reviewing, trying to make the numbers from the PDFs match the values in the tables. (though admittedly, the names of the constants start with ICDF)
| pub const CODEBOOK_NORMALIZED_LSF_STAGE_TWO_INDEX_NARROWBAND_OR_MEDIUMBAND: [&[u32]; 32] = [ | ||
| &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], | ||
| &[1, 3, 1, 2, 2, 1, 2, 1, 1, 1], | ||
| &[2, 1, 1, 1, 1, 1, 1, 1, 1, 1], | ||
| &[1, 2, 2, 2, 2, 1, 2, 1, 1, 1], | ||
| &[2, 3, 3, 3, 3, 2, 2, 2, 2, 2], | ||
| &[0, 5, 3, 3, 2, 2, 2, 2, 1, 1], | ||
| &[0, 2, 2, 2, 2, 2, 2, 2, 2, 1], | ||
| &[2, 3, 6, 4, 4, 4, 5, 4, 5, 5], | ||
| &[2, 4, 5, 5, 4, 5, 4, 6, 4, 4], | ||
| &[2, 4, 4, 7, 4, 5, 4, 5, 5, 4], | ||
| &[4, 3, 3, 3, 2, 3, 2, 2, 2, 2], | ||
| &[1, 5, 5, 6, 4, 5, 4, 5, 5, 5], | ||
| &[2, 7, 4, 6, 5, 5, 5, 5, 5, 5], | ||
| &[2, 7, 5, 5, 5, 5, 5, 6, 5, 4], | ||
| &[3, 3, 5, 4, 4, 5, 4, 5, 4, 4], | ||
| &[2, 3, 3, 5, 5, 4, 4, 4, 4, 4], | ||
| &[2, 4, 4, 6, 4, 5, 4, 5, 5, 5], | ||
| &[2, 5, 4, 6, 5, 5, 5, 4, 5, 4], | ||
| &[2, 7, 4, 5, 4, 5, 4, 5, 5, 5], | ||
| &[2, 5, 4, 6, 7, 6, 5, 6, 5, 4], | ||
| &[3, 6, 7, 4, 6, 5, 5, 6, 4, 5], | ||
| &[2, 7, 6, 4, 4, 4, 5, 4, 5, 5], | ||
| &[4, 5, 5, 4, 6, 6, 5, 6, 5, 4], | ||
| &[2, 5, 5, 6, 5, 6, 4, 6, 4, 4], | ||
| &[4, 5, 5, 5, 3, 7, 4, 5, 5, 4], | ||
| &[2, 3, 4, 5, 5, 6, 4, 5, 5, 4], | ||
| &[2, 3, 2, 3, 3, 4, 2, 3, 3, 3], | ||
| &[1, 1, 2, 2, 2, 2, 2, 3, 2, 2], | ||
| &[4, 5, 5, 6, 6, 6, 5, 6, 4, 5], | ||
| &[3, 5, 5, 4, 4, 4, 4, 3, 3, 2], | ||
| &[2, 5, 3, 7, 5, 5, 4, 4, 5, 4], | ||
| &[4, 4, 5, 4, 5, 6, 5, 6, 5, 4], | ||
| ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs nit: it could be nice to also reference the name of the equivalent / names of related constants/structs/functions in the normative C code, since that is the one source of truth for what is correct. Here for example silk_NLSF_CB2_SELECT_NB_MB and silk_NLSF_unpack (as far as I can tell). Though in this case the data is stored in a significantly different manner in the C code (requiring unpacking first). In general data would hopefully be more easily comparable.
| config: u8, | ||
| pub stereo: bool, | ||
| pub frame_count: FrameCount, | ||
| pub audio_mode: AudioMode, | ||
| pub bandwidth: Bandwidth, | ||
| pub frame_size: FrameDuration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config being private while audio_mode, bandwidth, and frame_size are public could be a potential hazard. If the latter were written to, the type would be in an inconsistent state and as_byte would not return the TOC byte that matches expectations. Perhaps use accessor functions and private fields, or drop the fields and perform the decoding in accessor functions.
| pub enum Bandwidth { | ||
| NarrowBand = 8_000, | ||
| MediumBand = 12_000, | ||
| WideBand = 16_000, | ||
| SuperWideBand = 24_000, | ||
| #[default] | ||
| FullBand = 48_000, | ||
| } | ||
|
|
||
| /// Effective sample rate for a given bandwidth, Hz. | ||
| impl Bandwidth { | ||
| pub fn sample_rate(&self) -> u32 { | ||
| return match self { | ||
| Bandwidth::NarrowBand => 8_000, | ||
| Bandwidth::MediumBand => 12_000, | ||
| Bandwidth::WideBand => 16_000, | ||
| Bandwidth::SuperWideBand => 24_000, | ||
| Bandwidth::FullBand => 48_000, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the FrameDuration, you could use #[repr(u32)] and avoid duplicating the sample rate values this way. Alternatively, you could just not specify any discriminants and only keep the sample rates in the sample_rate function.
| impl FrameDuration { // at 48kHz | ||
| pub fn sample_count(&self) -> usize { | ||
| return match self { | ||
| FrameDuration::Ms2_5 => 120, | ||
| FrameDuration::Ms5 => 240, | ||
| FrameDuration::Ms10 => 480, | ||
| FrameDuration::Ms20 => 960, | ||
| FrameDuration::Ms40 => 1_920, | ||
| FrameDuration::Ms60 => 2_880, | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just call the function sample_count_48khz and drop the comment. That way it is also obvious at the usage sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file looks like it was accidentally included, it should be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While a CHANGELOG.md file can be very nice to have, this file seems to be generated by an editor plugin with not so helpful content (and URLs with a not yet reserved/allocated TLD). This file should probably just be removed again for now.
This PR adds initial Opus codec support to Symphonia. The current implementation focuses on the SILK decoder path and core Opus infrastructure.
Core Components
OpusDecoderinterface implementingAudioDecoderandRegisterableAudioDecodertraitsTechnical Details
Implementation Status
symphonia-coredev-0.6 APINext Steps
This PR is a functional prototype for Opus SILK streams. CELT and hybrid mode support will be added in future updates. Feedback and early review are welcome as development continues.