Skip to content

Conversation

@jiztastamablastamarang
Copy link

This PR adds initial Opus codec support to Symphonia. The current implementation focuses on the SILK decoder path and core Opus infrastructure.

Core Components

  • Complete SILK decoder with frame decoding, LPC synthesis, and sample reconstruction
  • Range decoder for Opus bitstream entropy parsing
  • Opus packet parser supporting standard frame formats
  • Header and TOC (Table of Contents) parsing for all Opus frame types
  • Main OpusDecoder interface implementing AudioDecoder and RegisterableAudioDecoder traits

Technical Details

  • SILK decoder path includes:
    • LPC coefficient extraction
    • LSF interpolation and conversion
    • Frame header parsing with VAD flag detection
    • Sample reconstruction using LTP and LPC filters
    • Error concealment logic
    • Fixed-point arithmetic conversions from the reference implementation
  • Range decoder supports both regular and folded entropy coding
  • Frame type detection for all standard Opus modes
  • Support for all required Opus bitstream elements

Implementation Status

  • Supported: SILK-only mode decoding (functional for basic Opus SILK streams)
  • Stubbed: CELT and hybrid mode decoders (not yet implemented)
  • All code compiles and integrates with symphonia-core dev-0.6 API

Next Steps

  • Implement CELT decoder
  • Add hybrid mode support
  • Expand test coverage
  • Optimize for stereo mode

This PR is a functional prototype for Opus SILK streams. CELT and hybrid mode support will be added in future updates. Feedback and early review are welcome as development continues.

Copy link

@hcsch hcsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all thanks for the work! A pure Rust Opus decoder would be very neat.

I have only done a very light and partial reading of the spec so far, so I can't yet review the code for its compliance yet, but I've had a rough look at the Rust level with mostly some nits for the doc comments. Sorry if those have turned out a bit large in number, but I hope they're at least helpful.

One thing to note is that you'd probably want to include a more explicit statement about the license of the IETF RFC 6176 snippets you've included verbatim in the README.md and perhaps with SPDX snippets (see https://reuse.software/faq/#partial-license) for machine-readability (while a bit tedious, these could also help delineate between what you have authored and what is a direct quote). AFAICT they fall under https://trustee.ietf.org/documents/trust-legal-provisions/tlp-5/ section 3.c "Licenses to IETF Documents and IETF Contributions" > "Licenses For Use Outside the IETF Standards Process". Sections 3.c.iii (x) and (y) require attribution and license information for substantial reproductions.

Comment on lines +1 to +34
/// Contents
///
/// Codebooks:
/// These are predefined sets of vectors used for quantizing various parameters like LSFs,
/// pitch contours, and LTP filters. The encoder selects the closest vector from the codebook
/// to represent the parameter, reducing the amount of data needed to transmit.
///
/// Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs):
/// PDFs define the probability of each possible symbol (like a codebook index) occurring.
/// CDFs are used in entropy coding (e.g., range coding) to efficiently encode symbols based on their probabilities.
///
/// LSF (Line Spectral Frequencies):
/// LSFs are used to represent the spectral envelope of the audio signal.
/// They are quantized using codebooks to ensure efficient transmission and reconstruction.
///
/// VAD (Voice Activity Detection):
/// Determines whether a frame contains speech or is silent/inactive, influencing how the frame is encoded.
///
/// LTP (Long-Term Prediction):
/// Utilizes past audio samples to predict future ones,
/// improving compression efficiency, especially in voiced speech.
///
/// Pitch Contour:
/// Represents the pitch variations over time, crucial for natural-sounding speech synthesis.
///
/// Excitation Coding:
/// Deals with representing the excitation signal, which, when combined with the filter,
/// recreates the original audio signal.
///
/// Rate Level:
/// Determines the bit rate used for encoding, balancing audio quality against bandwidth usage.
///
///
///
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs nit: Here and in a bunch of other places you have what seem to be module level doc comments attached to a specific item inside the module /// rather than the module //!

/// Table 17: Codebook Selection for NB/MB Normalized LSF Stage-2 Index
/// Decoding
///```
/// https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs nit: For URLs to be linked in the generated docs, you have to surround them with angle brackets (or use the [text](url) syntax). This the case for all links to the RFC in doc comments.

Suggested change
/// https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2
/// <https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2>

/// +----+---------------------+
/// Table 17: Codebook Selection for NB/MB Normalized LSF Stage-2 Index
/// Decoding
///```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs nit: doc comments should consistently have a single space following the ///. The closing backticks for code blocks in a bunch of places are missing this.

Comment on lines +8 to +10
/// Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs):
/// PDFs define the probability of each possible symbol (like a codebook index) occurring.
/// CDFs are used in entropy coding (e.g., range coding) to efficiently encode symbols based on their probabilities.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs nit: perhaps add explicitly that exclusively inverse CDFs are used for recovering symbols in the decoder (and not PDFs like they are listed in the RFC, tough the CDFs may be derived from them) and that they are encoded as integer values as a fraction of the total, with the total always being the first item in your arrays. Linking to decode_symbol_with_icdf and/or Section 4.1.3.3 would also be nice.

This was something I stumbled over a bit while reviewing, trying to make the numbers from the PDFs match the values in the tables. (though admittedly, the names of the constants start with ICDF)

Comment on lines +110 to +143
pub const CODEBOOK_NORMALIZED_LSF_STAGE_TWO_INDEX_NARROWBAND_OR_MEDIUMBAND: [&[u32]; 32] = [
&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
&[1, 3, 1, 2, 2, 1, 2, 1, 1, 1],
&[2, 1, 1, 1, 1, 1, 1, 1, 1, 1],
&[1, 2, 2, 2, 2, 1, 2, 1, 1, 1],
&[2, 3, 3, 3, 3, 2, 2, 2, 2, 2],
&[0, 5, 3, 3, 2, 2, 2, 2, 1, 1],
&[0, 2, 2, 2, 2, 2, 2, 2, 2, 1],
&[2, 3, 6, 4, 4, 4, 5, 4, 5, 5],
&[2, 4, 5, 5, 4, 5, 4, 6, 4, 4],
&[2, 4, 4, 7, 4, 5, 4, 5, 5, 4],
&[4, 3, 3, 3, 2, 3, 2, 2, 2, 2],
&[1, 5, 5, 6, 4, 5, 4, 5, 5, 5],
&[2, 7, 4, 6, 5, 5, 5, 5, 5, 5],
&[2, 7, 5, 5, 5, 5, 5, 6, 5, 4],
&[3, 3, 5, 4, 4, 5, 4, 5, 4, 4],
&[2, 3, 3, 5, 5, 4, 4, 4, 4, 4],
&[2, 4, 4, 6, 4, 5, 4, 5, 5, 5],
&[2, 5, 4, 6, 5, 5, 5, 4, 5, 4],
&[2, 7, 4, 5, 4, 5, 4, 5, 5, 5],
&[2, 5, 4, 6, 7, 6, 5, 6, 5, 4],
&[3, 6, 7, 4, 6, 5, 5, 6, 4, 5],
&[2, 7, 6, 4, 4, 4, 5, 4, 5, 5],
&[4, 5, 5, 4, 6, 6, 5, 6, 5, 4],
&[2, 5, 5, 6, 5, 6, 4, 6, 4, 4],
&[4, 5, 5, 5, 3, 7, 4, 5, 5, 4],
&[2, 3, 4, 5, 5, 6, 4, 5, 5, 4],
&[2, 3, 2, 3, 3, 4, 2, 3, 3, 3],
&[1, 1, 2, 2, 2, 2, 2, 3, 2, 2],
&[4, 5, 5, 6, 6, 6, 5, 6, 4, 5],
&[3, 5, 5, 4, 4, 4, 4, 3, 3, 2],
&[2, 5, 3, 7, 5, 5, 4, 4, 5, 4],
&[4, 4, 5, 4, 5, 6, 5, 6, 5, 4],
];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs nit: it could be nice to also reference the name of the equivalent / names of related constants/structs/functions in the normative C code, since that is the one source of truth for what is correct. Here for example silk_NLSF_CB2_SELECT_NB_MB and silk_NLSF_unpack (as far as I can tell). Though in this case the data is stored in a significantly different manner in the C code (requiring unpacking first). In general data would hopefully be more easily comparable.

Comment on lines +47 to +52
config: u8,
pub stereo: bool,
pub frame_count: FrameCount,
pub audio_mode: AudioMode,
pub bandwidth: Bandwidth,
pub frame_size: FrameDuration,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config being private while audio_mode, bandwidth, and frame_size are public could be a potential hazard. If the latter were written to, the type would be in an inconsistent state and as_byte would not return the TOC byte that matches expectations. Perhaps use accessor functions and private fields, or drop the fields and perform the decoding in accessor functions.

Comment on lines +167 to +187
pub enum Bandwidth {
NarrowBand = 8_000,
MediumBand = 12_000,
WideBand = 16_000,
SuperWideBand = 24_000,
#[default]
FullBand = 48_000,
}

/// Effective sample rate for a given bandwidth, Hz.
impl Bandwidth {
pub fn sample_rate(&self) -> u32 {
return match self {
Bandwidth::NarrowBand => 8_000,
Bandwidth::MediumBand => 12_000,
Bandwidth::WideBand => 16_000,
Bandwidth::SuperWideBand => 24_000,
Bandwidth::FullBand => 48_000,
};
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the FrameDuration, you could use #[repr(u32)] and avoid duplicating the sample rate values this way. Alternatively, you could just not specify any discriminants and only keep the sample rates in the sample_rate function.

Comment on lines +236 to +247
impl FrameDuration { // at 48kHz
pub fn sample_count(&self) -> usize {
return match self {
FrameDuration::Ms2_5 => 120,
FrameDuration::Ms5 => 240,
FrameDuration::Ms10 => 480,
FrameDuration::Ms20 => 960,
FrameDuration::Ms40 => 1_920,
FrameDuration::Ms60 => 2_880,
};
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps just call the function sample_count_48khz and drop the comment. That way it is also obvious at the usage sites.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file looks like it was accidentally included, it should be removed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While a CHANGELOG.md file can be very nice to have, this file seems to be generated by an editor plugin with not so helpful content (and URLs with a not yet reserved/allocated TLD). This file should probably just be removed again for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants