[WIP] feat: Opus decoder (SILK mode) #398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

jiztastamablastamarang wants to merge 55 commits into pdeljanov:dev-0.6 from jiztastamablastamarang:opus-decoder

jiztastamablastamarang commented Jul 16, 2025

This PR adds initial Opus codec support to Symphonia. The current implementation focuses on the SILK decoder path and core Opus infrastructure.

Core Components

Complete SILK decoder with frame decoding, LPC synthesis, and sample reconstruction
Range decoder for Opus bitstream entropy parsing
Opus packet parser supporting standard frame formats
Header and TOC (Table of Contents) parsing for all Opus frame types
Main OpusDecoder interface implementing AudioDecoder and RegisterableAudioDecoder traits

Technical Details

SILK decoder path includes:
- LPC coefficient extraction
- LSF interpolation and conversion
- Frame header parsing with VAD flag detection
- Sample reconstruction using LTP and LPC filters
- Error concealment logic
- Fixed-point arithmetic conversions from the reference implementation
Range decoder supports both regular and folded entropy coding
Frame type detection for all standard Opus modes
Support for all required Opus bitstream elements

Implementation Status

Supported: SILK-only mode decoding (functional for basic Opus SILK streams)
Stubbed: CELT and hybrid mode decoders (not yet implemented)
All code compiles and integrates with symphonia-core dev-0.6 API

Next Steps

Implement CELT decoder
Add hybrid mode support
Expand test coverage
Optimize for stereo mode

This PR is a functional prototype for Opus SILK streams. CELT and hybrid mode support will be added in future updates. Feedback and early review are welcome as development continues.

a1phyr and others added 30 commits

July 14, 2025 08:21


          chore: Fix macOS CI and some clippy checks. (pdeljanov#320)

6bf7fb0


          chore: Lower log level for some logs.

e8e8ba4

They are annoying in some cases.

Fixes pdeljanov#304, pdeljanov#305.


          chore: Fix inaccurate feature flag instructions in docs (pdeljanov#312)

2658fef


          Basic project setup

21dd235


          Module tree draft

564cdd4


          Extend project structure

2366b62


          Draft range decoder

2be841e


          Failing draft

0422f75


          Upd structure

7231b7a


          Range decoder with BitsReaderRtl

a767149


          Copycat Range decoder with passing tests

dd8019f


          Handle end of the bitstream gracefully

f3a2510


          Refactor range decoder test

ae1fa18


          Identification header impl

32f86a5


          Refactor naming in range decoder

a3eb52e


          Switch from Read to Readbytes train in header.rs

65fd18f


          Figuring out that header.rs is overkill

3815cba


          Refactor range, remove update method

2a72902


          Toc impl draft

44fb007


          Toc tests

60a5e02


          Refactor Toc tests by adding check pattern and macros test for mapping

eb44ccd


          Refactor TOC tests adding test table


          Frame packing draft

8a02a2e


          Tests for Frame packing with failing padding

9b5ccee


          Add test for toc, ensuring everything is ok

729e149


          Fin tests for packet

a82681b


          Minor refactoring

eacaad6


          Refactor carcass of opus decoder

38453eb


          Refine silk decoder api

8de29d5


          Restructured silk creating separate mod

7d67e05

jiztastamablastamarang added 25 commits

July 14, 2025 08:21


          Silk Frame draft

129c38c


          Improving decoder api

5d4c6ed


          Tables

8727cb5


          icdf tables

760e414


          Refactor codebook and icdf

956cccc


          Minor refactoring

a55622d


          Keep improving silk decoder design

c4101b6


          Add duration method to FrameSize and update tests

d97a035

Implement a `duration` method for the `FrameSize` struct to convert its value into a `Duration`. Update the test cases to include checks for this new method.


          Add sample_rate for bandwidth

e7f0557


          Raw silk decoder impl

0ea640e


          Silk decoder excitaion fix

4eb4932


          Remove redundant methods from silk decoder

31f44ba


          Refining silk decoder

a1dadfb


          Minor refactoring

4883bd0


          Remove unused module

ab9d7b5


          Add lpc lsf

dafb443


          Refactor lfc

bf33ee8


          Module docs

0ae84d9


          Upd docs

4b61c5e


          Upd readme

06ef5d0


          Remove unused deps & switch to LazyLock


          Refactor opus packet

4d75d6a


          Refactor toc & packet

164680b


          Update before presentation

eab094a


          Upd Opus decoder to dev-0.6 API

f15e19d

hasezoey mentioned this pull request

I getting "unsupported codec" for one webm file, but it works for another. #307

Open

roderickvd mentioned this pull request

add wav output method which takes writer RustAudio/rodio#790

Merged

hcsch reviewed

View reviewed changes

hcsch left a comment

First of all thanks for the work! A pure Rust Opus decoder would be very neat.

I have only done a very light and partial reading of the spec so far, so I can't yet review the code for its compliance yet, but I've had a rough look at the Rust level with mostly some nits for the doc comments. Sorry if those have turned out a bit large in number, but I hope they're at least helpful.

One thing to note is that you'd probably want to include a more explicit statement about the license of the IETF RFC 6176 snippets you've included verbatim in the README.md and perhaps with SPDX snippets (see https://reuse.software/faq/#partial-license) for machine-readability (while a bit tedious, these could also help delineate between what you have authored and what is a direct quote). AFAICT they fall under https://trustee.ietf.org/documents/trust-legal-provisions/tlp-5/ section 3.c "Licenses to IETF Documents and IETF Contributions" > "Licenses For Use Outside the IETF Standards Process". Sections 3.c.iii (x) and (y) require attribution and license information for substantial reproductions.

symphonia-codec-opus/src/silk/constant.rs

Comment on lines +1 to +34

    
              ///  Contents

              ///

              /// Codebooks:

              /// These are predefined sets of vectors used for quantizing various parameters like LSFs, 

              /// pitch contours, and LTP filters. The encoder selects the closest vector from the codebook 

              /// to represent the parameter, reducing the amount of data needed to transmit.

              ///

              /// Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs):

              /// PDFs define the probability of each possible symbol (like a codebook index) occurring.

              /// CDFs are used in entropy coding (e.g., range coding) to efficiently encode symbols based on their probabilities.

              ///

              /// LSF (Line Spectral Frequencies):

              /// LSFs are used to represent the spectral envelope of the audio signal. 

              /// They are quantized using codebooks to ensure efficient transmission and reconstruction.

              ///

              /// VAD (Voice Activity Detection):

              /// Determines whether a frame contains speech or is silent/inactive, influencing how the frame is encoded.

              ///

              /// LTP (Long-Term Prediction):

              /// Utilizes past audio samples to predict future ones, 

              /// improving compression efficiency, especially in voiced speech.

              ///

              /// Pitch Contour:

              /// Represents the pitch variations over time, crucial for natural-sounding speech synthesis.

              ///

              /// Excitation Coding:

              /// Deals with representing the excitation signal, which, when combined with the filter,

              /// recreates the original audio signal.

              ///

              /// Rate Level:

              /// Determines the bit rate used for encoding, balancing audio quality against bandwidth usage.

              ///

              ///

              ///

hcsch Oct 6, 2025

docs nit: Here and in a bunch of other places you have what seem to be module level doc comments attached to a specific item inside the module /// rather than the module //!

symphonia-codec-opus/src/silk/constant.rs

    
              ///   Table 17: Codebook Selection for NB/MB Normalized LSF Stage-2 Index

              ///             Decoding

              ///```

              /// https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2

hcsch Oct 6, 2025

docs nit: For URLs to be linked in the generated docs, you have to surround them with angle brackets (or use the [text](url) syntax). This the case for all links to the RFC in doc comments.

Suggested change

      
            /// https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2
          
            /// <https://datatracker.ietf.org/doc/html/rfc6716#section-4.2.7.5.2>

symphonia-codec-opus/src/silk/constant.rs

    
              ///   +----+---------------------+

              ///   Table 17: Codebook Selection for NB/MB Normalized LSF Stage-2 Index

              ///             Decoding

              ///```

hcsch Oct 6, 2025

docs nit: doc comments should consistently have a single space following the ///. The closing backticks for code blocks in a bunch of places are missing this.

symphonia-codec-opus/src/silk/constant.rs

Comment on lines +8 to +10

    
              /// Probability Distribution Functions (PDFs) and Cumulative Distribution Functions (CDFs):

              /// PDFs define the probability of each possible symbol (like a codebook index) occurring.

              /// CDFs are used in entropy coding (e.g., range coding) to efficiently encode symbols based on their probabilities.

hcsch Oct 6, 2025

docs nit: perhaps add explicitly that exclusively inverse CDFs are used for recovering symbols in the decoder (and not PDFs like they are listed in the RFC, tough the CDFs may be derived from them) and that they are encoded as integer values as a fraction of the total, with the total always being the first item in your arrays. Linking to decode_symbol_with_icdf and/or Section 4.1.3.3 would also be nice.

This was something I stumbled over a bit while reviewing, trying to make the numbers from the PDFs match the values in the tables. (though admittedly, the names of the constants start with ICDF)

symphonia-codec-opus/src/silk/constant.rs

Comment on lines +110 to +143

    
              pub const CODEBOOK_NORMALIZED_LSF_STAGE_TWO_INDEX_NARROWBAND_OR_MEDIUMBAND: [&[u32]; 32] = [

                  &[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],

                  &[1, 3, 1, 2, 2, 1, 2, 1, 1, 1],

                  &[2, 1, 1, 1, 1, 1, 1, 1, 1, 1],

                  &[1, 2, 2, 2, 2, 1, 2, 1, 1, 1],

                  &[2, 3, 3, 3, 3, 2, 2, 2, 2, 2],

                  &[0, 5, 3, 3, 2, 2, 2, 2, 1, 1],

                  &[0, 2, 2, 2, 2, 2, 2, 2, 2, 1],

                  &[2, 3, 6, 4, 4, 4, 5, 4, 5, 5],

                  &[2, 4, 5, 5, 4, 5, 4, 6, 4, 4],

                  &[2, 4, 4, 7, 4, 5, 4, 5, 5, 4],

                  &[4, 3, 3, 3, 2, 3, 2, 2, 2, 2],

                  &[1, 5, 5, 6, 4, 5, 4, 5, 5, 5],

                  &[2, 7, 4, 6, 5, 5, 5, 5, 5, 5],

                  &[2, 7, 5, 5, 5, 5, 5, 6, 5, 4],

                  &[3, 3, 5, 4, 4, 5, 4, 5, 4, 4],

                  &[2, 3, 3, 5, 5, 4, 4, 4, 4, 4],

                  &[2, 4, 4, 6, 4, 5, 4, 5, 5, 5],

                  &[2, 5, 4, 6, 5, 5, 5, 4, 5, 4],

                  &[2, 7, 4, 5, 4, 5, 4, 5, 5, 5],

                  &[2, 5, 4, 6, 7, 6, 5, 6, 5, 4],

                  &[3, 6, 7, 4, 6, 5, 5, 6, 4, 5],

                  &[2, 7, 6, 4, 4, 4, 5, 4, 5, 5],

                  &[4, 5, 5, 4, 6, 6, 5, 6, 5, 4],

                  &[2, 5, 5, 6, 5, 6, 4, 6, 4, 4],

                  &[4, 5, 5, 5, 3, 7, 4, 5, 5, 4],

                  &[2, 3, 4, 5, 5, 6, 4, 5, 5, 4],

                  &[2, 3, 2, 3, 3, 4, 2, 3, 3, 3],

                  &[1, 1, 2, 2, 2, 2, 2, 3, 2, 2],

                  &[4, 5, 5, 6, 6, 6, 5, 6, 4, 5],

                  &[3, 5, 5, 4, 4, 4, 4, 3, 3, 2],

                  &[2, 5, 3, 7, 5, 5, 4, 4, 5, 4],

                  &[4, 4, 5, 4, 5, 6, 5, 6, 5, 4],

              ];

hcsch Oct 6, 2025

docs nit: it could be nice to also reference the name of the equivalent / names of related constants/structs/functions in the normative C code, since that is the one source of truth for what is correct. Here for example silk_NLSF_CB2_SELECT_NB_MB and silk_NLSF_unpack (as far as I can tell). Though in this case the data is stored in a significantly different manner in the C code (requiring unpacking first). In general data would hopefully be more easily comparable.

symphonia-codec-opus/src/toc.rs

Comment on lines +47 to +52

+                  config: u8,
+                  pub stereo: bool,
+                  pub frame_count: FrameCount,
+                  pub audio_mode: AudioMode,
+                  pub bandwidth: Bandwidth,
+                  pub frame_size: FrameDuration,

hcsch Oct 6, 2025

config being private while audio_mode, bandwidth, and frame_size are public could be a potential hazard. If the latter were written to, the type would be in an inconsistent state and as_byte would not return the TOC byte that matches expectations. Perhaps use accessor functions and private fields, or drop the fields and perform the decoding in accessor functions.

symphonia-codec-opus/src/toc.rs

Comment on lines +167 to +187

+              pub enum Bandwidth {
+                  NarrowBand = 8_000,
+                  MediumBand = 12_000,
+                  WideBand = 16_000,
+                  SuperWideBand = 24_000,
+                  #[default]
+                  FullBand = 48_000,
+              }
+              /// Effective sample rate for a given bandwidth, Hz.
+              impl Bandwidth {
+                  pub fn sample_rate(&self) -> u32 {
+                      return match self {
+                          Bandwidth::NarrowBand => 8_000,
+                          Bandwidth::MediumBand => 12_000,
+                          Bandwidth::WideBand => 16_000,
+                          Bandwidth::SuperWideBand => 24_000,
+                          Bandwidth::FullBand => 48_000,
+                      };
+                  }
+              }

hcsch Oct 6, 2025

Similar to the FrameDuration, you could use #[repr(u32)] and avoid duplicating the sample rate values this way. Alternatively, you could just not specify any discriminants and only keep the sample rates in the sample_rate function.

symphonia-codec-opus/src/toc.rs

Comment on lines +236 to +247

+              impl FrameDuration { // at 48kHz
+                  pub fn sample_count(&self) -> usize {
+                      return match self {
+                          FrameDuration::Ms2_5 => 120,
+                          FrameDuration::Ms5 => 240,
+                          FrameDuration::Ms10 => 480,
+                          FrameDuration::Ms20 => 960,
+                          FrameDuration::Ms40 => 1_920,
+                          FrameDuration::Ms60 => 2_880,
+                      };
+                  }
+              }

hcsch Oct 6, 2025

Perhaps just call the function sample_count_48khz and drop the comment. That way it is also obvious at the usage sites.

help.txt

hcsch Oct 6, 2025

This file looks like it was accidentally included, it should be removed

CHANGELOG.md

hcsch Oct 6, 2025

While a CHANGELOG.md file can be very nice to have, this file seems to be generated by an editor plugin with not so helpful content (and URLs with a not yet reserved/allocated TLD). This file should probably just be removed again for now.

UnknownSuperficialNight mentioned this pull request

Symphonia custom registry support RustAudio/rodio#806

Open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet