Add WAV/MP3 input with automatic 48 kHz resampling #15
Draft
…ols and cover example Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for WAV and MP3 input conversion to 48kHz" to "Add WAV/MP3 input with automatic 48 kHz resampling" on Mar 7, 2026
Updated the cover script to remove LLM step and clarify audio processing.
Updated the cover JSON to reflect new attributes and changes.
resync with master
The `--src-audio` (cover mode) and `neural-codec --encode` paths only accepted WAV at exactly 48 kHz. This adds transparent WAV + MP3 support at any sample rate, auto-resampled to the 48 kHz the VAE encoder requires, so no `ffmpeg` pre-conversion is needed.

### New: `src/audio.h`

Single header providing `read_audio(path, T_audio, n_channels)`:

- `.mp3` → dr_mp3, anything else → dr_wav
- Resampler (`audio_resample_linear`) is channel-agnostic; only runs when `sr ≠ 48000`
- Returns a `malloc`'d interleaved `float[T × n_ch]` buffer; caller frees

### New: `thirdparty/`

- `dr_wav.h` v0.14.5: WAV decode (public domain / MIT-0, mackron/dr_libs)
- `dr_mp3.h` v0.7.3: MP3 decode via minimp3 (public domain / MIT-0)

Zero new link-time dependencies: both are single-header, included once per translation unit via `#define DR_*_IMPLEMENTATION` inside `audio.h`.

### Tool changes

- `neural-codec.cpp`: encode path switches `read_wav()` → `read_audio()`; emits a warning if the decoded channel count ≠ 2
- `dit-vae.cpp`: `--src-audio` switches to `read_audio()`; same channel-count warning; help text updated
- `CMakeLists.txt`: `thirdparty/` added as a `SYSTEM` include in the shared `link_ggml_backends` macro (vendor warnings suppressed)

### Example

New example `examples/cover.sh` + `examples/cover.json`: demonstrates cover-mode generation from a WAV or MP3 reference track, with inline usage notes.
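For reviewers who want a concrete picture of the decode-then-resample flow described under `src/audio.h`, here is a minimal sketch of the two decisions involved: picking a decoder from the file extension, and linearly interpolating interleaved frames to a new rate. It is not the code in this PR. The names `has_mp3_extension` and `resample_linear_sketch` are hypothetical, the real `audio_resample_linear` signature is not shown in this description, and the sketch uses `std::vector` rather than the `malloc`'d buffer the header returns.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true when `path` ends in ".mp3" (case-insensitive).
// Per the description, anything else would be handed to the WAV decoder.
static bool has_mp3_extension(const std::string& path) {
    if (path.size() < 4) return false;
    std::string ext = path.substr(path.size() - 4);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return ext == ".mp3";
}

// Hypothetical sketch of a channel-agnostic linear resampler.
// `in` holds interleaved frames (L R L R ... for stereo).
// Each output frame is interpolated between its two nearest input frames;
// the last input frame is repeated when there is no right-hand neighbour.
static std::vector<float> resample_linear_sketch(const std::vector<float>& in,
                                                 int n_channels,
                                                 int sr_in, int sr_out) {
    assert(n_channels > 0 && sr_in > 0 && sr_out > 0);
    const std::size_t ch = static_cast<std::size_t>(n_channels);
    const std::size_t n_in = in.size() / ch;
    if (sr_in == sr_out || n_in == 0) return in;  // already at the target rate

    const std::size_t n_out = static_cast<std::size_t>(
        static_cast<double>(n_in) * sr_out / sr_in);
    const double step = static_cast<double>(sr_in) / sr_out;
    std::vector<float> out(n_out * ch);

    for (std::size_t t = 0; t < n_out; ++t) {
        const double pos = static_cast<double>(t) * step;  // position in input frames
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = std::min(i0 + 1, n_in - 1);
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        for (std::size_t c = 0; c < ch; ++c) {
            const float a = in[i0 * ch + c];
            const float b = in[i1 * ch + c];
            out[t * ch + c] = a + (b - a) * frac;
        }
    }
    return out;
}
```

Under these assumptions, a 44.1 kHz stereo file would be decoded and then passed through `resample_linear_sketch(pcm, 2, 44100, 48000)`, while a file that is already at 48 kHz comes back unchanged. Linear interpolation is cheap and works on any channel count, but it does no low-pass filtering, so downsampling from rates above 48 kHz can alias.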