Add WAV/MP3 input with automatic 48 kHz resampling #15

Draft
Copilot wants to merge 6 commits into master from copilot/add-wav-mp3-conversion

Copilot AI commented Mar 7, 2026

The --src-audio (cover mode) and neural-codec --encode paths only accepted WAV at exactly 48 kHz. This adds transparent WAV + MP3 support at any sample rate, auto-resampled to the 48 kHz the VAE encoder requires — no ffmpeg pre-conversion needed.

New: src/audio.h

Single header providing read_audio(path, T_audio, n_channels):

  • Format detected by extension: .mp3 → dr_mp3, anything else → dr_wav
  • Channel layout preserved as-is from the source file — no up/down-mix
  • Linear resampler (audio_resample_linear) is channel-agnostic; only runs when sr ≠ 48000
  • Returns malloc'd interleaved float [T × n_ch]; caller frees
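
The extension check and the channel-agnostic linear resampler described above could look roughly like the sketch below. This is not the code from src/audio.h: the names has_mp3_extension and resample_linear_sketch, their signatures, the case-insensitive extension match, and the clamp-at-last-frame edge handling are all assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <cmath>

using std::size_t;

// Hypothetical helper (not from the PR): true if `path` ends in ".mp3",
// compared case-insensitively. Anything else would go to the WAV decoder.
static bool has_mp3_extension(const char * path) {
    const size_t n = std::strlen(path);
    if (n < 4) return false;
    const char * ext = path + n - 4;
    return ext[0] == '.' &&
           std::tolower((unsigned char) ext[1]) == 'm' &&
           std::tolower((unsigned char) ext[2]) == 'p' &&
           ext[3] == '3';
}

// Hypothetical linear-interpolation resampler over interleaved frames.
// `in` holds n_in frames of n_ch samples each. Returns a malloc'd buffer of
// *n_out frames (caller frees), or nullptr on bad input or allocation failure.
static float * resample_linear_sketch(const float * in, size_t n_in, int n_ch,
                                      int sr_in, int sr_out, size_t * n_out) {
    if (!in || !n_out || n_in == 0 || n_ch <= 0 || sr_in <= 0 || sr_out <= 0) {
        return nullptr;
    }
    const size_t T = (size_t) (((unsigned long long) n_in * sr_out) / sr_in);
    if (T == 0) return nullptr;

    float * out = (float *) std::malloc(T * n_ch * sizeof(float));
    if (!out) return nullptr;

    const double step = (double) sr_in / (double) sr_out;  // input frames per output frame
    for (size_t t = 0; t < T; t++) {
        const double pos = t * step;
        size_t i0 = (size_t) pos;
        if (i0 >= n_in) i0 = n_in - 1;
        const size_t i1 = (i0 + 1 < n_in) ? i0 + 1 : i0;   // clamp at the last frame
        const float frac = (float) (pos - (double) i0);
        // The same interpolation weight applies to every channel of the frame,
        // which is what makes the loop independent of the channel count.
        for (int c = 0; c < n_ch; c++) {
            const float a = in[i0 * n_ch + c];
            const float b = in[i1 * n_ch + c];
            out[t * n_ch + c] = a + (b - a) * frac;
        }
    }
    *n_out = T;
    return out;
}
```

Linear interpolation is cheap and dependency-free, but it applies no anti-aliasing filter, so downsampling from rates above 48 kHz may fold high-frequency content back into the audible band.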

New: thirdparty/

  • dr_wav.h v0.14.5 — WAV decode (public domain / MIT-0, mackron/dr_libs)
  • dr_mp3.h v0.7.3 — MP3 decode via minimp3 (public domain / MIT-0)

Zero new link-time dependencies — both are single-header, included once per translation unit via #define DR_*_IMPLEMENTATION inside audio.h.

Tool changes

  • neural-codec.cpp: encode path switches read_wav() → read_audio(); emits a warning if the decoded channel count ≠ 2
  • dit-vae.cpp: --src-audio switches to read_audio(); same channel-count warning; help text updated
  • CMakeLists.txt: thirdparty/ added as SYSTEM include in the shared link_ggml_backends macro (vendor warnings suppressed)
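
On the caller side, the channel-count warning and the "caller frees" contract might be handled along these lines. This is a sketch only: AudioBuffer, FreeDeleter, warn_if_not_stereo, and the warning text are hypothetical and are not taken from neural-codec.cpp or dit-vae.cpp.

```cpp
#include <cstdio>
#include <cstdlib>
#include <memory>

// Hypothetical owner for a malloc'd sample buffer: releases it with free(),
// matching the allocation contract described for read_audio().
struct FreeDeleter {
    void operator()(float * p) const { std::free(p); }
};
using AudioBuffer = std::unique_ptr<float, FreeDeleter>;

// Hypothetical helper: prints a warning to stderr when the decoded audio is
// not stereo. Returns true if a warning was emitted.
static bool warn_if_not_stereo(const char * path, int n_channels) {
    if (n_channels == 2) return false;
    std::fprintf(stderr, "warning: %s has %d channel(s), expected 2\n",
                 path, n_channels);
    return true;
}
```

If read_audio() reports the frame count and channel count through out-parameters (an assumption, since the description lists only the argument names), a tool could wrap the returned pointer in AudioBuffer immediately and then call warn_if_not_stereo(path, n_channels) before passing the samples to the VAE encoder.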

Example

# Before: required exact 48 kHz stereo WAV, manual ffmpeg conversion otherwise
# After: any of these just work
./dit-vae --src-audio reference.mp3   ...
./dit-vae --src-audio reference.wav   ...  # any sample rate
./neural-codec --vae vae.gguf --encode -i song.mp3 -o song.latent

New example

examples/cover.sh + examples/cover.json — demonstrates cover-mode generation from a WAV or MP3 reference track with inline usage notes.



Copilot AI and others added 2 commits March 7, 2026 19:02
…ols and cover example

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for WAV and MP3 input conversion to 48kHz" to "Add WAV/MP3 input with automatic 48 kHz resampling" Mar 7, 2026
lmangani added 3 commits March 7, 2026 20:08
Updated the cover script to remove LLM step and clarify audio processing.
Updated the cover JSON to reflect new attributes and changes.