Code to train a custom time-domain autoencoder to dereverb audio
Updated Nov 30, 2023 · Python
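As a sketch of what a time-domain dereverberation autoencoder can look like: a minimal PyTorch model with a strided `Conv1d` encoder and a `ConvTranspose1d` decoder. The layer sizes and the `TinyDereverbAE` name are assumptions for illustration, not this repository's actual architecture; training would minimize, e.g., MSE between the model's output on reverberant audio and the clean reference.

```python
import torch
import torch.nn as nn

class TinyDereverbAE(nn.Module):
    """Minimal time-domain autoencoder sketch (illustrative, not the repo's model)."""
    def __init__(self, hidden=32):
        super().__init__()
        # Encoder: two strided 1-D convolutions compress the waveform 64x in time.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=16, stride=8, padding=4), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=16, stride=8, padding=4), nn.ReLU(),
        )
        # Decoder: transposed convolutions mirror the encoder back to sample rate.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(hidden, hidden, kernel_size=16, stride=8, padding=4), nn.ReLU(),
            nn.ConvTranspose1d(hidden, 1, kernel_size=16, stride=8, padding=4), nn.Tanh(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyDereverbAE()
wet = torch.randn(1, 1, 4096)   # (batch, channel, samples) of reverberant audio
out = model(wet)                # decoder output has the same length as the input
```

With kernel 16, stride 8, and padding 4 the transposed convolutions exactly invert the encoder's downsampling, so input and output lengths match without cropping.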
Dual-model speech AI toolkit for speaker verification and speaker-aware diarization, with streaming inference, meeting analysis, long-audio monitoring, and speaker-bank integration.
Real-time speech enhancement pipeline — custom-trained U-Net denoising model, ONNX inference, Overlap-Add synthesis, and virtual audio routing for Teams, Zoom, and DAW use. CPU-only, no cloud dependency.
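The Overlap-Add synthesis step can be sketched with NumPy alone (the `frames_ola` helper, window size, and hop are assumptions): frames are windowed with a periodic Hann window at 50% overlap, processed (identity here; an enhancement model would run per frame), and summed back. Because 50%-overlap periodic Hann windows sum to 1, interior samples reconstruct exactly.

```python
import numpy as np

def periodic_hann(n):
    """Periodic Hann window: adjacent 50%-overlapped copies sum to exactly 1."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def frames_ola(x, win=512, hop=256):
    """Split x into windowed frames, (no-op) process, and overlap-add back."""
    w = periodic_hann(win)
    n_frames = 1 + (len(x) - win) // hop
    out = np.zeros(len(x))
    for i in range(n_frames):
        start = i * hop
        frame = x[start:start + win] * w   # a denoising model would run here
        out[start:start + win] += frame    # overlap-add synthesis
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = frames_ola(x)
# interior samples (away from the un-overlapped edges) reconstruct exactly
```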
A custom MCP server that separates a YouTube track into stems (vocals, drums, bass) and extracts a sonic signature: BPM, musical key, stereo width, transient punch, and a 512-dim CLAP semantic embedding. Runs locally on CPU via Demucs and librosa.
Engine identification using acoustic signal analysis and machine learning to classify 8 vehicle types. Audio signals are processed using FFT and feature extraction, and a multi-class model predicts vehicle categories based on their unique sound patterns.
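The FFT feature-extraction idea can be sketched as log band energies over the magnitude spectrum (the `band_energies` helper and 8-band layout are assumptions; a real pipeline would likely use richer descriptors such as MFCCs):

```python
import numpy as np

def band_energies(signal, n_bands=8):
    """Log power in n_bands equal-width frequency bands (hypothetical feature set)."""
    power = np.abs(np.fft.rfft(signal)) ** 2          # one-sided power spectrum
    bands = np.array_split(power, n_bands)            # equal-width frequency bands
    return np.log(np.array([b.sum() for b in bands]) + 1e-10)

sr = 16000
t = np.arange(sr) / sr
feat = band_energies(np.sin(2 * np.pi * 440 * t))     # one second of a 440 Hz tone
```

A feature vector like this, computed per recording, is what a multi-class model would consume; for the 440 Hz tone almost all energy lands in the lowest band.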
AI-powered audio summarisation pipeline — Whisper transcription, LLM key-insight extraction, and structured spoken summaries with TTS playback and a Streamlit interface.
ML-based speech emotion recognition system that analyzes audio features to classify emotions with a simple interface for testing.
Machine learning system for music genre classification using feature engineering, stratified evaluation, SVC/XGBoost modeling, and reproducible prediction export.
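A minimal scikit-learn sketch of such a pipeline, using synthetic features in place of real audio descriptors (the dataset, feature count, and hyperparameters are all assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-track audio features and genre labels.
X, y = make_classification(n_samples=200, n_features=20, n_classes=4,
                           n_informative=8, random_state=0)

# Scale features, then fit an RBF-kernel SVC; evaluate with stratified folds
# so every fold preserves the genre distribution.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
```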
Neural TTS and voice-cloning application using XTTS/VITS. Supports 3–30 s reference audio for speaker adaptation, real-time pitch/speed control, and WAV/MP3 export.
Automated audio/video ML pipeline for detecting and transcribing jazz solos from live recordings. Runs nightly against Smalls Jazz Club archives: uses CLAP (instrument detection), Demucs (source separation), CLIP (performer identification), and basic-pitch (MIDI transcription). Results served via REST API.
Key features: simple VAE architecture with encoder/decoder; synthetic music-data generation for training; interactive training with progress tracking; music generation from latent-space sampling; audio conversion and playback; downloadable audio files.
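The latent-space sampling step can be illustrated with the VAE reparameterization trick (NumPy sketch; the 16-dim latent and the zero-valued statistics are placeholders that a trained encoder would supply):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16
mu = np.zeros(latent_dim)        # encoder mean (placeholder values)
log_var = np.zeros(latent_dim)   # encoder log-variance (placeholder values)

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# keeps the sampling step differentiable with respect to mu and log_var.
eps = rng.standard_normal(latent_dim)
z = mu + np.exp(0.5 * log_var) * eps   # a decoder would map z to an audio frame
```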
Audio file processing pipeline with GPT-4-powered error diagnosis — detects codec issues, sample rate mismatches, and corruption artefacts with automated remediation suggestions.
Music harmony AI — chord progression analysis with Roman numeral labelling, voice leading checker, style-conditioned progression generation (Baroque/Jazz/Pop), and MIDI export via music21.
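In a major key, Roman-numeral labelling reduces to mapping a chord root's scale degree to a numeral. A dependency-free sketch (the `roman_numeral` helper is hypothetical; the project itself uses music21):

```python
# Diatonic triads of a major key, labelled with Roman numerals.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]                  # semitone offsets from tonic
NUMERALS = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]

def roman_numeral(root_pc, tonic_pc=0):
    """Label a diatonic chord root (pitch class 0-11) relative to a major-key tonic."""
    degree = (root_pc - tonic_pc) % 12
    if degree not in MAJOR_SCALE:
        return None                                    # chromatic root: out of scope here
    return NUMERALS[MAJOR_SCALE.index(degree)]

roman_numeral(7)   # G in C major → "V"
```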
Audio analysis in JavaScript/TypeScript.