3D-Corpus is a feature extraction and processing pipeline for audio datasets, designed for visualization in 3D space.
- Audio file loading and preprocessing
- Onset detection-based audio segmentation
- MFCC, Spectral Centroid, and Chroma feature extraction
- GPU acceleration support (Apple Silicon MPS backend and CUDA)
- Asynchronous processing and optimized batch processing
- Python 3.10 or higher
- Apple Silicon Mac (M1/M2/M3) or CUDA-compatible GPU
- 32GB RAM recommended
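Since the pipeline supports both the Apple Silicon MPS backend and CUDA, a script typically has to decide at startup which device to use. The following is a minimal sketch of such a check, not the project's actual code; the function name `pick_device` and the preference order (CUDA, then MPS, then CPU) are assumptions made for illustration.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" (hypothetical helper, preference order assumed)."""
    try:
        import torch  # optional: only needed for the GPU-accelerated path
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is only present in PyTorch 1.12 and later
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to `torch.device(...)` or to a tensor's `.to(...)` call.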
- Clone the repository:
git clone https://github.com/unohee/3d-corpus.git
cd 3d-corpus
- Create and activate a virtual environment:
python -m venv research_env
source research_env/bin/activate # macOS/Linux
- Install required packages:
pip install -r requirements.txt
To extract features on the CPU with librosa:
python featureExtractor.py ./path/to/audio/folder
To extract features on the GPU with PyTorch:
python featureExtractor_torch.py ./path/to/audio/folder
To use the text-based user interface for selecting datasets:
python curses_interface.py
You can also run the pipeline with various options:
python run.py ./path/to/audio/folder [options]
Available options:
- --download-only: Only download the FSD50K dataset without extracting features
- --no-onset: Disable onset detection and extract features for entire audio files
- --save-splits: Save onset-split audio files to disk
- --output-dir DIR: Specify the directory to save split audio files (default: splitted_files)
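To illustrate how the options above fit together, here is a hypothetical `argparse` sketch that accepts the same flags. It is not the actual contents of run.py; the function name `build_parser`, the positional argument name `audio_dir`, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("audio_dir", help="folder containing the audio files")
    parser.add_argument("--download-only", action="store_true",
                        help="only download FSD50K, skip feature extraction")
    parser.add_argument("--no-onset", action="store_true",
                        help="extract features for entire files, no onset splitting")
    parser.add_argument("--save-splits", action="store_true",
                        help="save onset-split audio files to disk")
    parser.add_argument("--output-dir", metavar="DIR", default="splitted_files",
                        help="directory for split audio files")
    return parser


# Example: keep onset detection off and write splits to the default directory.
args = build_parser().parse_args(
    ["./path/to/audio/folder", "--no-onset", "--save-splits"]
)
print(args.no_onset, args.save_splits, args.output_dir)
```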
- MFCC (Mel-Frequency Cepstral Coefficients)
  - 13 MFCC coefficients
  - 40 mel filter banks
  - 256 frame size, 256 hop length
- Spectral Centroid
  - Center frequency of the spectrum
  - 256 frame size, 256 hop length
- Chroma Features
  - 12 semitone bins
  - 256 frame size, 256 hop length
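With a frame size and hop length of 256 samples each, consecutive frames do not overlap. The sketch below shows what "center frequency of the spectrum" means under those settings, using plain NumPy. It is a simplified illustration and not the project's implementation: the function name is hypothetical, and unlike librosa's default behavior it does not pad or center the frames.

```python
import numpy as np

FRAME_SIZE = 256   # samples per analysis frame
HOP_LENGTH = 256   # equal to FRAME_SIZE, so frames do not overlap


def spectral_centroid(signal, sample_rate):
    """Return one centroid (in Hz) per non-overlapping frame (illustrative helper)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - FRAME_SIZE) // HOP_LENGTH  # 0 if too short
    window = np.hanning(FRAME_SIZE)
    freqs = np.fft.rfftfreq(FRAME_SIZE, d=1.0 / sample_rate)
    centroids = np.empty(n_frames)
    for i in range(n_frames):
        start = i * HOP_LENGTH
        frame = signal[start:start + FRAME_SIZE] * window
        magnitude = np.abs(np.fft.rfft(frame))
        total = magnitude.sum()
        # Magnitude-weighted mean frequency; silent frames map to 0 Hz.
        centroids[i] = (freqs * magnitude).sum() / total if total > 0 else 0.0
    return centroids


# A pure 1 kHz tone should have a centroid close to 1 kHz in every frame.
sr = 16000
tone = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
print(spectral_centroid(tone, sr)[:3])
```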
- Asynchronous I/O processing
- Batch processing optimization
- GPU memory management
- Transformer caching
- Vectorized operations
- Multi-processing and multi-threading
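As a rough illustration of how asynchronous I/O and batch processing can be combined, the sketch below reads files in fixed-size batches and offloads each blocking read to a worker thread. It only demonstrates the general pattern; the function names and the default batch size are assumptions, and the project's own implementation may differ.

```python
import asyncio
from pathlib import Path
from typing import Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


async def read_batch(paths: Sequence[Path]) -> list[bytes]:
    """Read all files of one batch concurrently."""
    # Path.read_bytes blocks, so each call runs in a worker thread.
    return await asyncio.gather(
        *(asyncio.to_thread(p.read_bytes) for p in paths)
    )


async def read_all(paths: Sequence[Path], batch_size: int = 8) -> list[bytes]:
    """Read every file, one batch at a time, preserving input order."""
    results: list[bytes] = []
    for batch in batched(paths, batch_size):
        results.extend(await read_batch(batch))
    return results
```

A caller would run it with `asyncio.run(read_all(paths))`. Processing one batch at a time keeps the number of files held in memory bounded.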
The codebase is organized into several well-documented Python modules:
- featureExtractor_torch.py: GPU-accelerated feature extraction with PyTorch
- featureExtractor.py: CPU-based feature extraction with librosa
- curses_interface.py: Text-based user interface for dataset selection
- run.py: Command-line interface with FSD50K dataset download capabilities
All functions include detailed docstrings with parameter descriptions and return value information.
dataset/
├── [dataset_name].pkl # Original audio buffers
└── [dataset_name]_features.pkl # Extracted features
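The two pickle files can be loaded with Python's standard `pickle` module. The helper below is a hypothetical sketch: the function name is an assumption, and the structure of the unpickled objects is not specified here, so inspect them before relying on any particular layout. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path


def load_dataset(dataset_dir, dataset_name):
    """Load the audio buffers and extracted features for one dataset (illustrative)."""
    root = Path(dataset_dir)
    with open(root / f"{dataset_name}.pkl", "rb") as f:
        buffers = pickle.load(f)       # original audio buffers
    with open(root / f"{dataset_name}_features.pkl", "rb") as f:
        features = pickle.load(f)      # extracted features
    return buffers, features
```

For example, `load_dataset("dataset", "my_sounds")` would read dataset/my_sounds.pkl and dataset/my_sounds_features.pkl, where my_sounds is a placeholder name.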
The extracted features are reduced to consistent fixed-size formats (a 1D array, or a scalar for the spectral centroid):
- MFCC: Multi-dimensional to 1D array (dimension reduction)
- Spectral Centroid: Array to scalar value
- Chroma: Multi-dimensional to 1D array
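The exact reduction is not specified above, so the following sketch assumes one common choice: averaging each feature over the time axis. The function name, the dictionary keys, and the input shapes (coefficients by frames) are all assumptions made for illustration.

```python
import numpy as np


def summarize_features(mfcc, centroid, chroma):
    """Collapse per-frame features to fixed-size summaries (assumed: mean over time)."""
    return {
        # (13, n_frames) -> (13,)
        "mfcc": np.asarray(mfcc, dtype=float).mean(axis=1),
        # (1, n_frames) -> scalar
        "spectral_centroid": float(np.mean(centroid)),
        # (12, n_frames) -> (12,)
        "chroma": np.asarray(chroma, dtype=float).mean(axis=1),
    }


# Example with random placeholder data for a 50-frame segment.
rng = np.random.default_rng(0)
summary = summarize_features(
    rng.normal(size=(13, 50)),
    rng.uniform(100.0, 4000.0, size=(1, 50)),
    rng.uniform(size=(12, 50)),
)
print(summary["mfcc"].shape, summary["chroma"].shape)
```

Fixed-size summaries like these give every audio segment the same dimensionality regardless of its duration, which is what a later projection into 3D space would need.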
- FSD50K dataset: https://zenodo.org/record/4060432
MIT License
- librosa: https://librosa.org/
- torchaudio: https://pytorch.org/audio/
- PyTorch: https://pytorch.org/