This is a repository prepared for fine-tuning Qwen2-Audio, supporting both GPU and NPU. It supports reading and writing data in ark and wav formats, and is compatible with DDP, DeepSpeed, and LoRA.
The following are the dependencies required to run this project:
numpy==1.26.0
torch==2.1.0
torch-npu==2.1.0.post10 (if using NPU)
torchaudio==2.1.0
torchvision==0.16.0
soundfile
transformers==4.46.3
hydra
OmegaConf
deepspeed
Please refer to the sample data path /data/aishell-1 to prepare your data.
multitask.jsonl
my_wav.scp
multiprompt.jsonl
Modify the following two variables in train.sh, or simply use the default provided data without preparation:
TRAIN_DATA_PATH
EVAL_DATA_PATH
Set the following necessary environment variables in train.sh:
LOCAL_DIR=
MODEL_PATH=
Set the required variables in config/config.py.
Run the following command to start training:
bash train.sh
bash infer.sh
- Data Path: Ensure that
train_data_pathandeval_data_pathpoint to the correct data directories. - Device Selection: Based on your hardware environment, set
device_typetonpuorcuda. If not using NPU, comment out allimport torch_npustatements. If you encounter any errors related to NPU, modify them tocuda. - Dependency Installation: Ensure all dependencies are correctly installed. If not installed, you can use the following command:
pip install numpy==1.26.0 torch==2.1.0 torch-npu==2.1.0.post10 torchaudio==2.1.0 torchvision==0.16.0