---
license: mit
---
|
# Amphion Singing Voice Conversion Pretrained Models
|
## Quick Start
|
We provide a [DiffWaveNetSVC](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC) pretrained checkpoint for you to play with. Specifically, it is trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:
|
| Singer | Language | Training Duration (mins) |
| :-----------------: | :------: | :----------------------: |
| David Tao 陶喆 | Chinese | 45.51 |
| Eason Chan 陈奕迅 | Chinese | 43.36 |
| Feng Wang 汪峰 | Chinese | 41.08 |
| Jian Li 李健 | Chinese | 38.90 |
| John Mayer | English | 30.83 |
| Adele | English | 27.23 |
| Ying Na 那英 | Chinese | 27.02 |
| Yijie Shi 石倚洁 | Chinese | 24.93 |
| Jacky Cheung 张学友 | Chinese | 18.31 |
| Taylor Swift | English | 18.31 |
| Faye Wong 王菲 | Chinese | 16.78 |
| Michael Jackson | English | 15.13 |
| Tsai Chin 蔡琴 | Chinese | 10.12 |
| Bruno Mars | English | 6.29 |
| Beyonce | English | 6.06 |
|
To make these singers sing the songs you want to listen to, just run the following commands:
|
### Step1: Download the acoustics model checkpoint

```bash
git lfs install
git clone https://huggingface.co/amphion/singing_voice_conversion
```
|
### Step2: Download the vocoder checkpoint

```bash
git clone https://huggingface.co/amphion/BigVGAN_singing_bigdata
```
|
### Step3: Clone Amphion's source code from GitHub

```bash
git clone https://github.com/open-mmlab/Amphion.git
```
|
### Step4: Download the ContentVec checkpoint

You can download the **ContentVec** checkpoint from [this repo](https://github.com/auspicious3000/contentvec). This pretrained model uses `checkpoint_best_legacy_500.pt`, which is the legacy ContentVec model with 500 classes.
|
### Step5: Specify the checkpoints' path

Create soft links so that Amphion can find the downloaded checkpoints. The commands below assume that `singing_voice_conversion`, `BigVGAN_singing_bigdata`, and `Amphion` were cloned into the same parent directory:

```bash
cd Amphion
mkdir -p ckpts/svc
ln -s "$(realpath ../singing_voice_conversion/vocalist_l1_contentvec+whisper)" ckpts/svc/vocalist_l1_contentvec+whisper
ln -s "$(realpath ../BigVGAN_singing_bigdata/bigvgan_singing)" pretrained/bigvgan_singing
```
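
If you want to verify the links, a quick optional check (it just reads the linked directories) looks like this:

```bash
# Both links should resolve to real directories containing the downloaded checkpoints.
ls -L ckpts/svc/vocalist_l1_contentvec+whisper
ls -L pretrained/bigvgan_singing
```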
|
|
You also need to move the `checkpoint_best_legacy_500.pt` file downloaded in **Step4** into `Amphion/pretrained/contentvec`.
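
For example, assuming you are still inside the `Amphion` directory and saved the ContentVec checkpoint to the parent directory (adjust the source path to wherever you actually downloaded it), the move could look like this:

```bash
# Create the target directory if it does not exist yet, then move the checkpoint into it.
mkdir -p pretrained/contentvec
mv ../checkpoint_best_legacy_500.pt pretrained/contentvec/
```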
|
|
### Step6: Conversion

You can follow [this recipe](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC#4-inferenceconversion) to conduct the conversion. For example, if you want to make Taylor Swift sing the songs in `[Your Audios Folder]`, just run:
|
```bash
# Stage 3 of the recipe runs inference (conversion).
sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
    --config "ckpts/svc/vocalist_l1_contentvec+whisper/args.json" \
    --infer_expt_dir "ckpts/svc/vocalist_l1_contentvec+whisper" \
    --infer_output_dir "ckpts/svc/vocalist_l1_contentvec+whisper/result" \
    --infer_source_audio_dir [Your Audios Folder] \
    --infer_vocoder_dir "pretrained/bigvgan_singing" \
    --infer_target_speaker "vocalist_l1_TaylorSwift" \
    --infer_key_shift "autoshift"
```
|
**Note**: The supported `infer_target_speaker` values can be seen [here](https://huggingface.co/amphion/singing_voice_conversion/blob/main/vocalist_l1_contentvec%2Bwhisper/singers.json).
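
You can also list the speaker names locally, assuming the soft link from **Step5** is in place and you run the command from the `Amphion` directory:

```bash
# Print the supported target speakers from the checkpoint's singers.json.
python3 -c "import json; print('\n'.join(json.load(open('ckpts/svc/vocalist_l1_contentvec+whisper/singers.json'))))"
```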
|
|
## Citations

```bibtex
@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}
```
|