---
license: mit
---
|
# Amphion Singing Voice Conversion Pretrained Models
|
## Quick Start
|
We provide a [DiffWaveNetSVC](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC) pretrained checkpoint for you to play with. Specifically, it is trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:
|
| Singer | Language | Training Duration (mins) |
| :-----------------: | :------: | :----------------------: |
| David Tao 陶喆 | Chinese | 45.51 |
| Eason Chan 陈奕迅 | Chinese | 43.36 |
| Feng Wang 汪峰 | Chinese | 41.08 |
| Jian Li 李健 | Chinese | 38.90 |
| John Mayer | English | 30.83 |
| Adele | English | 27.23 |
| Ying Na 那英 | Chinese | 27.02 |
| Yijie Shi 石倚洁 | Chinese | 24.93 |
| Jacky Cheung 张学友 | Chinese | 18.31 |
| Taylor Swift | English | 18.31 |
| Faye Wong 王菲 | Chinese | 16.78 |
| Michael Jackson | English | 15.13 |
| Tsai Chin 蔡琴 | Chinese | 10.12 |
| Bruno Mars | English | 6.29 |
| Beyonce | English | 6.06 |
|
To make these singers sing the songs you want to listen to, just run the following commands:
|
### Step1: Download the acoustics model checkpoint

```bash
git lfs install
git clone https://huggingface.co/amphion/singing_voice_conversion
```
|
### Step2: Download the vocoder checkpoint

```bash
git clone https://huggingface.co/amphion/BigVGAN_singing_bigdata
```
|
### Step3: Clone Amphion's source code from GitHub

```bash
git clone https://github.com/open-mmlab/Amphion.git
```
|
### Step4: Download the ContentVec checkpoint

You can download the **ContentVec** checkpoint from [this repo](https://github.com/auspicious3000/contentvec). This pretrained model uses `checkpoint_best_legacy_500.pt`, which is the legacy ContentVec model with 500 classes.
|
### Step5: Specify the checkpoints' path

Create soft links so that Amphion can find the downloaded checkpoints. The commands below assume that `singing_voice_conversion`, `BigVGAN_singing_bigdata`, and `Amphion` were cloned into the same parent directory:

```bash
cd Amphion
mkdir -p ckpts/svc
ln -s "$(realpath ../singing_voice_conversion/vocalist_l1_contentvec+whisper)" ckpts/svc/vocalist_l1_contentvec+whisper
ln -s "$(realpath ../BigVGAN_singing_bigdata/bigvgan_singing)" pretrained/bigvgan_singing
```
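
If you want to verify the links, a quick optional check (it just reads the linked directories) looks like this:

```bash
# Both links should resolve to real directories containing the downloaded checkpoints.
ls -L ckpts/svc/vocalist_l1_contentvec+whisper
ls -L pretrained/bigvgan_singing
```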
|
|
You also need to move the `checkpoint_best_legacy_500.pt` file downloaded in **Step4** into `Amphion/pretrained/contentvec`.
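
For example, assuming you are still inside the `Amphion` directory and saved the ContentVec checkpoint to the parent directory (adjust the source path to wherever you actually downloaded it), the move could look like this:

```bash
# Create the target directory if it does not exist yet, then move the checkpoint into it.
mkdir -p pretrained/contentvec
mv ../checkpoint_best_legacy_500.pt pretrained/contentvec/
```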
|
|
### Step6: Conversion

You can follow [this recipe](https://github.com/open-mmlab/Amphion/tree/main/egs/svc/MultipleContentsSVC#4-inferenceconversion) to conduct the conversion. For example, if you want to make Taylor Swift sing the songs in `[Your Audios Folder]`, just run:
|
```bash
# Stage 3 of the recipe runs inference (conversion).
sh egs/svc/MultipleContentsSVC/run.sh --stage 3 --gpu "0" \
    --config "ckpts/svc/vocalist_l1_contentvec+whisper/args.json" \
    --infer_expt_dir "ckpts/svc/vocalist_l1_contentvec+whisper" \
    --infer_output_dir "ckpts/svc/vocalist_l1_contentvec+whisper/result" \
    --infer_source_audio_dir [Your Audios Folder] \
    --infer_vocoder_dir "pretrained/bigvgan_singing" \
    --infer_target_speaker "vocalist_l1_TaylorSwift" \
    --infer_key_shift "autoshift"
```
|
**Note**: The supported `infer_target_speaker` values can be seen [here](https://huggingface.co/amphion/singing_voice_conversion/blob/main/vocalist_l1_contentvec%2Bwhisper/singers.json).
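
You can also list the speaker names locally, assuming the soft link from **Step5** is in place and you run the command from the `Amphion` directory:

```bash
# Print the supported target speakers from the checkpoint's singers.json.
python3 -c "import json; print('\n'.join(json.load(open('ckpts/svc/vocalist_l1_contentvec+whisper/singers.json'))))"
```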
|
|
## Citations

```bibtex
@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}
```
|