---
license: cc-by-nc-4.0
pipeline_tag: audio-to-audio
library_name: f5-tts
extra_gated_prompt: "You agree to not use the model to generate, share, or promote content that is illegal, harmful, deceptive, or intended to impersonate real individuals without their informed consent."
extra_gated_fields:
  Affiliation: text
  Country: country
  I agree to use this model for non-commercial use ONLY: checkbox
---

# EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion

[Code](https://github.com/EZ-VC/EZ-VC)
[Paper](https://arxiv.org/abs/2505.16691)
[Demo](https://ez-vc.github.io/EZ-VC-Demo/)
[Lab](https://asr.iitm.ac.in/)

### Our paper has been accepted to the Findings of EMNLP 2025!

## Installation

### Create a separate environment if needed

```bash
# Create a Python 3.10 conda env (you could also use virtualenv)
conda create -n ez-vc python=3.10
conda activate ez-vc
```

### Local installation

```bash
git clone https://github.com/EZ-VC/EZ-VC
cd EZ-VC
git submodule update --init --recursive
pip install -e .

# Install ESPnet for XEUS (use exactly this fork and branch)
pip install 'espnet @ git+https://github.com/wanchichen/espnet.git@ssl'
```
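
After installing, a quick import check can confirm that the environment is usable before opening the notebook. The snippet below is a minimal sketch, not part of the EZ-VC codebase: the helper names `missing_modules` and `environment_report` are hypothetical, and the module names `f5_tts` and `espnet` are assumptions taken from the `src/f5_tts` path and the pip requirement name used in this README.

```python
import importlib.util
import sys

# Top-level module names assumed from this README: `f5_tts` (from the
# src/f5_tts path) and `espnet` (from the pip requirement name).
REQUIRED_MODULES = ("f5_tts", "espnet")


def missing_modules(names):
    """Return the names that the active interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report(names=REQUIRED_MODULES):
    """Summarise the Python version and any modules that were not found."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "missing": missing_modules(names),
    }


# Print the report for the currently active environment.
print(environment_report())
```

Run it inside the activated `ez-vc` environment. An empty `missing` list only means the modules can be located; it does not verify that the installed ESPnet is the fork and branch given above.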

## Inference

An inference notebook is provided at `src/f5_tts/infer/infer.ipynb`.

Open the [inference notebook](https://github.com/EZ-VC/EZ-VC/blob/main/src/f5_tts/infer/infer.ipynb) and run all cells.

The converted audio will be available in the last cell.

## Acknowledgements

- [F5-TTS](https://arxiv.org/abs/2410.06885) for open-sourcing their code, which made EZ-VC possible.

## Citation

If you find our work and codebase useful, please cite:

```bibtex
@misc{joglekar2025ezvceasyzeroshotanytoany,
      title={EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion},
      author={Advait Joglekar and Divyanshu Singh and Rooshil Rohit Bhatia and S. Umesh},
      year={2025},
      eprint={2505.16691},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2505.16691},
}
```

## License

Our code is released under the MIT License. The pre-trained models are licensed under CC-BY-NC 4.0 (non-commercial use only). Sorry for any inconvenience this may cause.