metadata
license: cc-by-nc-4.0
pipeline_tag: audio-to-audio
library_name: f5-tts
extra_gated_prompt: >-
  You agree to not use the model to generate, share, or promote content that is
  illegal, harmful, deceptive, or intended to impersonate real individuals
  without their informed consent.
extra_gated_fields:
  Affiliation: text
  Country: country
  I agree to use this model for non-commercial use ONLY: checkbox
EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion
Our paper has been accepted to the Findings of EMNLP 2025!
Installation
Create a separate environment if needed
# Create a Python 3.10 conda env (you could also use virtualenv)
conda create -n ez-vc python=3.10
conda activate ez-vc
Local installation
git clone https://github.com/EZ-VC/EZ-VC
cd EZ-VC
git submodule update --init --recursive
pip install -e .
# Install espnet for XEUS (exactly this version)
pip install 'espnet @ git+https://github.com/wanchichen/espnet.git@ssl'
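After the steps above, a quick check can confirm that both packages are visible to the active environment. The following is a minimal sketch, not part of the project: `check_module` is a hypothetical helper, and the import names `f5_tts` and `espnet` are assumptions inferred from the repository path and the pip package name.

```shell
# Report whether a Python module can be found in the active environment.
# Prints "missing" as well if python3 itself is not on PATH.
check_module() {
  if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)" 2>/dev/null; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_module f5_tts   # assumed import name for the package installed by `pip install -e .`
check_module espnet   # assumed import name for the espnet install
```

If either line reports `missing`, re-running the corresponding install command inside the activated `ez-vc` environment is a reasonable first step.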
Inference
We provide a Jupyter notebook for inference at "src/f5_tts/infer/infer.ipynb".
Open the inference notebook.
Run all cells.
The converted audio will be available in the last cell.
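The notebook can also be run without opening it in a browser. The sketch below assumes Jupyter and `nbconvert` are installed in the environment and that the notebook needs no manual edits before running, neither of which is stated above. `run_notebook` and the output name `infer_executed.ipynb` are hypothetical.

```shell
# Execute a notebook non-interactively; returns non-zero if the file is missing.
run_notebook() {
  notebook="$1"
  if [ ! -f "$notebook" ]; then
    echo "Notebook not found: $notebook (run from the repository root)" >&2
    return 1
  fi
  # --execute runs every cell; --output names the executed copy.
  jupyter nbconvert --to notebook --execute "$notebook" --output infer_executed.ipynb
}

run_notebook "src/f5_tts/infer/infer.ipynb" || echo "Inference notebook was not executed."
```

The executed copy keeps the cell outputs, so the converted audio in the last cell can be inspected by opening that copy afterwards.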
Acknowledgements
- F5-TTS for open-sourcing their code, which made EZ-VC possible.
Citation
If our work and codebase are useful to you, please cite:
@misc{joglekar2025ezvceasyzeroshotanytoany,
title={EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion},
author={Advait Joglekar and Divyanshu Singh and Rooshil Rohit Bhatia and S. Umesh},
year={2025},
eprint={2505.16691},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2505.16691},
}
License
Our code is released under the MIT License. The pre-trained models are licensed under the CC-BY-NC 4.0 license. Sorry for any inconvenience this may cause.