---
license: mit
language:
- en
base_model:
- coqui/XTTS-v2
---

# Fine-Tuned XTTS Model

This project fine-tunes a TTS (Text-to-Speech) model using an mp3 file extracted from a YouTube video. The training was conducted on a Hugging Face Space running locally via Docker. A GPU is recommended for faster training.

### Training Data

- **Source Video**: [YouTube Video](https://www.youtube.com/watch?v=u6J20_Aem3Y)
- **Training Audio**: The mp3 file used for training is included in the `files` directory.

### Docker Image

Fine-tuned with this Docker image: [FineTune Xtts Docker image](https://hub.docker.com/r/athomasson2/fine_tune_xtts)

### Notes

- Ensure you have a GPU available for optimal performance during training.
- The Docker image pulls the latest version each time it's run.

This model is based on XTTS v2, which cannot be used commercially as per the [XTTS license, which is in a limbo state](https://github.com/coqui-ai/TTS/issues/3490).
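
For readers who want to try the fine-tuned checkpoint, the following is a minimal sketch of loading it with the Coqui `TTS` Python package. It is not taken from this repository: the file names `config.json`, `model.pth`, and `vocab.json` are assumptions based on the usual XTTS checkpoint layout, and `model_dir` and `speaker_wav` are hypothetical paths you would replace with your own. Check the repository's actual files before relying on it.

```python
from pathlib import Path

# Assumed file names (typical XTTS checkpoint layout); verify against the
# actual contents of this repository.
REQUIRED_FILES = ("config.json", "model.pth", "vocab.json")


def missing_model_files(model_dir):
    """Return the assumed checkpoint file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def synthesize(model_dir, text, speaker_wav, language="en"):
    """Load the checkpoint in model_dir and synthesize text in the reference voice."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"Missing model files in {model_dir}: {missing}")

    # Imported lazily so the file check above works without Coqui TTS installed.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    root = Path(model_dir)
    config = XttsConfig()
    config.load_json(str(root / "config.json"))
    model = Xtts.init_from_config(config)
    # checkpoint_dir is expected to contain model.pth and vocab.json.
    model.load_checkpoint(config, checkpoint_dir=str(root), eval=True)
    # Returns a dict; the generated waveform is stored under the "wav" key.
    return model.synthesize(text, config, speaker_wav=speaker_wav, language=language)
```

A call might look like `synthesize("path/to/model", "Hello there.", "path/to/reference.wav")`, where both paths are placeholders. The file check runs first so that a wrong directory fails fast with a clear message instead of a loader error.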