ⓍTTS 🇦🇷

ⓍTTS is a voice generation model that clones voices into different languages from a single 6-second audio clip, so it does not need hours of training data.

This model was trained by IdeaLab at CITECCA, Universidad Nacional de Rio Negro.

Language

The Spanish language of this model has been fine-tuned on ylacombe's Google Argentinian Spanish dataset to achieve an Argentinian accent.

Training Parameters

batch_size=8,
grad_accum_steps=96,
batch_group_size=48,
eval_batch_size=8,
num_loader_workers=8,
eval_split_max_size=256,
optimizer="AdamW",
optimizer_wd_only_on_weights=True,
optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2},
lr=5e-06, 
lr_scheduler="MultiStepLR",
lr_scheduler_params={"milestones": [50000 * 18, 150000 * 18, 300000 * 18], "gamma": 0.5, "last_epoch": -1},
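The parameters above imply an effective batch size and a stepped learning-rate schedule. The following is a minimal sketch of that arithmetic, assuming that gradient accumulation multiplies the per-step batch and that the scheduler behaves like PyTorch's MultiStepLR (the rate is multiplied by gamma each time a milestone is reached). The helper names `effective_batch_size` and `lr_at` are hypothetical and are not part of 🐸TTS; whether a scheduler tick corresponds to a training step or an epoch depends on the trainer setup.

```python
from bisect import bisect_right

# Values copied from the training parameters listed above.
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 96
BASE_LR = 5e-06
GAMMA = 0.5
MILESTONES = [50000 * 18, 150000 * 18, 300000 * 18]


def effective_batch_size(batch_size=BATCH_SIZE, grad_accum_steps=GRAD_ACCUM_STEPS):
    """Samples contributing to one optimizer update (assumes plain gradient accumulation)."""
    return batch_size * grad_accum_steps


def lr_at(tick, base_lr=BASE_LR, gamma=GAMMA, milestones=MILESTONES):
    """Learning rate after `tick` scheduler ticks, using the MultiStepLR closed form."""
    # bisect_right counts how many milestones are <= tick.
    return base_lr * gamma ** bisect_right(milestones, tick)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for tick in (0, MILESTONES[0], MILESTONES[1], MILESTONES[2]):
        print(f"tick {tick}: lr = {lr_at(tick):.3e}")
```

Under these assumptions each optimizer update sees 8 × 96 = 768 samples, and the learning rate halves at ticks 900,000, 2,700,000 and 5,400,000.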

License

This model is licensed under the Coqui Public Model License (CPML). There's a lot that goes into a license for generative models, and you can read more of the origin story of CPML here.

Using the 🐸TTS command line:

 tts --model_name /path/to/xtts/ \
     --text "Che boludo, vamos a tomar unos mates." \
     --speaker_wav /path/to/target/speaker.wav \
     --language_idx es \
     --use_cuda true

Using the model directly:

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration and build the model from it.
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)

# Load the fine-tuned weights in evaluation mode and move the model to the GPU.
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

# Synthesize Spanish speech in the voice of the reference clip.
outputs = model.synthesize(
    "Che boludo, vamos a tomar unos mates.",
    config,
    speaker_wav="/data/TTS-public/_refclips/3.wav",
    gpt_cond_len=3,
    language="es",
)
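The snippet above leaves the result in `outputs` without writing it to disk. The following is a minimal sketch of one way to save it, assuming that `outputs["wav"]` holds mono float samples in the range [-1, 1] and that the model produces 24 kHz audio; both are assumptions to check against the 🐸TTS documentation. The helper `save_wav` is hypothetical and uses only the Python standard library.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1, 1] to `path` as 16-bit PCM WAV."""
    pcm = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))


# Assumed usage with the result of model.synthesize(...):
# save_wav(outputs["wav"], "output.wav")
```

Writing 16-bit PCM keeps the file readable by most audio players; a library such as torchaudio or soundfile could be used instead if it is already installed.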