Model

This is a fine-tuned version of the Spanish checkpoint of the Massively Multilingual Speech (MMS) models, which are lightweight, low-latency text-to-speech (TTS) models based on the VITS architecture.

It was trained in around 20 minutes on as few as 80 to 150 samples from this Argentinian Spanish dataset.

The training recipe is available in this GitHub repository: ylacombe/finetune-hf-vits.

Usage

Transformers

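from transformers import pipeline
import scipy.io.wavfile  # import the submodule explicitly; a bare `import scipy` may not expose it

model_id = "ylacombe/mms-spa-finetuned-argentinian-monospeaker"
synthesiser = pipeline("text-to-speech", model_id) # add device=0 if you want to use a GPU

speech = synthesiser("Hola, ¿cómo estás hoy?")

# The audio array may come back with a leading batch axis, e.g. (1, num_samples);
# squeeze() drops it so the file is written as a mono signal.
scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"].squeeze())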
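
The pipeline returns floating-point audio, which scipy.io.wavfile.write stores as a 32-bit float WAV. Some players and tools expect 16-bit PCM instead. The following is a minimal sketch of one way to write such a file using only the Python standard library. It assumes mono samples in the range [-1.0, 1.0]; the helper name write_pcm16_wav is hypothetical and not part of Transformers or SciPy.

```python
import array
import sys
import wave


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM WAV.

    Hypothetical helper for illustration; returns the number of frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        # Clip out-of-range values so they do not wrap around after scaling.
        clipped = max(-1.0, min(1.0, float(s)))
        pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)
```

With the pipeline output above, a call might look like write_pcm16_wav("finetuned_output_pcm16.wav", speech["audio"].squeeze(), speech["sampling_rate"]), assuming the audio is a NumPy array that squeezes down to one dimension.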

Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:

npm i @xenova/transformers

Example: Generate Spanish speech with ylacombe/mms-spa-finetuned-argentinian-monospeaker.

import { pipeline } from '@xenova/transformers';

// Create a text-to-speech pipeline
const synthesizer = await pipeline('text-to-speech', 'ylacombe/mms-spa-finetuned-argentinian-monospeaker', {
    quantized: false, // Remove this line to use the quantized version (default)
});

// Generate speech
const output = await synthesizer('Hola, ¿cómo estás hoy?');
console.log(output);
// {
//   audio: Float32Array(69888) [ ... ],
//   sampling_rate: 16000
// }
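
As a quick sanity check on the output shown above, the clip length follows from dividing the sample count by the sampling rate. A small Python sketch of that arithmetic (Python is used here to match the earlier example; the function name clip_duration_seconds is hypothetical):

```python
def clip_duration_seconds(num_samples, sampling_rate):
    """Return the duration in seconds of a mono clip."""
    return num_samples / sampling_rate


# Values taken from the example output above: 69888 samples at 16000 Hz.
duration = clip_duration_seconds(69888, 16000)
print(f"{duration:.3f} s")  # prints "4.368 s"
```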

Optionally, save the audio to a wav file (Node.js):

import wavefile from 'wavefile';
import fs from 'fs';

const wav = new wavefile.WaveFile();
wav.fromScratch(1, output.sampling_rate, '32f', output.audio);
fs.writeFileSync('out.wav', wav.toBuffer());

Model size: 83M params (Safetensors, tensor type F32)