VITS Text-to-Speech Model for Russian

The model expects lowercase input text.
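
Input can be normalized with a small helper before tokenizing. The sketch below assumes, based only on the example text in this card, that a `+` marks stress on the vowel that follows it. `prepare_text` and `RUSSIAN_VOWELS` are hypothetical names, not part of the model or of transformers.

```python
# Hypothetical helper, not part of transformers or of this model's API.
RUSSIAN_VOWELS = set("аеёиоуыэюя")


def prepare_text(text: str) -> str:
    """Lowercase the text and check that every '+' is followed by a vowel.

    Assumption: '+' marks stress on the vowel that follows it, as in the
    example text in this card.
    """
    text = text.lower().strip()
    for i, ch in enumerate(text):
        if ch == "+":
            # Look at the character right after the stress mark, if any.
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt not in RUSSIAN_VOWELS:
                raise ValueError(f"'+' at position {i} is not followed by a vowel")
    return text
```

The returned string can be passed to the tokenizer in place of a manual `text.lower()` call.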

Example: Text to Speech

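from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("joefox/tts_vits_ru_hf")
tokenizer = AutoTokenizer.from_pretrained("joefox/tts_vits_ru_hf")

text = "Привет, как дел+а? Всё +очень хорош+о! А у тебя как?"
text = text.lower()  # the model expects lowercase text
inputs = tokenizer(text, return_tensors="pt")
inputs['speaker_id'] = 3  # selects the speaker voice

# Inference only, so gradients are not needed
with torch.no_grad():
    output = model(**inputs).waveform

# output has shape (batch, samples); write the first waveform to a WAV file
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output[0].cpu().numpy())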
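
`scipy.io.wavfile.write` stores float32 data as an IEEE-float WAV file, which some players do not open. If that matters, the waveform can be converted to 16-bit PCM first. This is a minimal sketch that assumes the samples lie roughly in [-1, 1]; `to_int16` is a hypothetical helper name, and the demo array stands in for the model output.

```python
import numpy as np


def to_int16(waveform: np.ndarray) -> np.ndarray:
    """Convert a float waveform to 16-bit PCM samples.

    Assumption: samples lie roughly in [-1.0, 1.0]; values outside that
    range are clipped rather than wrapped.
    """
    clipped = np.clip(waveform, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)


# Stand-in data so the sketch runs without downloading the model.
demo = np.array([0.0, 0.5, -1.0, 2.0], dtype=np.float32)
pcm = to_int16(demo)
```

With the example above, the converted array would be passed as `data=to_int16(output[0].cpu().numpy())` in the call to `scipy.io.wavfile.write`.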

To play the audio in a Jupyter Notebook or Google Colab:

from IPython.display import Audio

Audio(output[0].cpu().numpy(), rate=model.config.sampling_rate)

Languages covered

Russian (ru_RU)

Model size: 15.1M params (Safetensors, tensor type F32)