VITS Text-to-Speech Model for Russian
The model expects lowercase input text.
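Because the input must be lowercase, text can be normalized before it is tokenized. The following is a minimal sketch: the helper name `prepare_text` is hypothetical and not part of the model's API, and collapsing whitespace is an added assumption rather than a documented requirement. The `+` characters are left untouched, since the example text uses them inside words, apparently to mark the stressed vowel.

```python
def prepare_text(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace (hypothetical helper)."""
    # str.lower() handles Cyrillic as well as Latin letters.
    lowered = text.lower()
    # split() without arguments splits on any whitespace and drops empty pieces.
    return " ".join(lowered.split())


print(prepare_text("Привет,  как дел+а?"))
```

The returned string can be passed to the tokenizer in place of a manual `text.lower()` call.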
Example Text to Speech
from transformers import VitsModel, AutoTokenizer
import torch
import scipy.io.wavfile

# Load the pretrained model and its tokenizer from the Hugging Face Hub
model = VitsModel.from_pretrained("joefox/tts_vits_ru_hf")
tokenizer = AutoTokenizer.from_pretrained("joefox/tts_vits_ru_hf")

text = "Привет, как дел+а? Всё +очень хорош+о! А у тебя как?"
text = text.lower()  # the model expects lowercase input

inputs = tokenizer(text, return_tensors="pt")
inputs['speaker_id'] = 3  # select the speaker voice

# Run inference without tracking gradients
with torch.no_grad():
    output = model(**inputs).waveform

# Save the first waveform in the batch as a WAV file
scipy.io.wavfile.write("techno.wav", rate=model.config.sampling_rate, data=output[0].cpu().numpy())
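`scipy.io.wavfile.write` chooses the WAV sample format from the array's dtype, so a float32 waveform is stored as a 32-bit floating-point file, which some players do not open. If a 16-bit PCM file is preferred, the samples can be converted first. This is a sketch only: the helper name `to_int16_pcm` is hypothetical, and it assumes the waveform values lie roughly in the range [-1.0, 1.0].

```python
import numpy as np


def to_int16_pcm(waveform: np.ndarray) -> np.ndarray:
    """Convert a float waveform in [-1.0, 1.0] to 16-bit PCM (hypothetical helper)."""
    # Clip first so that out-of-range samples do not wrap around after the cast.
    clipped = np.clip(waveform, -1.0, 1.0)
    # Scale to the int16 range and truncate to integers.
    return (clipped * 32767.0).astype(np.int16)
```

The converted array could then be passed as `data=` to `scipy.io.wavfile.write` in place of the float array.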
To play the audio in a Jupyter Notebook or Google Colab:
from IPython.display import Audio
# Pass the first waveform in the batch as a NumPy array
Audio(output[0].cpu().numpy(), rate=model.config.sampling_rate)
Languages covered
Russian (ru_RU)