This is a fine-tuned VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) model for Divehi speech synthesis. It produces male-voice audio from Divehi text written in Thaana script. The model was fine-tuned from Meta’s MMS-TTS checkpoint on a curated dataset of synthetic Divehi speech.
Field | Value |
---|---|
Model ID | alakxender/mms-tts-div-finetuned-md-m01 |
Base Architecture | MMS-TTS (VITS) |
Language | Divehi (dv) |
Voice | Male |
Sampling Rate | 16 kHz |
Tokenizer | VitsTokenizer |
Inference Engine | Transformers (🤗 Hugging Face) |
import torch
import torchaudio
from transformers import VitsModel, VitsTokenizer

model_id = "alakxender/mms-tts-div-finetuned-md-m01"
tokenizer = VitsTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

# Input text in Thaana script
text = "މޫސުން ވަރަށް ގޯސްވެ، ފުވައްމުލަކުން ފެށިގެން އައްޑުއަށް އޮރެންޖް އެލާޓް ނެރެފި"
inputs = tokenizer(text, return_tensors="pt")

# VitsModel synthesizes audio in its forward pass; no_grad skips gradient tracking
with torch.no_grad():
    waveform = model(**inputs).waveform[0]

# Save as mono audio at the model's sampling rate (16 kHz)
torchaudio.save("output.wav", waveform.unsqueeze(0), model.config.sampling_rate)
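If torchaudio is not available, the synthesized waveform can also be written with Python's standard library. The following is a minimal sketch, not part of the original model card: the helper names `float_to_pcm16` and `write_wav` are hypothetical, and it assumes the waveform holds float samples in the range [-1.0, 1.0] at the 16 kHz rate listed in the table.

```python
import struct
import wave

SAMPLE_RATE = 16000  # taken from the model card's table


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # Clip first so out-of-range values cannot overflow the int16 range.
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit WAV file and return its duration in seconds."""
    samples = list(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(samples))
    return len(samples) / sample_rate
```

With the usage snippet above, a call such as `write_wav("output.wav", waveform.tolist())` should produce an equivalent file, assuming `waveform` is a one-dimensional tensor.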
Model | MOS |
---|---|
alakxender/mms-tts-div-finetuned-md-m01 | 3.228 |
{
"5": "Excellent (very natural)",
"4": "Good (mostly natural)",
"3": "Fair (some robotic quality)",
"2": "Poor (noticeably unnatural)",
"1": "Bad (unintelligible or very synthetic)"
}
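To make the scale above concrete, here is a small sketch of how a mean opinion score could be computed from individual ratings and mapped back to a label. The model card does not say how the ratings were collected or aggregated, so the function names and the nearest-integer mapping are assumptions made for illustration only.

```python
from statistics import mean

# Rating scale copied from the model card, keyed by integer score.
MOS_SCALE = {
    5: "Excellent (very natural)",
    4: "Good (mostly natural)",
    3: "Fair (some robotic quality)",
    2: "Poor (noticeably unnatural)",
    1: "Bad (unintelligible or very synthetic)",
}


def mean_opinion_score(ratings):
    """Average a non-empty list of integer ratings on the 1-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in MOS_SCALE:
            raise ValueError(f"rating {rating!r} is outside the 1-5 scale")
    return mean(ratings)


def describe_mos(score):
    """Map a score to the nearest scale label (assumed: round half up, clamp to 1-5)."""
    nearest = min(5, max(1, int(score + 0.5)))
    return MOS_SCALE[nearest]


print(describe_mos(3.228))  # prints "Fair (some robotic quality)"
```

Under this mapping, the reported score of 3.228 sits in the "Fair" band, slightly above its midpoint.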
outputs/audio/
outputs/spectrograms/
outputs/report.txt
outputs/mos_scores.txt
Base model
facebook/mms-tts-div