🇻🇳 Vietnamese Text-to-Speech (TTS)
Model Description
This is a Vietnamese Text-to-Speech (TTS) model trained to generate natural-sounding Vietnamese speech from text. The model is designed for applications such as virtual assistants, audiobooks, and accessibility tools.
- Model Name: zalopay/vietnamese-tts
- Language: Vietnamese (vi)
- Task: Text-to-Speech (TTS)
- Framework: F5-TTS
- License: CC-BY-4.0
Model Architecture
- F5-TTS uses a Diffusion Transformer (DiT) with ConvNeXt V2, which makes both training and inference faster.
Training Data
- Dataset: this model was trained on more than 200 hours of public Vietnamese voice data and YouTube audio.
Inference Example
from cached_path import cached_path
import gradio as gr

from f5_tts.model import DiT
from f5_tts.infer.utils_infer import (
    preprocess_ref_audio_text,
    load_vocoder,
    load_model,
    infer_process,
    save_spectrogram,
)

# Load the vocoder that turns mel spectrograms into waveforms.
vocoder = load_vocoder()

# DiT configuration: dim 1024, depth 22, heads 16, ff_mult 2, text_dim 512.
model = load_model(
    DiT,
    dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4),
    ckpt_path=str(
        cached_path("hf://zalopay/vietnamese-tts/model_960000.pt")
    ),
    mel_spec_type="vocos",
    vocab_file=str(cached_path("hf://zalopay/vietnamese-tts/vocab.txt")),
)
...
# ref_audio_orig, ref_text, gen_text and speed must be supplied by the caller.
ref_audio, ref_text = preprocess_ref_audio_text(ref_audio_orig, ref_text)
# gr.Info only shows a notification when called inside a Gradio event handler.
gr.Info("Generated audio text: {} with audio file {} ".format(ref_text, ref_audio_orig))
final_wave, final_sample_rate, combined_spectrogram = infer_process(
    ref_audio,
    ref_text,
    gen_text,
    model,
    vocoder,
    cross_fade_duration=0.15,
    nfe_step=32,
    speed=speed,
)
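The example above ends with the waveform in memory. As a minimal sketch of writing it to disk, the helper below assumes that final_wave is a mono floating-point NumPy array with values in roughly [-1, 1] and that final_sample_rate is an integer number of samples per second. Neither is stated in this model card. The name save_wave is hypothetical and not part of F5-TTS.

```python
import wave

import numpy as np


def save_wave(path, samples, sample_rate):
    """Write a mono float waveform to a 16-bit PCM WAV file.

    Hypothetical helper: assumes `samples` holds floats in [-1, 1].
    Returns the duration of the written audio in seconds.
    """
    samples = np.asarray(samples, dtype=np.float32).reshape(-1)
    # Clip first so out-of-range samples saturate instead of wrapping around.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(int(sample_rate))
        f.writeframes(pcm.tobytes())
    return len(pcm) / float(sample_rate)
```

With the variables from the inference example, a call might look like save_wave("output.wav", final_wave, final_sample_rate). If the array is not in the assumed range, rescale it before saving, since the clipping step would otherwise distort it.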
Applications
- Virtual assistants (e.g., chatbots, AI voice interactions)
- Audiobooks and content narration
- Accessibility tools for visually impaired users
- Automated announcements and voiceovers
Limitations & Biases
- May struggle with uncommon words or names.
- Limited support for different accents or dialects.
- Background noise or pronunciation inconsistencies may occur.
- Duplicated (repeated) speech may occur.
Citation
If you use this model, please cite:
@misc{zalopay-vietnamese-tts,
title={Zalopay Vietnamese Text-to-Speech Model},
author={Zalopay},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/zalopay/vietnamese-tts}
}
Acknowledgments
Special thanks to F5-TTS for providing such a wonderful base model and framework.
Model tree for zalopay/vietnamese-tts
Base model
SWivid/F5-TTS