File size: 2,138 Bytes
4b65bde 20404a9 4b65bde 20404a9 4b65bde 20404a9 4b65bde 20404a9 4b65bde 20404a9 4b65bde 20404a9 4b65bde |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 |
---
tags:
- text-to-speech
- vietnamese
- ai-model
- deep-learning
license: cc-by-nc-sa-4.0
library_name: pytorch
datasets:
- PhoAudioBook
- ViVoice
- UEH
model_name: ZipVoice-Vietnamese-2500h
language: vi
---
# π Important Note β οΈ
This model is only intended for **research purposes**.
**Access requests must be made using an institutional, academic, or corporate email**. Requests from public email providers will be denied. We appreciate your understanding.
# ποΈ ZipVoice-Vietnamese-2500h
ZipVoice is a series of fast and high-quality zero-shot TTS models based on flow matching.
Key features:
1. Small and fast: only 123M parameters.
2. High-quality voice cloning: state-of-the-art performance in speaker similarity, intelligibility, and naturalness.
3. Multi-lingual: support Chinese and English.
4. Multi-mode: support both single-speaker and dialogue speech generation.
This checkpoint is a compact fine-tuned version of ZipVoice trained on 2500 hours of Vietnamese speech.
π For more fine-tuning and inference experiments, visit: https://github.com/k2-fsa/ZipVoice.
π **License:** [CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0) β Non-commercial research use only.
---
## π Model Details
- **Dataset:** PhoAudioBook, ViVoice, TeacherDinh-UEH.
- **Total dataset durations:** 2500 hours
- **Data processing Technique:**
- Remove all music background from audios, using facebook demucs model: https://github.com/facebookresearch/demucs
- Do not use audio files shorter than 1 second or longer than 30 seconds.
- Keep the default punctuation marks unchanged.
- Normalize to lowercase format.
- **Training Configuration:**
- **Base Model:** ZipVoice with espeak-ng vi for tokenizer
- **GPU:** RTX 3090
- **Batch Siz:** Max duration 200
- **Training Progress:** Stopped at **525,000 steps at epoch 11**
---
## π Update Note
Thank you, Teacher Δα»nh from the University of Economics Ho Chi Minh City (UEH), for providing me with an additional 50-hours high-quality labeled dataset.
Him contact: https://www.facebook.com/luudinhit93 |