hynt committed on
Commit 20404a9 · verified · 1 Parent(s): 4b65bde

Update README.md

Files changed (1)
1. README.md +8 -10
README.md CHANGED
@@ -7,12 +7,10 @@ tags:
 license: cc-by-nc-sa-4.0
 library_name: pytorch
 datasets:
-- VLSP2021
-- VLSP2022
-- VLSP2023
-- vietTTS
+- PhoAudioBook
+- ViVoice
 - UEH
-model_name: ZipVoice-Vietnamese-150h
+model_name: ZipVoice-Vietnamese-2500h
 language: vi
 ---
 
@@ -20,7 +18,7 @@ language: vi
 This model is only intended for **research purposes**.
 **Access requests must be made using an institutional, academic, or corporate email**. Requests from public email providers will be denied. We appreciate your understanding.
 
-# 🎙️ ZipVoice-Vietnamese-150h
+# 🎙️ ZipVoice-Vietnamese-2500h
 ZipVoice is a series of fast and high-quality zero-shot TTS models based on flow matching.
 
 Key features:
@@ -32,7 +30,7 @@ Key features:
 
 4. Multi-mode: support both single-speaker and dialogue speech generation.
 
-This checkpoint is a compact fine-tuned version of ZipVoice trained on 150 hours of Vietnamese speech.
+This checkpoint is a compact fine-tuned version of ZipVoice trained on 2500 hours of Vietnamese speech.
 
 🔗 For more fine-tuning and inference experiments, visit: https://github.com/k2-fsa/ZipVoice.
 
@@ -42,8 +40,8 @@ This checkpoint is a compact fine-tuned version of ZipVoice trained on 150 hours
 
 ## 📌 Model Details
 
-- **Dataset:** VLSP 2021, VLSP 2022, VLSP 2023, VietTTS, TeacherDinh-UEH and some speech sources from YouTube channels.
-- **Total dataset durations:** 150 hours
+- **Dataset:** PhoAudioBook, ViVoice, TeacherDinh-UEH.
+- **Total dataset durations:** 2500 hours
 - **Data processing Technique:**
 - Remove all music background from audios, using facebook demucs model: https://github.com/facebookresearch/demucs
 - Do not use audio files shorter than 1 second or longer than 30 seconds.
@@ -53,7 +51,7 @@ This checkpoint is a compact fine-tuned version of ZipVoice trained on 150 hours
 - **Base Model:** ZipVoice with espeak-ng vi for tokenizer
 - **GPU:** RTX 3090
 - **Batch Siz:** Max duration 200
-- **Training Progress:** Stopped at **96,000 steps at epoch 30**
+- **Training Progress:** Stopped at **525,000 steps at epoch 11**
 
 ---
 
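The README's data-processing notes state two duration rules. Clips shorter than 1 second or longer than 30 seconds are dropped, and batches are capped at "Max duration 200". The snippet below is a minimal plain-Python sketch of both rules.

It assumes that the 200 means seconds of audio per batch, which is how lhotse-style samplers interpret `max_duration`. That reading is an assumption, not something the README confirms. The names `Clip`, `wav_duration`, `keep_clip`, and `batch_by_duration` are hypothetical. The greedy packing is a simplified stand-in, not lhotse's bucketing sampler or the actual ZipVoice training code. The demucs music-removal step is not shown.

```python
from __future__ import annotations

import wave
from dataclasses import dataclass

# Duration rules from the README's data-processing notes, in seconds.
MIN_SECONDS = 1.0
MAX_SECONDS = 30.0
# Assumption: "Max duration 200" means at most 200 s of audio per batch.
MAX_BATCH_SECONDS = 200.0


@dataclass
class Clip:
    """Hypothetical record for one training utterance."""
    path: str
    duration: float  # seconds


def wav_duration(path: str) -> float:
    """Return the length of a PCM WAV file in seconds (stdlib `wave` only)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def keep_clip(duration: float) -> bool:
    """Drop clips shorter than 1 s or longer than 30 s."""
    return MIN_SECONDS <= duration <= MAX_SECONDS


def batch_by_duration(clips: list[Clip], max_seconds: float = MAX_BATCH_SECONDS) -> list[list[Clip]]:
    """Greedily pack length-sorted clips so each batch totals <= max_seconds.

    A single clip longer than max_seconds still ends up alone in its own batch.
    """
    batches: list[list[Clip]] = []
    current: list[Clip] = []
    total = 0.0
    for clip in sorted(clips, key=lambda c: c.duration):
        # Start a new batch when adding this clip would exceed the cap.
        if current and total + clip.duration > max_seconds:
            batches.append(current)
            current, total = [], 0.0
        current.append(clip)
        total += clip.duration
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    durations = [0.5, 3.0, 12.0, 29.5, 31.0, 25.0, 8.0, 40.0, 1.0]
    clips = [Clip(f"clip_{i}.wav", d) for i, d in enumerate(durations)]
    kept = [c for c in clips if keep_clip(c.duration)]
    print([c.duration for c in kept])  # [3.0, 12.0, 29.5, 25.0, 8.0, 1.0]
    # A small cap (30 s) makes the packing visible on this toy data.
    print([[c.duration for c in b] for b in batch_by_duration(kept, max_seconds=30.0)])
    # [[1.0, 3.0, 8.0, 12.0], [25.0], [29.5]]
```

Sorting by length before packing keeps similarly sized clips together, which reduces padding inside a batch. Bucketing samplers pursue the same goal with a more elaborate scheme.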