Update README.md
README.md
CHANGED
@@ -89,10 +89,7 @@ pip install -r requirements.txt
 python inference.py --config config.yml --model model.pth --text "الإِتْقَانُ يَحْتَاجُ إِلَى الْعَمَلِ وَالْمُثَابَرَة"
 ```
 
-Make sure
-- Set the config path to point to the configuration file from this Hugging Face repository
-- Install espeak-ng on your system as it's required for the phonemizer to work
-- Use properly diacritized Arabic text for best results
+Make sure to use properly diacritized Arabic text for best results
 
 ### Out-of-Scope Use
 
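Both the removed and the remaining notes describe runtime requirements: the phonemizer only works when espeak-ng is installed on the system, and input text should be fully diacritized. Below is a minimal preflight sketch, assuming the `phonemizer` Python package with its espeak backend; the `ar` language code and the `EspeakBackend` usage are assumptions for illustration, not taken from this repository.

```python
# Preflight check (sketch): confirms espeak-ng is visible to the phonemizer
# package and phonemizes a fully diacritized Arabic sentence.
# Assumes `pip install phonemizer` plus a system-wide espeak-ng install.
from phonemizer.backend import EspeakBackend

# Raises RuntimeError here if espeak-ng is not installed on the system.
backend = EspeakBackend(language="ar", preserve_punctuation=True, with_stress=True)

# Fully diacritized input, as the model card recommends.
text = "الإِتْقَانُ يَحْتَاجُ إِلَى الْعَمَلِ وَالْمُثَابَرَة"

print(backend.phonemize([text])[0])
```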
@@ -109,36 +106,23 @@ The model is specifically designed for Arabic text-to-speech synthesis and may n
 - Dataset: [fadi77/arabic-audiobook-dataset-24khz](https://huggingface.co/datasets/fadi77/arabic-audiobook-dataset-24khz)
 - The PL-BERT component was trained on fully diacritized Wikipedia Arabic text
 
-### Training
-- **Hardware:** Single NVIDIA H100 GPU
-- **Training Duration:** 20 epochs
-- **Validation Metrics:** Identical to original StyleTTS2 training methodology
-
-### Training Procedure
-
-#### Training Hyperparameters
+### Training Hyperparameters
 
 - **Number of epochs:** 20
 - **Diffusion training:** Started from epoch 5
-- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
-- **Validation methodology:** Identical to original StyleTTS2 training process
-- **Notable modifications:**
-  - Removed WavLM adversarial training component
-  - Custom PL-BERT trained for Arabic language
-
-## Technical Specifications
-
-### Model Architecture and Objective
 
-
-
-
-3. Modified training procedure without WavLM adversarial component
+### Objectives
+- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
+- **Validation objectives:** Identical to original StyleTTS2 validation process
 
 ### Compute Infrastructure
-
 - **Hardware Type:** NVIDIA H100 GPU
-
+
+### Notable Modifications from Original StyleTTS2 in Architecture and Objectives
+The architecture of the model follows that of StyleTTS2 with the following exceptions:
+- Removed WavLM adversarial training component
+- Custom PL-BERT trained for Arabic language
+
 
 ## Citation
 
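For anyone who wants to inspect the training corpus referenced in the hunk above, here is a minimal sketch assuming the Hugging Face `datasets` library; split names and column layout are whatever the dataset repository defines and are not asserted here.

```python
# Sketch: pull the referenced training corpus from the Hub and print its
# splits, row counts, and columns. Requires `pip install datasets`.
from datasets import load_dataset

ds = load_dataset("fadi77/arabic-audiobook-dataset-24khz")
print(ds)
```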