fadi77 committed
Commit 72201df · verified · 1 Parent(s): 49ae1d1

Update README.md

Files changed (1): README.md (+11 −27)
README.md CHANGED
@@ -89,10 +89,7 @@ pip install -r requirements.txt
 python inference.py --config config.yml --model model.pth --text "ุงู„ุฅูุชู’ู‚ูŽุงู†ู ูŠูŽุญู’ุชูŽุงุฌู ุฅูู„ูŽู‰ ุงู„ู’ุนูŽู…ูŽู„ู ูˆูŽุงู„ู’ู…ูุซูŽุงุจูŽุฑูŽุฉ"
 ```
 
-Make sure to:
-- Set the config path to point to the configuration file from this Hugging Face repository
-- Install espeak-ng on your system as it's required for the phonemizer to work
-- Use properly diacritized Arabic text for best results
+Make sure to use properly diacritized Arabic text for best results.
 
 ### Out-of-Scope Use
 
@@ -109,36 +106,23 @@ The model is specifically designed for Arabic text-to-speech synthesis and may n
 - Dataset: [fadi77/arabic-audiobook-dataset-24khz](https://huggingface.co/datasets/fadi77/arabic-audiobook-dataset-24khz)
 - The PL-BERT component was trained on fully diacritized Wikipedia Arabic text
 
-### Training Infrastructure
-- **Hardware:** Single NVIDIA H100 GPU
-- **Training Duration:** 20 epochs
-- **Validation Metrics:** Identical to original StyleTTS2 training methodology
-
-### Training Procedure
-
-#### Training Hyperparameters
+### Training Hyperparameters
 
 - **Number of epochs:** 20
 - **Diffusion training:** Started from epoch 5
-- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
-- **Validation methodology:** Identical to original StyleTTS2 training process
-- **Notable modifications:**
-  - Removed WavLM adversarial training component
-  - Custom PL-BERT trained for Arabic language
-
-## Technical Specifications
-
-### Model Architecture and Objective
 
-The model combines:
-1. A custom-trained Arabic PL-BERT model for text understanding
-2. StyleTTS2 architecture for speech synthesis
-3. Modified training procedure without WavLM adversarial component
+### Objectives
+- **Training objectives:** All original StyleTTS2 objectives maintained, except WavLM adversarial training
+- **Validation objectives:** Identical to original StyleTTS2 validation process
 
 ### Compute Infrastructure
-
 - **Hardware Type:** NVIDIA H100 GPU
-- **Training Time:** Full training completed in 20 epochs
+
+### Notable Modifications from Original StyleTTS2 in Architecture and Objectives
+The architecture of the model follows that of StyleTTS2 with the following exceptions:
+- Removed WavLM adversarial training component
+- Custom PL-BERT trained for Arabic language
+
 
 ## Citation