How to train other Hakka accents?
How to train other Hakka accent models?
Hi, for the training script, you can refer to this file: https://github.com/txya900619/TTS/blob/hat_tts/recipes/hat-tts/vits/train_vits.py.
For the data loading part, you can refer to hat_tts in https://github.com/txya900619/TTS/blob/hat_tts/TTS/tts/datasets/formatters.py, and customize it according to the data you have. For example, with this model, we first convert Hakka characters to IPA and then use that as the model input.
However, at this point I would not recommend using this model architecture. Instead, fine-tune F5-TTS directly; the results will be much better. We also plan to train the next version of our Hakka TTS this way.
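To make the data loading step more concrete, here is a minimal sketch of a custom formatter in the style Coqui TTS formatters generally follow: a function that takes `root_path` and `meta_file` and returns a list of dicts with `text`, `audio_file`, `speaker_name`, and `root_path`. It is not the actual `hat_tts` formatter from the repository. The function name `hakka_ipa_formatter`, the `wav_id|ipa_text` metadata layout, the `wavs/` folder, and the default speaker name are all assumptions for illustration, and the sketch assumes the Hakka text has already been converted to IPA beforehand.

```python
import os


def hakka_ipa_formatter(root_path, meta_file, **kwargs):
    """Hypothetical formatter: reads lines of the form `wav_id|ipa_text`.

    Assumes the transcripts were converted to IPA in a separate step and
    that audio lives in `<root_path>/wavs/<wav_id>.wav` (both assumptions).
    """
    # Speaker name is an illustrative default, not taken from the repo.
    speaker_name = kwargs.get("speaker_name", "hakka_speaker")
    items = []
    meta_path = os.path.join(root_path, meta_file)
    with open(meta_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only on the first "|" so the IPA text may contain "|".
            wav_id, ipa_text = line.split("|", maxsplit=1)
            items.append(
                {
                    "text": ipa_text,
                    "audio_file": os.path.join(root_path, "wavs", wav_id + ".wav"),
                    "speaker_name": speaker_name,
                    "root_path": root_path,
                }
            )
    return items
```

As far as I know, a function like this can be passed as the `formatter` argument when loading samples in the training script, but check the linked `train_vits.py` for how the repository actually wires it up.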
Thank you. Today I trained on a Hakka dataset for 200 epochs using the F5-TTS model, but testing revealed many problems. For example, the synthesized audio often has missing words. The voice quality is similar to the original audio, but occasionally some strange artifacts appear.
Hi lukeewin, may I ask which datasets you used to train F5-TTS?
If I train a Hakka speech synthesis model based on F5-TTS, do I need to modify the F5-TTS code, such as the g2p (grapheme-to-phoneme) code, or can I use F5-TTS directly to train the Hakka model? Also, how many hours of data do you recommend for training? Are there any special requirements for the audio sampling rate and format? And how many training epochs are generally needed? I tried training for 200 epochs today and saw some effect, but the synthesized Hakka speech doesn't sound quite right; it sounds a bit like Mandarin. I'm not sure whether that is because there were too few training epochs or for some other reason.
hi lukeewin, may I ask: which datasets did you use to train f5-tts?
The dataset used is my own private dataset, not an open-source dataset.
I used a Hakka dataset from Luchuan, Guangxi.