How to train other Hakka accents?

by lukeewin - opened May 16

Discussion

lukeewin

May 16

How to train other Hakka accent models?

txya900619

FormoSpeech org May 16

edited May 16

Hi, for the training script, you can refer to this file: https://github.com/txya900619/TTS/blob/hat_tts/recipes/hat-tts/vits/train_vits.py
For the data loading part, you can refer to hat_tts in https://github.com/txya900619/TTS/blob/hat_tts/TTS/tts/datasets/formatters.py and customize it for the data you have. For example, with this model, we first convert Hakka characters to IPA and then use the IPA as the model input.
However, at this point I would recommend not using this model architecture. Instead, fine-tune F5-TTS directly; the results will be much better. The next version of our Hakka TTS is also planned to be trained this way.
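
To make the data loading step more concrete, here is a minimal sketch of a custom formatter in the style Coqui TTS formatters use (a function taking `root_path` and `meta_file` and returning a list of sample dicts). It is not the actual hat_tts formatter. The function name `hakka_ipa_formatter`, the tab-separated metadata layout (`wav_id<TAB>ipa_text<TAB>speaker`), and the `wavs/` folder are all assumptions for illustration, and the Hakka-to-IPA conversion is assumed to have been done beforehand.

```python
import os


def hakka_ipa_formatter(root_path, meta_file, **kwargs):
    """Hypothetical formatter: one sample dict per non-empty metadata line.

    Assumed line layout (not from the hat_tts repo):
        wav_id<TAB>ipa_text<TAB>speaker
    """
    items = []
    meta_path = os.path.join(root_path, meta_file)
    with open(meta_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            wav_id, ipa_text, speaker = line.split("\t")
            items.append(
                {
                    "text": ipa_text,  # IPA string used as model input
                    "audio_file": os.path.join(root_path, "wavs", wav_id + ".wav"),
                    "speaker_name": speaker,
                    "root_path": root_path,
                }
            )
    return items
```

If your metadata uses a different separator or column order, only the `line.split("\t")` line and the dict fields need to change.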

lukeewin

May 16

Thank you. Today I trained the F5-TTS model on a Hakka dataset for 200 epochs, but testing revealed many problems. For example, the synthesized audio often has missing words. The voice quality is similar to the original audio, but occasionally some strange artifacts appear.

hungshinlee

FormoSpeech org May 16

Hi lukeewin, may I ask which datasets you used to train F5-TTS?

lukeewin

May 16

If I train a Hakka speech synthesis model based on F5-TTS, do I need to modify the F5-TTS code, such as the g2p (grapheme-to-phoneme) code, or can I use F5-TTS directly to train the Hakka model? Also, how many hours of data do you recommend for training? Are there special requirements for the audio sampling rate and format? And how many training epochs are generally needed? I tried training for 200 epochs today and saw some results, but the synthesized Hakka speech doesn't sound quite right; it sounds a bit like Mandarin. I'm not sure whether that is because of too few training epochs or some other reason.
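
For questions about dataset size, sampling rate, and format, it helps to report what the dataset currently looks like. A minimal sketch using only the Python standard library, assuming the audio is stored as uncompressed PCM WAV files in a single folder (the function name `summarize_wavs` and the folder layout are hypothetical, and the `wave` module cannot read compressed formats):

```python
import wave
from collections import Counter
from pathlib import Path


def summarize_wavs(wav_dir):
    """Count files per (sample_rate, channels, bit_depth) and sum duration.

    Returns (formats, total_hours), where formats is a Counter keyed by
    (sample_rate_hz, num_channels, bits_per_sample).
    """
    formats = Counter()
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            rate = w.getframerate()
            formats[(rate, w.getnchannels(), 8 * w.getsampwidth())] += 1
            total_seconds += w.getnframes() / rate
    return formats, total_seconds / 3600.0
```

Calling `summarize_wavs("wavs")` on the dataset folder gives the total hours and shows whether all files share one sampling rate and bit depth, which are useful details to include when asking for training advice.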

lukeewin

May 16

The dataset used is my own private dataset, not an open-source dataset.

lukeewin

May 16

I used Hakka datasets from Luchuan, Guangxi.
