Feedback on model multi

#4
by Hobis - opened

I tested your Multi V2 model, and it's great, but it makes language mistakes. For example, the English voice speaking Polish has a foreign accent. Wouldn't it have been better to use a BPE tokenizer instead of IPA? It seems to me that eSpeak itself has language errors.

I'm limited to 20 GB of VRAM for training, so the results will not be as good as they should be. I'm adding Chinese and Italian in V3. I don't know about BPE tokenizers; I must check this out. All voices will always keep their accent, no matter what language you use to synthesise.

XTTS uses a BPE tokenizer

eSpeak IPA helps a lot with number handling, and it greatly reduces the number of tokens (phonemes). To train a model with BPE, I would need more GPU resources and more training data, especially for digits, which follow different writing rules in different languages. eSpeak handles that as a subprocess.
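To illustrate the digit point, here is a toy sketch in Python. It is not eSpeak and not the model's actual frontend: `DIGIT_WORDS` and `spell_digits` are hypothetical names made up for this example, and reading numbers digit by digit is a deliberate simplification (a real frontend would read "42" as "forty-two"). It only shows that the same written digits need language-specific expansion before the model sees them, which a BPE model would otherwise have to learn from data.

```python
# Hypothetical lookup tables standing in for a real text frontend such as eSpeak.
DIGIT_WORDS = {
    "en": ["zero", "one", "two", "three", "four",
           "five", "six", "seven", "eight", "nine"],
    "pl": ["zero", "jeden", "dwa", "trzy", "cztery",
           "pięć", "sześć", "siedem", "osiem", "dziewięć"],
}


def spell_digits(text: str, lang: str) -> str:
    """Replace every ASCII digit with its spoken word in the given language."""
    words = DIGIT_WORDS[lang]
    pieces = []
    for ch in text:
        if ch in "0123456789":
            # Pad with spaces so adjacent digits become separate words.
            pieces.append(" " + words[int(ch)] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


print(spell_digits("room 42", "en"))  # room four two
print(spell_digits("room 42", "pl"))  # room cztery dwa
```

The same input string produces different word sequences per language, so a frontend that does this expansion up front leaves the model with far fewer cases to learn.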

I understand, I'm waiting for the v3 model.

Hey, I have some feedback about the multilanguage model. Can you help me understand this? When I’m using it locally on Pinokio, I don’t see the options to choose:

#Select Reference Language
#Select Synthesized Language

These options are available only on Hugging Face: https://huggingface.co/spaces/Gregniuki/f5-tts_Polish_English_German
Because I can’t select the input and output languages, the model isn’t generating proper audio. Am I doing something wrong, perhaps? I’d be very grateful for help.
I’m attaching screenshots to better explain the issue.
Zrzut ekranu 2025-03-29 221516.jpg
Zrzut ekranu 2025-03-29 221714.jpg
