ai4bharat/indic-parler-tts · Doubts and queries about the model


I have a few questions about this TTS model:

Language Identification for Common Scripts:
How does the model identify the language when multiple languages share the same or nearly identical scripts? For example:
- Bengali and Assamese use the same script.
- Hindi and Sanskrit have almost identical scripts.

Under the "Tips" section, there is a statement:

"The remaining speech features (gender, speaking rate, pitch, and reverberation) can be controlled directly through the prompt."

Can you provide an example usage for this?
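To make the question concrete, here is a minimal sketch of what I imagine the usage looks like. It assumes that "the prompt" in the tip means the free-text voice description passed next to the text to be spoken, as in the Parler-TTS inference pattern. `build_description` and `synthesize` are hypothetical helpers I made up for illustration, and the wording of the description (for example "slightly high" or "very close-sounding") is my guess at phrases the model understands, not something I have confirmed.

```python
def build_description(gender="female", rate="moderate",
                      pitch="slightly high", reverb="very close-sounding"):
    """Compose a free-text voice description (hypothetical helper, not part of parler_tts)."""
    return (
        f"A {gender} speaker delivers speech at a {rate} pace "
        f"with a {pitch} pitch. The recording is {reverb} "
        f"with very clear audio."
    )


def synthesize(text, description, out_path="out.wav"):
    """Generate speech for `text` in the voice described by `description`."""
    # Imports are local so build_description works without the model installed.
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    model = ParlerTTSForConditionalGeneration.from_pretrained("ai4bharat/indic-parler-tts")
    tokenizer = AutoTokenizer.from_pretrained("ai4bharat/indic-parler-tts")
    # Assumption: the description uses the text encoder's own tokenizer.
    description_tokenizer = AutoTokenizer.from_pretrained(
        model.config.text_encoder._name_or_path
    )

    desc = description_tokenizer(description, return_tensors="pt")
    prompt = tokenizer(text, return_tensors="pt")
    audio = model.generate(
        input_ids=desc.input_ids,
        attention_mask=desc.attention_mask,
        prompt_input_ids=prompt.input_ids,
        prompt_attention_mask=prompt.attention_mask,
    )
    sf.write(out_path, audio.cpu().numpy().squeeze(), model.config.sampling_rate)
```

For example, `synthesize("नमस्ते, आप कैसे हैं?", build_description(gender="male", rate="fast"))` is how I would expect to request a fast male voice. Is this the intended way to control these features, or is there a different mechanism?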
Why is it necessary to fine-tune on a subset of the same dataset used to train the pre-trained model?