This model is fine-tuned on the Tamil subset of Common Voice 16.1. The transcripts were preprocessed with Epitran, which transliterates Tamil text into IPA. The 'tam-Taml' language code was used to generate the phoneme list below, which covers the distinctions of Tamil phonetics:

  • Vowels:

    • Monophthongs: 'a', 'aː', 'e', 'eː', 'i', 'iː', 'o', 'oː', 'u', 'uː'
    • Diphthongs: 'aj', 'aʋ'
  • Consonants:

    • Nasals: 'm', 'n̪', 'n', 'ɳ', 'ɲ', 'ŋ'
    • Stops: 'p', 't̪', 'ʈ', 'k'
    • Affricates: 't͡ʃ', 'd͡ʒ'
    • Fricatives: 's', 'ʂ', 'ʃ', 'h'
    • Tap: 'ɾ'
    • Trill: 'r'
    • Approximants: 'ʋ', 'ɻ', 'j', 'l', 'ɭ'
    • Consonant cluster: 'kʂ'
  • Special Symbols: '்' (the Tamil virama, or pulli, which marks the absence of the inherent vowel)
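
The inventory above can be used to split an IPA string (for example, Epitran output or a model transcription) into individual phoneme tokens. The following is a minimal sketch, not the preprocessing code used for this model: the names `PHONEMES` and `segment` are hypothetical, and greedy longest-match is an assumed strategy chosen so that multi-character symbols such as 't͡ʃ' or 'aː' are kept whole.

```python
import unicodedata
from typing import List

# Phoneme inventory copied from the list above.
# Assumption: the virama is kept as its own token because the list includes it.
PHONEMES = [
    "a", "aː", "e", "eː", "i", "iː", "o", "oː", "u", "uː",  # monophthongs
    "aj", "aʋ",                                             # diphthongs
    "m", "n̪", "n", "ɳ", "ɲ", "ŋ",                           # nasals
    "p", "t̪", "ʈ", "k",                                     # stops
    "t͡ʃ", "d͡ʒ",                                             # affricates
    "s", "ʂ", "ʃ", "h",                                     # fricatives
    "ɾ", "r",                                               # tap, trill
    "ʋ", "ɻ", "j", "l", "ɭ",                                # approximants
    "kʂ",                                                   # consonant cluster
    "\u0bcd",                                               # Tamil virama (pulli)
]

# Normalize to NFD and sort longest-first so that longer symbols win over
# their single-character prefixes (e.g. 'aː' before 'a').
_INVENTORY = sorted(
    {unicodedata.normalize("NFD", p) for p in PHONEMES},
    key=len,
    reverse=True,
)


def segment(ipa: str) -> List[str]:
    """Split an IPA string into inventory phonemes by greedy longest match."""
    text = unicodedata.normalize("NFD", ipa)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # skip word boundaries
            i += 1
            continue
        for phoneme in _INVENTORY:
            if text.startswith(phoneme, i):
                tokens.append(phoneme)
                i += len(phoneme)
                break
        else:
            # No inventory symbol matches at this position.
            raise ValueError(f"Unknown symbol {text[i]!r} at position {i}")
    return tokens
```

For example, `segment("t͡ʃaj")` returns `['t͡ʃ', 'aj']`. Greedy matching always prefers the longer symbol, so a sequence of 'a' followed by 'j' is read as the diphthong 'aj'; a different strategy would be needed if the two readings must be distinguished.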

Dataset used to train speech31/XLS-R-tamil-phoneme