MMS-LID-256 for Regional Languages Classification in India

Model Description

This model implements the regional language classification task for India described in Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe.

GitHub repository: https://github.com/tiantiaf0627/voxlect

The model classifies speech into the following 23 languages spoken in India:

label_list = [
    "assamese",
    "bengali",
    "bodo",
    "dogri",
    "english",
    "gujarati",
    "hindi",
    "kannada",
    "kashmiri",
    "konkani",
    "maithili",
    "malayalam",
    "manipuri",
    "marathi",
    "nepali",
    "odia",
    "punjabi",
    "sanskrit",
    "santali",
    "sindhi",
    "tamil",
    "telugu",
    "urdu"
]

How to use this model

Download repo

git clone git@github.com:tiantiaf0627/voxlect

Install the package

conda create -n voxlect python=3.8
conda activate voxlect
cd voxlect
pip install -e .

Load the model

# Load libraries
import torch
import torch.nn.functional as F
from src.model.dialect.mms_dialect import MMSWrapper

# Select device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model from Hugging Face
model = MMSWrapper.from_pretrained("tiantiaf/voxlect-indic-lid-mms-lid-256").to(device)
model.eval()

Prediction

# Label List
label_list = [
    "assamese",
    "bengali",
    "bodo",
    "dogri",
    "english",
    "gujarati",
    "hindi",
    "kannada",
    "kashmiri",
    "konkani",
    "maithili",
    "malayalam",
    "manipuri",
    "marathi",
    "nepali",
    "odia",
    "punjabi",
    "sanskrit",
    "santali",
    "sindhi",
    "tamil",
    "telugu",
    "urdu"
]

# Load data; here a one-second tensor of zeros serves as a placeholder
# Our training data filters out audio shorter than 3 seconds (unreliable predictions) and longer than 15 seconds (computational limits)
# Prepare your audio as 16 kHz, mono, and at most 15 seconds long
max_audio_length = 15 * 16000
data = torch.zeros([1, 16000]).float().to(device)[:, :max_audio_length]
with torch.no_grad():
    logits, embeddings = model(data, return_feature=True)

# Probability and output
dialect_prob = F.softmax(logits, dim=1)
print(label_list[torch.argmax(dialect_prob, dim=1).item()])
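The prediction code assumes the input is already mono, sampled at 16 kHz, and no longer than 15 seconds. Below is a minimal NumPy sketch of that preparation step, together with top-k decoding of the softmax output. `prepare_audio` and `top_k_labels` are hypothetical helpers written for illustration; they are not part of the voxlect package. The resampler is a naive linear interpolation without anti-aliasing, used only to keep the sketch dependency-light.

```python
import warnings

import numpy as np

TARGET_SR = 16_000                # the model expects 16 kHz input
MIN_SECONDS, MAX_SECONDS = 3, 15  # clip-length range kept in the training data


def prepare_audio(wave, sr):
    """Hypothetical helper: downmix to mono, resample to 16 kHz, truncate to 15 s.

    wave: array of shape (num_samples,) or (num_channels, num_samples).
    Returns a float32 array of shape (1, num_samples).
    """
    wave = np.asarray(wave, dtype=np.float32)
    if wave.ndim == 2:
        # Average the channels to get a mono signal.
        wave = wave.mean(axis=0)
    if sr != TARGET_SR:
        # Naive linear-interpolation resampling (no anti-aliasing filter).
        n_out = int(round(len(wave) * TARGET_SR / sr))
        t_in = np.arange(len(wave)) / sr
        t_out = np.arange(n_out) / TARGET_SR
        wave = np.interp(t_out, t_in, wave).astype(np.float32)
    # Keep at most the first 15 seconds.
    wave = wave[: MAX_SECONDS * TARGET_SR]
    if len(wave) < MIN_SECONDS * TARGET_SR:
        warnings.warn(
            f"clip is {len(wave) / TARGET_SR:.2f} s, shorter than {MIN_SECONDS} s; "
            "predictions may be unreliable"
        )
    return wave[np.newaxis, :]


def top_k_labels(logits, labels, k=3):
    """Hypothetical helper: softmax over one row of logits, return the k most likely (label, prob) pairs."""
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]


if __name__ == "__main__":
    stereo_44k = np.zeros((2, 44_100 * 20), dtype=np.float32)  # 20 s of stereo silence
    print(prepare_audio(stereo_44k, sr=44_100).shape)  # (1, 240000)
    ranked = top_k_labels([0.1, 2.0, 0.5], ["hindi", "tamil", "urdu"], k=2)
    print([name for name, _ in ranked])  # ['tamil', 'urdu']
```

With the model loaded as above, the prepared array could be passed in as `torch.from_numpy(prepare_audio(wave, sr)).to(device)`, and `top_k_labels(logits[0].detach().cpu().numpy(), label_list)` would rank the candidate languages. For real recordings, a proper resampler such as `torchaudio.functional.resample` is preferable to the linear interpolation used here.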

Responsible Use: Users should respect the privacy and consent of the data subjects, and adhere to the relevant laws and regulations in their jurisdictions when using Voxlect.

If you have any questions, please contact: Tiantian Feng ([email protected])

❌ Out-of-Scope Use

Clinical or diagnostic applications
Surveillance
Privacy-invasive applications
Commercial use

If you like our work or use the models in your work, kindly cite the following. We appreciate your recognition!

@article{feng2025voxlect,
  title={Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe},
  author={Feng, Tiantian and Huang, Kevin and Xu, Anfeng and Shi, Xuan and Lertpetchpun, Thanathai and Lee, Jihwan and Lee, Yoonjeong and Byrd, Dani and Narayanan, Shrikanth},
  journal={arXiv preprint arXiv:2508.01691},
  year={2025}
}
