Indic Benchmarks

community

https://indicbench.com

Indic Language Benchmarking for Large Language Models

India is linguistically diverse, with 22+ languages. This project benchmarks the performance of large language models on Indic languages across multiple datasets. The goal is to evaluate a model's ability to understand, generate, and process text in these languages.

We currently cover 8 languages across 3 datasets, with more coming soon.

Languages

Bengali (bn)
Gujarati (gu)
Hindi (hi)
Kannada (kn)
Malayalam (ml)
Odia (or)
Tamil (ta)
Telugu (te)

Datasets

ARC-Challenge: hi, bn, gu, kn, ml, or, ta, te
TruthfulQA: hi, bn, gu, kn, ml, or, ta, te
HellaSwag: hi, bn, gu, kn, ml, or, ta, te
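
As a rough illustration of how the language codes and dataset names above map to repository names, here is a minimal sketch. It assumes every dataset follows the `indicbench/<dataset>_<lang>` pattern seen in repositories such as `indicbench/hellaswag_or` and `indicbench/truthfulqa_te`. The helper names `repo_id` and `all_repo_ids` are hypothetical, and the ARC-Challenge prefix is left out because no ARC-Challenge repository name appears on this page.

```python
# Language codes as listed in the Languages section.
LANGUAGES = {
    "bn": "Bengali",
    "gu": "Gujarati",
    "hi": "Hindi",
    "kn": "Kannada",
    "ml": "Malayalam",
    "or": "Odia",
    "ta": "Tamil",
    "te": "Telugu",
}

# Prefixes inferred from the repository names visible on this page.
# ASSUMPTION: all language variants follow the same naming pattern.
DATASET_PREFIXES = ("hellaswag", "truthfulqa")


def repo_id(dataset: str, lang: str) -> str:
    """Build a repository name such as 'indicbench/hellaswag_ta' (hypothetical helper)."""
    if dataset not in DATASET_PREFIXES:
        raise ValueError(f"unknown dataset prefix: {dataset!r}")
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language code: {lang!r}")
    return f"indicbench/{dataset}_{lang}"


def all_repo_ids() -> list[str]:
    """List every dataset/language combination under the assumed pattern."""
    return [repo_id(d, lang) for d in DATASET_PREFIXES for lang in LANGUAGES]


if __name__ == "__main__":
    for name in all_repo_ids():
        print(name)
```

A name built this way could then be passed to `datasets.load_dataset` from the Hugging Face `datasets` library, for example `load_dataset(repo_id("hellaswag", "ta"))`. The available splits and column names are not stated here, so inspect the loaded object before relying on any particular field.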

Code

We are also trying to build an MMLU-style dataset focused on Indian knowledge. If you are interested in contributing, please reach out to Ram or Munish.

models 0

None public yet

datasets 23

indicbench/hellaswag_or

Viewer • Updated Mar 28, 2024 • 20k • 8

indicbench/hellaswag_ta

Viewer • Updated Mar 28, 2024 • 20k • 21

indicbench/hellaswag_ml

Viewer • Updated Mar 28, 2024 • 20k • 16

indicbench/hellaswag_kn

Viewer • Updated Mar 28, 2024 • 20k • 7

indicbench/hellaswag_te

Viewer • Updated Mar 28, 2024 • 20k • 10

indicbench/hellaswag_bn

Viewer • Updated Mar 28, 2024 • 20k • 37

indicbench/hellaswag_gu

Viewer • Updated Mar 28, 2024 • 20k • 10

indicbench/truthfulqa_te

Updated Mar 28, 2024 • 12

indicbench/truthfulqa_ml

Updated Mar 28, 2024 • 53

indicbench/truthfulqa_kn

Updated Mar 28, 2024 • 8
