arxiv:2506.13487

TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs

Published on Jun 16, 2025
Authors: Ezgi Başar, Francesca Padovani, Jaap Jumelet, Arianna Bisazza

AI-generated summary

TurBLiMP evaluates the linguistic abilities of language models across 16 phenomena with 1,000 minimal pairs each, highlighting challenges for LMs in Turkish syntax, word order, and morphology.

Abstract

We introduce TurBLiMP, the first Turkish benchmark of linguistic minimal pairs, designed to evaluate the linguistic abilities of monolingual and multilingual language models (LMs). Covering 16 linguistic phenomena with 1000 minimal pairs each, TurBLiMP fills an important gap in linguistic evaluation resources for Turkish. In designing the benchmark, we give extra attention to two properties of Turkish that remain understudied in current syntactic evaluations of LMs, namely word order flexibility and subordination through morphological processes. Our experiments on a wide range of LMs and a newly collected set of human acceptability judgments reveal that even cutting-edge Large LMs still struggle with grammatical phenomena that are not challenging for humans, and may also exhibit different sensitivities to word order and morphological complexity compared to humans.
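
As background for the scores reported in the Community section below: in a minimal-pair benchmark of this kind, a model counts as correct on a pair when it assigns a higher score (e.g. log-probability or pseudo-log-likelihood) to the acceptable sentence than to the unacceptable one, and each per-phenomenon score is the accuracy over that phenomenon's 1,000 pairs. A minimal sketch, assuming a hypothetical `pairs` list and `score_sentence` function (these names are placeholders, not from the official repo):

```python
# Minimal sketch of BLiMP-style scoring; `pairs` and `score_sentence`
# are hypothetical placeholders, not names from the TurBLiMP repo.

def phenomenon_accuracy(pairs, score_sentence):
    """Percentage of (acceptable, unacceptable) pairs where the model
    assigns the higher score to the acceptable sentence."""
    correct = sum(score_sentence(good) > score_sentence(bad) for good, bad in pairs)
    return 100.0 * correct / len(pairs)
```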

Community

Hi,

I've just tried out the evaluation code from the official repo and extended it to run an overall evaluation of my current Turkish language models.
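
For anyone who wants to reproduce this: the models below are masked LMs (or ELECTRA generators), so per-sentence scores for the minimal-pair comparison are typically computed via pseudo-log-likelihood (PLL): mask each token in turn and sum the log-probabilities of the original tokens. Here is a rough sketch of that idea with the `transformers` API; it is not the exact code from the official repo, and details such as subword handling and batching may differ there.

```python
# Rough PLL sketch for masked LMs; not the exact evaluation code from the
# official TurBLiMP repo (subword handling, batching etc. may differ there).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "dbmdz/bert-base-turkish-128k-cased"  # any checkpoint from the table
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

def pll(sentence: str) -> float:
    """Pseudo-log-likelihood: mask each token in turn and sum the
    log-probabilities the model assigns to the original tokens."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):   # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id     # mask one token at a time
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

# A pair is then counted as correct when the acceptable sentence gets the
# higher score, e.g. pll(acceptable) > pll(unacceptable).
```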

Here are the results:

| Phenomenon | dbmdz/electra-small-turkish-cased-generator | dbmdz/electra-base-turkish-cased-generator | dbmdz/electra-base-turkish-mc4-cased-generator | dbmdz/electra-base-turkish-mc4-uncased-generator | dbmdz/bert-base-turkish-cased | dbmdz/bert-base-turkish-uncased | dbmdz/bert-base-turkish-128k-cased | dbmdz/bert-base-turkish-128k-uncased | dbmdz/distilbert-base-turkish-cased | dbmdz/convbert-base-turkish-cased | dbmdz/convbert-base-turkish-mc4-cased | dbmdz/convbert-base-turkish-mc4-uncased |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anaphor Agreement | 74.1 | 94.3 | 94.3 | 92.8 | 96.7 | 97.3 | 97.3 | 97.7 | 96.9 | 58.1 | 44.3 | 44.6 |
| Argument Str. Tran. | 86.6 | 99.6 | 99.4 | 98.7 | 99.7 | 99.6 | 99.8 | 99.1 | 97.5 | 51.9 | 58.1 | 51.3 |
| Argument Str. Ditr. | 79.3 | 96.1 | 95.5 | 95.2 | 99.8 | 96.1 | 96.1 | 96.1 | 95.4 | 64.6 | 58.6 | 64.5 |
| Binding | 70.7 | 96.2 | 91.4 | 89.6 | 99.9 | 98.5 | 97.7 | 99 | 93 | 89.1 | 49.4 | 78.4 |
| Determiners | 91.8 | 99.3 | 98.2 | 99.1 | 99.9 | 100 | 99 | 99.3 | 82.9 | 0 | 0 | 0 |
| Ellipsis | 10.6 | 49.7 | 46.3 | 49 | 87.4 | 73.6 | 96.6 | 87.5 | 13.6 | 54.7 | 57.8 | 67.9 |
| Irregular Forms | 98.7 | 97.9 | 99 | 99.8 | 98.8 | 100 | 99.9 | 99.6 | 94.1 | 82.9 | 86.6 | 95.2 |
| Island Effects | 39.1 | 35.3 | 41.8 | 44 | 49.4 | 39.8 | 60.9 | 51.2 | 47.4 | 96.7 | 99.4 | 100 |
| Nominalization | 90 | 96.6 | 97 | 95.4 | 97.4 | 97 | 98.9 | 97.4 | 95.6 | 55.2 | 59.2 | 60.6 |
| NPI Licensing | 90.9 | 96.1 | 95 | 98 | 98.2 | 97.6 | 97.2 | 95 | 92.1 | 82.1 | 95.6 | 71.9 |
| Passives | 100 | 91.2 | 93.6 | 91.6 | 82.2 | 78.1 | 84.4 | 81.3 | 98.8 | 100 | 100 | 99 |
| Quantifiers | 97.9 | 98 | 98 | 97.6 | 95.7 | 94.6 | 98 | 98.4 | 98.4 | 99 | 99 | 99 |
| Relative Clauses | 79.9 | 90.7 | 92 | 91.6 | 97.7 | 97.5 | 97 | 98.5 | 92 | 53.4 | 53.7 | 56.9 |
| Scrambling | 99.5 | 100 | 100 | 99.8 | 100 | 100 | 99.6 | 100 | 99.8 | 38.7 | 59.3 | 63.3 |
| Subject Agreement | 82.8 | 99 | 97.2 | 96.1 | 98.3 | 99.2 | 99.1 | 98.8 | 97 | 47.7 | 43.9 | 56.4 |
| Suspended Affixation | 97.5 | 99 | 99.1 | 98.8 | 100 | 100 | 100 | 100 | 100 | 25.4 | 12.8 | 23.2 |
| Model Average | 80.6 | 89.9 | 89.9 | 89.8 | 93.8 | 91.8 | 95.1 | 93.7 | 87.2 | 62.5 | 61.1 | 64.5 |

So I highly recommend trying out the dbmdz/bert-base-turkish-128k-cased model as well; it achieves a new SOTA on this benchmark! I will release the model outputs and my evaluation code on the Model Hub soon :)

/cc @ezgibasar, @fpadovani, @jumelet and @arianna-bis


Hi,
Thank you for your work! It is super interesting to see the results for all these architectures. Good to know that the cased variant performs even better than the uncased one. We'll keep these results in mind moving forward!
