TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs
Abstract
TurBLiMP evaluates the linguistic abilities of language models on 16 Turkish phenomena with 1000 minimal pairs each, highlighting the challenges that Turkish syntax, word order, and morphology pose for LMs.
We introduce TurBLiMP, the first Turkish benchmark of linguistic minimal pairs, designed to evaluate the linguistic abilities of monolingual and multilingual language models (LMs). Covering 16 linguistic phenomena with 1000 minimal pairs each, TurBLiMP fills an important gap in linguistic evaluation resources for Turkish. In designing the benchmark, we give extra attention to two properties of Turkish that remain understudied in current syntactic evaluations of LMs, namely word order flexibility and subordination through morphological processes. Our experiments on a wide range of LMs and a newly collected set of human acceptability judgments reveal that even cutting-edge large LMs still struggle with grammatical phenomena that are not challenging for humans, and may also exhibit different sensitivities to word order and morphological complexity compared to humans.
Community
Hi,
I've just tried out the evaluation code from the official repo and extended it to perform an overall evaluation of my current Turkish Language Models.
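For context, the scoring is the standard minimal-pair setup: a model gets a pair right if it assigns a higher score to the grammatical sentence. Here is a minimal sketch of one way to score a sentence with a masked LM, via pseudo-log-likelihood (Salazar et al., 2020); the official repo's implementation may differ in details, and the example pair below is illustrative, not taken from TurBLiMP:

```python
# Minimal-pair scoring sketch for a masked LM, using pseudo-log-likelihood
# (Salazar et al., 2020). The official TurBLiMP evaluation code may differ
# in details; the sentence pair below is illustrative, not from the benchmark.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "dbmdz/bert-base-turkish-128k-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

def pll_score(sentence: str) -> float:
    """Sum of each token's log-probability when that token is masked."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        total += torch.log_softmax(logits[0, i], dim=-1)[input_ids[i]].item()
    return total

# Subject agreement example: "gidiyorum" (1sg) agrees with "Ben" (I),
# "gidiyor" (3sg) does not. The pair counts as correct if this prints True.
print(pll_score("Ben eve gidiyorum.") > pll_score("Ben eve gidiyor."))
```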
Here are the results:
Phenomenon | dbmdz/electra-small-turkish-cased-generator | dbmdz/electra-base-turkish-cased-generator | dbmdz/electra-base-turkish-mc4-cased-generator | dbmdz/electra-base-turkish-mc4-uncased-generator | dbmdz/bert-base-turkish-cased | dbmdz/bert-base-turkish-uncased | dbmdz/bert-base-turkish-128k-cased | dbmdz/bert-base-turkish-128k-uncased | dbmdz/distilbert-base-turkish-cased | dbmdz/convbert-base-turkish-cased | dbmdz/convbert-base-turkish-mc4-cased | dbmdz/convbert-base-turkish-mc4-uncased |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Anaphor Agreement | 74.1 | 94.3 | 94.3 | 92.8 | 96.7 | 97.3 | 97.3 | 97.7 | 96.9 | 58.1 | 44.3 | 44.6 |
Argument Structure (Transitive) | 86.6 | 99.6 | 99.4 | 98.7 | 99.7 | 99.6 | 99.8 | 99.1 | 97.5 | 51.9 | 58.1 | 51.3 |
Argument Structure (Ditransitive) | 79.3 | 96.1 | 95.5 | 95.2 | 99.8 | 96.1 | 96.1 | 96.1 | 95.4 | 64.6 | 58.6 | 64.5 |
Binding | 70.7 | 96.2 | 91.4 | 89.6 | 99.9 | 98.5 | 97.7 | 99 | 93 | 89.1 | 49.4 | 78.4 |
Determiners | 91.8 | 99.3 | 98.2 | 99.1 | 99.9 | 100 | 99 | 99.3 | 82.9 | 0 | 0 | 0 |
Ellipsis | 10.6 | 49.7 | 46.3 | 49 | 87.4 | 73.6 | 96.6 | 87.5 | 13.6 | 54.7 | 57.8 | 67.9 |
Irregular Forms | 98.7 | 97.9 | 99 | 99.8 | 98.8 | 100 | 99.9 | 99.6 | 94.1 | 82.9 | 86.6 | 95.2 |
Island Effects | 39.1 | 35.3 | 41.8 | 44 | 49.4 | 39.8 | 60.9 | 51.2 | 47.4 | 96.7 | 99.4 | 100 |
Nominalization | 90 | 96.6 | 97 | 95.4 | 97.4 | 97 | 98.9 | 97.4 | 95.6 | 55.2 | 59.2 | 60.6 |
NPI Licensing | 90.9 | 96.1 | 95 | 98 | 98.2 | 97.6 | 97.2 | 95 | 92.1 | 82.1 | 95.6 | 71.9 |
Passives | 100 | 91.2 | 93.6 | 91.6 | 82.2 | 78.1 | 84.4 | 81.3 | 98.8 | 100 | 100 | 99 |
Quantifiers | 97.9 | 98 | 98 | 97.6 | 95.7 | 94.6 | 98 | 98.4 | 98.4 | 99 | 99 | 99 |
Relative Clauses | 79.9 | 90.7 | 92 | 91.6 | 97.7 | 97.5 | 97 | 98.5 | 92 | 53.4 | 53.7 | 56.9 |
Scrambling | 99.5 | 100 | 100 | 99.8 | 100 | 100 | 99.6 | 100 | 99.8 | 38.7 | 59.3 | 63.3 |
Subject Agreement | 82.8 | 99 | 97.2 | 96.1 | 98.3 | 99.2 | 99.1 | 98.8 | 97 | 47.7 | 43.9 | 56.4 |
Suspended Affixation | 97.5 | 99 | 99.1 | 98.8 | 100 | 100 | 100 | 100 | 100 | 25.4 | 12.8 | 23.2 |
Model Average | 80.6 | 89.9 | 89.9 | 89.8 | 93.8 | 91.8 | 95.1 | 93.7 | 87.2 | 62.5 | 61.1 | 64.5 |
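(For reference, the Model Average row is just the unweighted mean of the 16 phenomenon accuracies; the overall evaluation I added boils down to something like the following, with the data layout assumed:)

```python
# Unweighted mean over phenomena; `results` is assumed to map each of the
# 16 phenomenon names to a pair-level accuracy in percent.
def model_average(results: dict[str, float]) -> float:
    return sum(results.values()) / len(results)

# e.g. the first column above: mean of 74.1, 86.6, ..., 97.5 -> 80.6
```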
So I highly recommend trying out the dbmdz/bert-base-turkish-128k-cased model as well; it achieves a new SOTA on this benchmark! I will release the model outputs and my evaluation code on the Model Hub soon :)
/cc @ezgibasar , @fpadovani , @jumelet and @arianna-bis
Hi,
Thank you for your work! It is super interesting to see the results for all these architectures. Good to know that the cased variant performs even better than the uncased one. We'll keep these results in mind moving forward!