arxiv:2506.14012

Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text

Published on Jun 16
· Submitted by amr-mohamed on Jun 25

Abstract

LLMs' comprehension and reasoning skills are evaluated under code-switching conditions, revealing that embedding English into other languages can improve understanding, while prompting and fine-tuning differ in how reliably they mitigate degradation.

AI-generated summary

Code-switching (CSW) is the act of alternating between two or more languages within a single discourse. This phenomenon is widespread in multilingual communities, and increasingly prevalent in online content, where users naturally mix languages in everyday communication. As a result, Large Language Models (LLMs), now central to content processing and generation, are frequently exposed to code-switched inputs. Given their widespread use, it is crucial to understand how LLMs process and reason about such mixed-language text. This paper presents a systematic evaluation of LLM comprehension under code-switching by generating CSW variants of established reasoning and comprehension benchmarks. While degradation is evident when foreign tokens disrupt English text – even under linguistic constraints – embedding English into other languages often improves comprehension. Though prompting yields mixed results, fine-tuning offers a more stable path to degradation mitigation.
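The abstract describes generating code-switched variants of existing benchmarks. Purely as an illustration of the general idea (this is not the authors' pipeline, and the tiny English-to-French glossary and token ratio below are hypothetical stand-ins for a real translation resource), a minimal sketch of producing a mixed-language version of a benchmark question might look like this:

```python
# Minimal sketch, assuming a simple word-substitution approach:
# replace a fraction of translatable English tokens with translations
# to create a code-switched variant of a benchmark question.
import random

# Hypothetical en->fr glossary, illustrative only.
GLOSSARY = {
    "the": "la",
    "capital": "capitale",
    "of": "de",
    "is": "est",
    "what": "quelle",
}

def code_switch(text: str, ratio: float = 0.3, seed: int = 0) -> str:
    """Replace roughly `ratio` of glossary-covered tokens with translations."""
    rng = random.Random(seed)
    out = []
    for tok in text.split():
        key = tok.lower().strip(".,?!")
        if key in GLOSSARY and rng.random() < ratio:
            out.append(GLOSSARY[key])
        else:
            out.append(tok)
    return " ".join(out)

if __name__ == "__main__":
    question = "What is the capital of France?"
    print(code_switch(question, ratio=0.5))
    # e.g. "What est la capitale de France?"
```

A real evaluation would apply such a transformation (with proper translation and linguistic constraints) to every item of a benchmark and compare model accuracy on the original versus the code-switched variants.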

Community

Paper author · Paper submitter (amr-mohamed)

This paper investigates how LLMs handle code-switched text by generating mixed-language versions of benchmarks, revealing that disrupting English with foreign tokens degrades performance, while embedding English into other languages can enhance it—highlighting fine-tuning as a more reliable strategy than prompting for mitigating such degradations.

