arxiv:2505.17894

Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model

Published on May 23

Submitted by Hennara on May 27

#1 Paper of the day

Authors:

Khalil Hennara, Muhammad Hreden, Mohamed Motaism Hamed, Zeina Aldallal, Sara Chrouf

Abstract

Mutarjim is a compact Arabic-English translation model that outperforms larger models on established benchmarks and achieves state-of-the-art performance on a new comprehensive Tarjama-25 benchmark.

AI-generated summary

We introduce Mutarjim, a compact yet powerful language model for bidirectional Arabic-English translation. While large-scale LLMs have shown impressive progress in natural language processing tasks, including machine translation, smaller models can remain competitive. Leveraging this insight, we developed Mutarjim based on Kuwain-1.5B, a language model tailored for both Arabic and English. Despite its modest size, Mutarjim outperforms much larger models on several established benchmarks, a result achieved through an optimized two-phase training approach and a carefully curated, high-quality training corpus. Experimental results show that Mutarjim rivals models up to 20 times larger while significantly reducing computational costs and training requirements. We also introduce Tarjama-25, a new benchmark designed to overcome limitations in existing Arabic-English benchmarking datasets, such as domain narrowness, short sentence lengths, and English-source bias. Tarjama-25 comprises 5,000 expert-reviewed sentence pairs and spans a wide range of domains, offering a more comprehensive and balanced evaluation framework. Notably, Mutarjim achieves state-of-the-art performance on the English-to-Arabic task in Tarjama-25, surpassing even significantly larger and proprietary models such as GPT-4o mini. We publicly release Tarjama-25 to support future research and advance the evaluation of Arabic-English translation systems.

Community

bayanbaghdadi00

3 days ago

This is truly remarkable work! Mutarjim's performance despite its compact size is incredibly impressive and a significant step forward for Arabic-English translation. Tarjama-25 is also a vital contribution to the field. Congratulations on this outstanding achievement!