RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages
Abstract
Large language models (LLMs) and their safety classifiers often perform poorly on low-resource languages due to limited training data and evaluation benchmarks. This paper introduces RabakBench, a new multilingual safety benchmark localized to Singapore's unique linguistic context, covering Singlish, Chinese, Malay, and Tamil. RabakBench is constructed through a scalable three-stage pipeline: (i) Generate - adversarial example generation by augmenting real Singlish web content with LLM-driven red teaming; (ii) Label - semi-automated multi-label safety annotation using majority-voted LLM labelers aligned with human judgments; and (iii) Translate - high-fidelity translation preserving linguistic nuance and toxicity across languages. The final dataset comprises over 5,000 safety-labeled examples across four languages and six fine-grained safety categories with severity levels. Evaluations of 11 popular open-source and closed-source guardrail classifiers reveal significant performance degradation. RabakBench not only enables robust safety evaluation in Southeast Asian multilingual settings but also offers a reproducible framework for building localized safety datasets in low-resource environments. The benchmark dataset, including human-verified translations, and the evaluation code are publicly available.
Community
every country has a linguistic fingerprint - a blend of dialects and languages shaping daily life. in the age of global ai, capturing these local nuances isn't optional; it's essential for responsible deployments.
to address this, the govtech ai practice and sutd's social ai studio teamed up to build RabakBench. singapore's rich linguistic landscape - singlish/english, chinese, malay, tamil - creates the perfect stress test for llms and their guardrails.
we think this is a meaningful and challenging benchmark: evaluations of eleven popular open- and closed-source guardrails show major inconsistencies. for example, popular guardrail options like openai moderation or llamaguard are not necessarily the best performers.
building high-quality multilingual safety benchmarks for low-resource languages is labor-intensive and difficult to scale. to overcome this, we built RabakBench through a three-stage process that combines human-in-the-loop annotation with LLM-assisted red-teaming and multilingual translation. we share this as a demonstration of how LLM-assisted workflows and collaborative workshops can scale up human annotation, enabling rigorous, culturally-grounded benchmarks even in data-scarce settings.
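to make the "Label" stage concrete, here's a minimal sketch of the majority-vote idea behind it (our illustration for this post, not the paper's actual code); the labeler functions, threshold, and category names below are hypothetical stand-ins:

```python
# minimal sketch: several LLM labelers each return a set of safety categories
# for a text, and a category is kept only if a majority of labelers flag it.
from collections import Counter
from typing import Callable, Iterable

# hypothetical labeler type: takes a text, returns the safety categories it flags
Labeler = Callable[[str], set[str]]

def majority_vote_labels(text: str, labelers: Iterable[Labeler], threshold: float = 0.5) -> set[str]:
    """Aggregate multi-label annotations: keep a category if more than
    `threshold` of the labelers flagged it."""
    labelers = list(labelers)
    votes = Counter()
    for labeler in labelers:
        votes.update(labeler(text))  # each labeler contributes one vote per category
    return {cat for cat, n in votes.items() if n / len(labelers) > threshold}

# toy stand-ins for LLM labelers (real ones would call different models/prompts)
labeler_a = lambda t: {"hate", "insults"} if "stupid" in t else set()
labeler_b = lambda t: {"insults"} if "stupid" in t else set()
labeler_c = lambda t: set()

print(majority_vote_labels("you all so stupid one", [labeler_a, labeler_b, labeler_c]))
# -> {'insults'}  (2 of 3 labelers agree; 'hate' gets only 1 vote and is dropped)
```

in the actual pipeline, the aggregated labels are also checked against human judgments so the LLM labelers stay aligned with annotator consensus.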
we hope RabakBench helps devs, researchers, and policymakers. tell us what you think!