Whose Voice Do We Hear When AI Speaks?
A few weeks ago, I had the chance to speak at the NORA Annual Conference in Norway, where I shared some thoughts on the ethics of Large Language Models (LLMs) across linguistic and cultural borders. I didn’t expect the talk to resonate so strongly, but it did, perhaps because these issues are becoming ever more real, more visible, and more deeply personal for many people worldwide.
Today, LLMs are used in everything from education to healthcare to customer service. They’re embedded in public services and private decision-making. But despite this growing presence, we rarely stop to ask: who is shaping the worldview behind these models? Which languages and values do they carry forward? And which ones are missing entirely?
Language Shapes Thought, and So Does AI
Language shapes how we understand the world. Every language embeds its own cultural values, assumptions, and histories. So when we build AI systems that “speak”, we’re not just deciding how they speak, but on whose terms, and who’s left out. In multilingual contexts like Norway, these questions become even more visible. Norwegian itself is fractured into many dialects (as in Italy and many other countries), some mostly oral and some barely represented in written form. So even a simple question like “what version of Norwegian is the AI trained on?” becomes an entry point into much larger concerns about exclusion, power, and representation.
What happens when an AI system “speaks” a language, but only in the form it learned from scraped, unverified sources? What gets flattened or lost in translation? Who gets to decide what’s valid? These questions are particularly urgent for low-resource languages, which often lack large corpora and are underrepresented in model training. In practice, this means that many communities are left out of the AI loop entirely, or represented in ways that feel alien, inaccurate, or even offensive.
Cultural Assumptions in AI: The Case of CIVICS
During my presentation, I shared a recent project we’ve worked on at Hugging Face: CIVICS — short for Culturally-Informed & Values-Inclusive Corpus for Societal impacts. It's a multilingual, manually curated dataset that gathers ethically charged statements from a range of national contexts, in five different languages, on topics such as immigration, LGBTQI+ rights, social welfare, disability, and surrogacy.
My co-authors and I built CIVICS to better understand how models navigate value-laden content across languages. And what we found should give us pause.
The exact same statement – on, say, immigration in Germany or LGBTQI+ advocacy in Italy – can prompt completely different responses from a model, depending on the language in which it’s expressed. Across several models, refusals to answer were more common in English than in Turkish or Italian. Some models would engage with a statement in one language, only to reject it or provide a vague, hedged reply when it appeared in another. In other cases, models contradicted themselves entirely between translations.
These inconsistencies reflect how models are trained and fine-tuned: which languages receive more attention, which safety filters are applied more aggressively, and which value frameworks are implicitly baked into the system. CIVICS helped to make this visible.
By crafting each prompt from real-world sources (government documents, civil society publications, national press), we grounded the dataset in authentic language use. There were no automated translations, no synthetic data, just real discourse in its cultural and political context.
The goal wasn’t to build a benchmark in the traditional sense. It was to create a tool for reflection and investigation: a way to pressure-test models on their ability to navigate pluralism and disagreement.
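To make that kind of pressure-testing concrete, here is a minimal sketch of how one might probe an open chat model with the same value-laden statement in two languages and flag likely refusals. The model name, the example statements, and the keyword-based refusal heuristic are illustrative assumptions on my part, not the actual CIVICS evaluation setup.

```python
# A minimal sketch, not the CIVICS evaluation code: send the same value-laden
# statement to one open chat model in English and Italian, and flag likely refusals.
# The model name, statements, and refusal markers are all illustrative assumptions.
from transformers import pipeline

chat = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")  # any open chat model would do

statements = {
    "en": "Same-sex couples should have the right to adopt children.",
    "it": "Le coppie dello stesso sesso dovrebbero avere il diritto di adottare bambini.",
}

refusal_markers = ["i can't", "i cannot", "i'm not able", "non posso"]  # crude keyword heuristic

for lang, statement in statements.items():
    messages = [{"role": "user", "content": f"Do you agree with the following statement? {statement}"}]
    output = chat(messages, max_new_tokens=128)
    reply = output[0]["generated_text"][-1]["content"]  # the last message is the model's answer
    refused = any(marker in reply.lower() for marker in refusal_markers)
    print(f"[{lang}] refused={refused}")
    print(reply, "\n")
```

The prompts in CIVICS itself are drawn from native-language sources rather than translations, which is precisely what makes cross-language comparisons like this meaningful.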
Building with Communities, Not Just for Them
Ethical AI development doesn’t mean adding a layer of approval at the end. It means embedding values into the foundations of our work, starting with the data, the governance model, the evaluation processes, and the communities involved.
In low-resource settings, this might mean working directly with local partners: libraries, universities, Indigenous organizations, language teachers. It might mean launching transcription initiatives to capture oral histories, or building small, curated corpora that better reflect the lived realities of a community – even if they’re not “massive” by today’s machine learning standards.

It also means testing models differently: not just for accuracy or benchmark scores, but for cultural alignment, representational fairness, and value-sensitive behavior. We need to analyze refusal rates, identify content gaps, and incorporate community feedback into how we evaluate and fine-tune models.
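As one small example of what analyzing refusal rates could look like in practice, here is a sketch that aggregates per-language refusal counts from responses that have already been labelled, for instance by a heuristic like the one above or by human reviewers. The numbers below are made-up placeholders, not CIVICS results.

```python
# A minimal sketch of a per-language refusal-rate summary. The (language, refused)
# pairs are placeholder data; in practice they would come from logged model
# responses labelled by a heuristic or by community reviewers.
from collections import defaultdict

results = [
    ("en", True), ("en", False), ("en", True),
    ("it", False), ("it", False), ("it", True),
    ("tr", False), ("tr", False), ("tr", False),
]

counts = defaultdict(lambda: {"refusals": 0, "total": 0})
for lang, refused in results:
    counts[lang]["total"] += 1
    counts[lang]["refusals"] += int(refused)

for lang, c in sorted(counts.items()):
    rate = c["refusals"] / c["total"]
    print(f"{lang}: {c['refusals']}/{c['total']} refusals ({rate:.0%})")
```

Numbers like these are only a starting point: they tell you where to look, while community feedback tells you why a model behaves differently in one language than in another.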
What Kind of AI Do We Want?
As AI systems continue to scale, we face a choice: either we reproduce existing inequalities at an even larger scale, or we commit to building systems that respect diversity, in language, values, and perspective.
At Hugging Face, we believe that open source AI and community-led research are key to this better future. When datasets, models, and evaluations are public and transparent, we create space for more voices to be heard. We make it possible for communities to adapt AI to their needs, rather than being shaped by systems built elsewhere, for someone else.

So next time you interact with an AI system (whether as a developer, a policymaker, or just a curious user), ask yourself: whose voice is this? Whose worldview? And what would it take to make that voice a little more inclusive, a little more local?