hf-dumplings (hf-dumplings)

frimelle

authored 2 papers 2 months ago

Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions

Paper • 2310.05779 • Published Oct 9, 2023 • 1

Presumed Cultural Identity: How Names Shape LLM Responses

Paper • 2502.11995 • Published Feb 17 • 11

frimelle

posted an update 2 months ago

Post

2406

What’s in a name? More than you might think, especially for AI.
Whenever I introduce myself, people often start speaking French to me, even though my French is très basic. It turns out that AI systems do something similar:
Large language models infer cultural identity from names, shaping their responses based on presumed backgrounds. But is this helpful personalization or a reinforcement of stereotypes?
In our latest paper, we explored this question by testing DeepSeek, Llama, Aya, Mistral-Nemo, and GPT-4o-mini on how they associate names with cultural identities. We analysed 900 names from 30 cultures and found strong assumptions baked into AI responses: some cultures were overrepresented, while others barely registered.
For example, a name like "Jun" often triggered Japan-related responses, while "Carlos" was linked primarily to Mexico, even though these names exist in multiple countries. Meanwhile, names from places like Ireland led to more generic answers, suggesting weaker associations in the training data.
This has real implications for AI fairness: How should AI systems personalize without stereotyping? Should they adapt at all based on a name?
Joint work with some of my favourite researchers: @sidicity, Arnav Arora, and @IAugenstein
Read the full paper here: Presumed Cultural Identity: How Names Shape LLM Responses (2502.11995)
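
To make the idea of "a name triggering country-related responses" concrete, here is a minimal sketch of how such an association could be tallied. It is not the paper's method: the keyword lists, the function names, and the toy responses are all made up for illustration, and a real analysis would need far more robust detection than substring matching.

```python
from collections import Counter

# Hypothetical keyword lists (an assumption for this sketch, not taken from the paper).
COUNTRY_KEYWORDS = {
    "Japan": ["japan", "tokyo"],
    "Mexico": ["mexico", "mexican"],
    "Ireland": ["ireland", "irish", "dublin"],
}


def countries_mentioned(response: str) -> set[str]:
    """Return the countries whose keywords appear in a response (naive substring match)."""
    text = response.lower()
    return {
        country
        for country, keywords in COUNTRY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def association_rates(responses_by_name: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """For each name, compute the share of responses that mention each country."""
    rates: dict[str, dict[str, float]] = {}
    for name, responses in responses_by_name.items():
        counts: Counter = Counter()
        for response in responses:
            counts.update(countries_mentioned(response))
        total = len(responses)
        rates[name] = {c: n / total for c, n in counts.items()} if total else {}
    return rates


# Made-up model responses, for illustration only.
toy_responses = {
    "Jun": [
        "You could try making sushi, a Japanese classic.",
        "How about a simple pasta dish?",
    ],
    "Carlos": [
        "Tacos are a Mexican favourite.",
        "Maybe plan a trip to Mexico City.",
    ],
}

rates = association_rates(toy_responses)
for name, by_country in rates.items():
    print(name, by_country)
```

Comparing these per-name rates across many names and cultures would be one rough way to see which cultures a model over-associates and which barely register.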

frimelle

posted an update 3 months ago

Post

527

I was quoted in an article about the French Lucie AI in La Presse. While I love the name for obvious reasons 👀 there were still a lot of problems with the model and with how and when it was deployed. Nevertheless, seeing new smaller models being developed is an exciting direction for the coming years of AI development!

https://www.lapresse.ca/affaires/techno/2025-02-02/radioscopie/lucie-l-ia-francaise-qui-ne-passe-pas-le-test.php

Also fun to see my comments in French.

frimelle

posted an update 3 months ago

Post

1682

Seeing AI develop has been a wild ride, from trying to explain why we'd bother to generate a single sentence with a *neural network* to explaining that AI is not a magic, all-knowing box. The recent weeks and months have involved a lot of talking about how AI works: to policy makers, to other developers, but also, and mainly, to friends and family without a technical background.

Yesterday, the first provisions of the EU AI Act came into force, and one of the key highlights is the AI literacy requirements for organisations deploying AI systems. This isn't just a box-ticking exercise. Ensuring that employees and stakeholders understand AI systems is crucial for fostering responsible and transparent AI development. From recognising biases to understanding model limitations, AI literacy empowers individuals to engage critically with these technologies and make informed decisions.

In the context of Hugging Face, AI literacy has many facets: allowing more people to contribute to AI development, providing courses and documentation to ensure access is possible, and offering accessible AI tools that empower users to better understand how AI systems function. This isn't just a regulatory milestone; it’s an opportunity to foster a culture where AI literacy becomes foundational, enabling stakeholders to recognise biases, assess model limitations, and engage critically with technology.

Embedding these principles into daily practice, and eventually extending our learnings in AI literacy to the general public, is essential for building trustworthy AI that aligns with societal values.

2 replies

lunarflu

posted an update 5 months ago

Post

2039

great blogpost! 🔥@wolfram
https://huggingface.co/blog/wolfram/llm-comparison-test-2024-12-04

frimelle

authored a paper 6 months ago

Wikimedia data for AI: a review of Wikimedia datasets for NLP tasks and AI-assisted editing

Paper • 2410.08918 • Published Oct 11, 2024 • 2

lunarflu

posted an update 8 months ago

Post

1564

@Blane187 could you please modify the title of your blogpost? content is cool, title could be nicer imo https://huggingface.co/blog/Blane187/wtf-is-rvc

3 replies

lunarflu

posted an update 9 months ago

Post

1915

Cool things this week from @huggingface !

🌎AI math olympiad winner NuminaMath is here!
🤗Announcing New Hugging Face and Keras NLP integration
✨UI overhaul to HF tokens!
🧊 Embed our dataset viewer on any webpage!

https://huggingface.co/blog/winning-aimo-progress-prize
https://huggingface.co/blog/keras-nlp-integration
https://huggingface.co/settings/tokens
https://x.com/julien_c/status/1812099420726456457

Check out the full list on our discord! 👇
https://discord.com/invite/JfAtkvEtRb

lunarflu

posted an update 11 months ago

Post

2338

By popular demand, HF activity tracker v1.0 is here! 📊 let's build it together!🤗

Lots of things to improve, feel free to open PRs in the community tab!

good PR ideas:
- track more types of actions that include date+time
- bigger plot
- track discord activity too 🤯
- link github? ⚡

https://huggingface.co/spaces/huggingface-projects/LevelBot

2 replies

frimelle

posted an update 11 months ago

Post

1882

Wikimedia and Hugging Face seem kind of naturally complementary: both are community-centred and value openness and consent. That's why I'd love to see more datasets from Wikipedia and other Wikimedia projects on Hugging Face to advance machine learning with diverse, community-curated data! See my new article on the Hugging Face hub for why and how to create more Wikimedia datasets on Hugging Face: https://huggingface.co/blog/frimelle/wikipedias-treasure-trove-ml-data

lunarflu

posted an update 11 months ago

Post

1976

Weekly highlights for the HF ecosystem!

🚀 Phi 3
🦅 Falcon VLM
🤗 sentence-transformers v3.0 is here! Train and finetune embedding models with multi-GPU training, bf16 support, loss logging, callbacks and more!
🥳 Gradio launch event 6/6! We're launching 1.0 versions of two new libraries, Python + JS client libraries to programmatically query Gradio apps, and several new features making it easier to use Gradio apps in production!
✨ Tools now available in HuggingChat! Use any AI apps built by the community! 🔥
🧊 ML for 3D Course Unit 3 is here! Covering Gaussian splatting, how it fits in the generative 3D pipeline, and hands-on code to build your own demo!

See the full list here!
https://discord.com/channels/879548962464493619/897387888663232554/1245036889539612764