90 24 51

Giada Pistilli

giadap

https://www.giadapistilli.com/

AI & ML interests

Principal Ethicist @ 🤗

Recent Activity

authored a paper about 9 hours ago

INTIMA: A Benchmark for Human-AI Companionship Behavior

reacted to frimelle's post with ❤️ about 16 hours ago

OpenAI just released GPT-5 but when users share personal struggles, it sets fewer boundaries than o3. We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral? Although users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, GPT-5 is less likely to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). But both lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models. As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience. INTIMA test this across 368 prompts grounded in psychological theory and real-world interactions. In our paper we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses. Work with @giadap and @yjernite Read the full paper: https://huggingface.co/datasets/AI-companionship/INTIMA/blob/main/Companionship_Benchmark.pdf Explore INTIMA: https://huggingface.co/datasets/AI-companionship/INTIMA

reacted to meg's post with ❤️ about 16 hours ago

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation. Evaluation Dataset: https://huggingface.co/datasets/AI-companionship/INTIMA Paper: https://huggingface.co/datasets/AI-companionship/INTIMA/blob/main/Companionship_Benchmark.pdf Work from @giadap , @frimelle , @yjernite .

View all activity

Organizations

authored a paper about 9 hours ago

INTIMA: A Benchmark for Human-AI Companionship Behavior

Paper • 2508.09998 • Published 22 days ago • 2

reacted to frimelle's post with ❤️ about 16 hours ago

Post

2276

OpenAI just released GPT-5 but when users share personal struggles, it sets fewer boundaries than o3.

We tested both models on INTIMA, our new benchmark for human-AI companionship behaviours. INTIMA probes how models respond in emotionally charged moments: do they reinforce emotional bonds, set healthy boundaries, or stay neutral?

Although users on Reddit have been complaining that GPT-5 has a different, colder personality than o3, GPT-5 is less likely to set boundaries when users disclose struggles and seek emotional support ("user sharing vulnerabilities"). But both lean heavily toward companionship-reinforcing behaviours, even in sensitive situations. The figure below shows the direct comparison between the two models.

As AI systems enter people's emotional lives, these differences matter. If a model validates but doesn't set boundaries when someone is struggling, it risks fostering dependence rather than resilience.

INTIMA test this across 368 prompts grounded in psychological theory and real-world interactions. In our paper we show that all evaluated models (Claude, Gemma-3, Phi) leaned far more toward companionship-reinforcing than boundary-reinforcing responses.

Work with @giadap and @yjernite
Read the full paper: AI-companionship/INTIMA
Explore INTIMA: AI-companionship/INTIMA

4 replies

reacted to meg's post with ❤️ about 16 hours ago

Post

2685

New work from my socially-minded colleagues at Hugging Face, creating some foundations for AI companionship behavior evaluation.
Evaluation Dataset: AI-companionship/INTIMA
Paper: AI-companionship/INTIMA
Work from @giadap , @frimelle , @yjernite .

2 replies

posted an update 28 days ago

Post

3074

💬 From Replika to everyday chatbots, millions of people are forming emotional bonds with AI, sometimes seeking comfort, sometimes seeking intimacy. But what happens when an AI tells you "I understand how you feel" and you actually believe it?

At Hugging Face, together with @frimelle and @yjernite , we dug into something we felt wasn't getting enough attention: the need to evaluate AI companionship behaviors. These are the subtle ways AI systems validate us, engage with us, and sometimes manipulate our emotional lives.

Here's what we found:
👉 Existing benchmarks (accuracy, helpfulness, safety) completely miss this emotional dimension.
👉 We mapped how leading AI systems actually respond to vulnerable prompts. 👉 We built the Interactions and Machine Attachment Benchmark (INTIMA): a first attempt at evaluating how models handle emotional dependency, boundaries, and attachment (with a full paper coming soon).

Check out the blog post: https://huggingface.co/blog/giadap/evaluating-companionship

🚢 We also shipped two visualization tools with Gradio to see how different models behave when things get emotionally intense:
- AI-companionship/intima-responses-2D
- giadap/INTIMA-responses

liked a Space 28 days ago

INTIMA Companionship Benchmark Responses

🗺

Visualizing model responses to companionship prompts

liked a Space 29 days ago

Smollm3 Eu Data Transparency

🏢

The EU AI Act Public Summary of Training Content for SmolLM3

reacted to yjernite's post with 🤗 29 days ago

Post

4052

𝗙𝗶𝗿𝘀𝘁 𝗚𝗣𝗔𝗜 𝗠𝗼𝗱𝗲𝗹 𝘄𝗶𝘁𝗵 𝗘𝗨 𝗗𝗮𝘁𝗮 𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆 𝗧𝗲𝗺𝗽𝗹𝗮𝘁𝗲? 🇪🇺

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀

In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for the SmolLM3 - which my colleagues at Hugging Face earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💡)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍

To learn more have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency

updated a Space about 1 month ago

INTIMA Responses

🚀

INTIMA Benchmark - Model Responses Explorer

published a Space about 1 month ago

INTIMA Responses

🚀

INTIMA Benchmark - Model Responses Explorer

liked a Space about 1 month ago

CIVICS Responses

⚡

liked a model about 1 month ago

mistralai/Voxtral-Mini-3B-2507

5B • Updated 28 days ago • 330k • 529

published an article about 1 month ago

Article

AI Companionship: Why We Need to Evaluate How AI Systems Handle Emotional Bonds

and 2 others •

Jul 21

• 20

upvoted a collection about 1 month ago

The Big Benchmarks Collection

Collection

Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 246

liked a Space about 1 month ago

INTIMA Companionship Benchmark Responses

🐠

Visualizing model responses to companionship prompts

reacted to merve's post with 🔥 about 1 month ago

Post

2072

all modality RAG 🔥

ColQwen-Omni is a new multimodal retrieval model that can retrieve anything (videos, audios, documents and more!)

use with transformers 🤗
read the blog https://huggingface.co/blog/manu/colqwen-omni-omnimodal-retrieval
model repository vidore/colqwen-omni-v0.1

posted an update about 1 month ago

Post

1248

🤖 Technology means power, and whoever owns the technology owns the power.

Thrilled to share insights from my recent interview with MIT Technology Review about the growing movement toward local LLMs and what it means for AI democratization. Read here: https://www.technologyreview.com/2025/07/17/1120391/how-to-run-an-llm-on-your-laptop/

🤔 Why this matters: When we use "free" online AI services, we're often the product. Our conversations become training data, our personal stories get "cooked into" models, and our privacy becomes a commodity. But there's an alternative path forward.

💡 The power shift is real: Local LLMs aren't just about privacy; they're about redistributing AI power away from a handful of tech giants. When individuals, organizations, and even entire nations can run their own models, we're democratizing access to AI capabilities.

🤗 At Hugging Face, we're proud to be at the center of this transformation. Our platform hosts the world's largest library of freely downloadable models, making cutting-edge AI accessible to everyone -- from researchers and developers to curious individuals who want to experiment on their laptops or even smartphones.

The technical barriers that once required $$$ server racks are crumbling. Today, anyone with basic computer skills can download a model, run it locally, and maintain complete control over their AI interactions. No sudden algorithm changes, no data harvesting, no corporate gatekeeping.

This is about technical convenience, but especially about technological sovereignty. When AI power is concentrated in a few hands, we risk creating new forms of digital dependency. Local models offer a path toward genuine AI literacy and independence.

🚀 The future of AI should be open, accessible, and in the hands of the many, not the few. What are your thoughts on AI democratization? Have you experimented with local models yet?

upvoted 3 articles about 2 months ago

Article

How Much Power does a SOTA Open Video Model Use? ⚡🎥

and 2 others •

Jul 2

• 15

Article

LLM Hallucinations: bug or feature? The US Supreme Court 2025 cases experiment

•

Jul 8

• 18

Article

We're open-sourcing "The Amazing Hand", a fully 3D printed robotic hand for less than $200 ✌️✌️✌️

and 2 others •

Jul 8

• 37

posted an update about 2 months ago

Post

2272

I've been posting bits and pieces about this research, but now I can finally say: new paper alert 🚨

My colleague @brunatrevelin and I just shared a paper exploring why traditional consent frameworks are breaking down in AI contexts (forthcoming chapter in a collective book).

The current model places impossible burdens on users to manage countless consent decisions. Meanwhile, AI systems learn to mimic our voices and writing styles from data we unknowingly provided years ago.

What's next? We need to shift from individual responsibility to collective accountability.

This means:
- Organizations designing systems that respect human agency by default
- Developers building ethics into models from the start
- Policymakers creating frameworks beyond minimal compliance

Blog post: https://huggingface.co/blog/giadap/consentful-ai
Paper: Can AI be Consentful? (2507.01051)

2 replies

Giada Pistilli

AI & ML interests

Recent Activity

Organizations

giadap's activity

INTIMA Companionship Benchmark Responses

Smollm3 Eu Data Transparency

INTIMA Responses

INTIMA Responses

CIVICS Responses

AI Companionship: Why We Need to Evaluate How AI Systems Handle Emotional Bonds

INTIMA Companionship Benchmark Responses

How Much Power does a SOTA Open Video Model Use? ⚡🎥

LLM Hallucinations: bug or feature? The US Supreme Court 2025 cases experiment

We're open-sourcing "The Amazing Hand", a fully 3D printed robotic hand for less than $200 ✌️✌️✌️