BigScience Catalogue Data

non-profit

https://bigscience.huggingface.co

AI & ML interests

None defined yet.

Recent Activity

karthikrangasai

authored 2 papers 15 days ago

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

Paper • 2206.15076 • Published Jun 30, 2022 • 5

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 37

albertvillanova

posted an update 3 months ago

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

albertvillanova

posted an update 3 months ago

Five years already of working on democratizing AI 🤗
Grateful to be part of such an awesome team making it happen every day.

pjox

authored a paper 4 months ago

SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing

Paper • 2512.11192 • Published Dec 12, 2025 • 1

afaji

authored a paper 4 months ago

PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues

Paper • 2601.17277 • Published Jan 24 • 6

yjernite

authored a paper 4 months ago

INTIMA: A Benchmark for Human-AI Companionship Behavior

Paper • 2508.09998 • Published Aug 4, 2025 • 11

christopher

authored a paper 6 months ago

Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem

Paper • 2512.03073 • Published Nov 27, 2025 • 7

meg

posted an update 7 months ago

🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle, we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details and code here: https://huggingface.co/blog/voice-consent-gate

christopher

authored a paper 7 months ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Paper • 2510.13996 • Published Oct 15, 2025 • 9

lvwerra

authored a paper 7 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 40

sasha

authored 3 papers 7 months ago

Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model

Paper • 2211.02001 • Published Nov 3, 2022

Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI

Paper • 2409.14160 • Published Sep 21, 2024 • 3

From Efficiency Gains to Rebound Effects: The Problem of Jevons' Paradox in AI's Polarized Environmental Debate

Paper • 2501.16548 • Published Jan 27, 2025

christopher

posted an update 7 months ago

Something very cool is cooking at Lichess

HugoLaurencon

authored a paper 8 months ago

ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21, 2025 • 36

meg

posted an update 8 months ago

🤖 As AI-generated content is shared in movies/TV/across the web, there's one simple low-hanging fruit 🍇 to help know what's real: Visible watermarks. With the Gradio team, I've made sure it's trivially easy to add this disclosure to images, video, chatbot text. See how: https://huggingface.co/blog/watermarking-with-gradio
Thanks in particular to @abidlabs and Yuvraj Sharma for the code collab.

yjernite

posted an update 8 months ago

Tremendous quality of life upgrade on the Hugging Face Hub - we now have auto-complete emojis 🤗 🥳 👏 🙌 🎉

Get ready for lots more very serious analysis on a whole range of topics from yours truly now that we have unlocked this full range of expression 😄 🤔 🗣 🙊

davanstrien

posted an update 9 months ago

I fine-tuned a smol VLM to generate specialized art history metadata!

https://huggingface.co/davanstrien/iconclass-vlm: Qwen2.5-VL-3B trained using SFT to generate ICONCLASS codes (think Dewey Decimal for art!)

Trained with TRL + HF Jobs - single UV script, no GPU needed!

Space to explore predictions on a test set: davanstrien/iconclass-predictions

Blog soon!

afaji

authored a paper 9 months ago

Predicting the Order of Upcoming Tokens Improves Language Modeling

Paper • 2508.19228 • Published Aug 26, 2025 • 23

Team members 45

bigscience-catalogue-data's activity