Extract Text and Knowledge from Images with Open Vision Language Models Oct 23 • 5
Introducing AI Sheets: a tool to work with datasets using open AI models! +4 Aug 8 • 106
LLM Hallucinations: bug or feature? The US Supreme Court 2025 cases experiment Jul 8 • 19
FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages Jul 8 • 32
FineWeb2-C: Help Build Better Language Models in Your Language Dec 23, 2024 • 21
Introducing the Synthetic Data Generator - Build Datasets with Natural Language +4 Dec 16, 2024 • 152
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community +5 Dec 9, 2024 • 69
Argilla 2.4: Easily Build Fine-Tuning and Evaluation Datasets on the Hub — No Code Required +1 Nov 4, 2024 • 45
How to build a custom text classifier without days of human labeling Oct 17, 2024 • 56
How to optimize your data labelling project with custom interfaces Oct 16, 2024 • 20
Llama 3.1 - 405B, 70B & 8B with multilinguality and long context +6 Jul 23, 2024 • 241
How we leveraged distilabel to create an Argilla 2.0 Chatbot +3 Jul 16, 2024 • 33