view article Article Open Source AI: A Cornerstone of Digital Sovereignty By frimelle and 1 other • 19 days ago • 14
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published 25 days ago • 42
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 24 days ago • 26
view article Article Blazingly fast whisper transcriptions with Inference Endpoints By mfuntowicz and 5 others • May 13 • 70
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 868
view article Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 101
view article Article OCR Processing and Text in Image Analysis with Florence-2-base and Qwen2-VL-2B By PandorAI1995 • Oct 18, 2024 • 17
HPLT 1.2 Uni-Direction Translation Models Collection HPLT's MT releases. https://github.com/hplt-project/HPLT-MT-Models • 65 items • Updated Apr 7 • 2
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 130