Introducing Synthetic Data Workshop: Your Gateway to Easy Synthetic Dataset Creation Jun 20, 2024 • 12
Synthetic dataset generation techniques: generating custom sentence similarity data May 23, 2024 • 16
Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? May 7, 2024 • 8
Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models Mar 20, 2024 • 74
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 29
Huggy Lingo: Using Machine Learning to Improve Language Metadata on the Hugging Face Hub Aug 2, 2023 • 1
Babelscape/t5-base-summarization-claim-extractor Text2Text Generation • Updated 16 days ago • 149k • 7
ibm-granite/granite-embedding-107m-multilingual Sentence Similarity • Updated 19 days ago • 6.87k • 7
Leaderboard 2 Demo: Demo of the new, massively multilingual leaderboard • Running on CPU Upgrade • 20
karina-zadorozhny/M320M-multi-modal-molecular-dataset Viewer • Updated 8 days ago • 17.5M • 39 • 2
Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B Viewer • Updated 9 days ago • 150k • 640 • 10