Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23, 2024 • 33
Cohere Labs Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 4 items • Updated 25 days ago • 42
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier Paper • 2412.04261 • Published Dec 5, 2024 • 5
Cohere Labs Aya 23 Collection Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated 25 days ago • 56
Teaching Models to Understand (but not Generate) High-risk Data Paper • 2505.03052 • Published May 5 • 6
view article Article GaLore: Advancing Large Model Training on Consumer-grade Hardware By Titus-von-Koeller and 8 others • Mar 20, 2024 • 32
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 25
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 41
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 455
view article Article Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub By drbh and 6 others • Jun 12 • 126
view article Article From PyTorch DDP to 🤗 Accelerate to 🤗 Trainer, mastery of distributed training with ease By muellerzr • Oct 21, 2022 • 36
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5 • 46
view article Article How to train a new language model from scratch using Transformers and Tokenizers By julien-c • Feb 14, 2020 • 44
view article Article Efficient LLM Pretraining: Packed Sequences and Masked Attention By sirluk • Oct 7, 2024 • 47