Dynamic Chunking for End-to-End Hierarchical Sequence Modeling • Paper • arXiv:2507.07955 • Published 5 days ago • 15 upvotes
SmolLM3: smol, multilingual, long-context reasoner • Article by loubnabnl and 22 others • 7 days ago • 515 upvotes
Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders • Article by thomwolf and 1 other • 6 days ago • 543 upvotes
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents • Paper • arXiv:2507.04009 • Published 10 days ago • 28 upvotes
Should We Still Pretrain Encoders with Masked Language Modeling? • Paper • arXiv:2507.00994 • Published 14 days ago • 73 upvotes
Energy-Based Transformers are Scalable Learners and Thinkers • Paper • arXiv:2507.02092 • Published 13 days ago • 50 upvotes
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs • Paper • arXiv:2507.02778 • Published 12 days ago • 9 upvotes
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 • Article by tomaarsen and 1 other • 14 days ago • 89 upvotes
Is There a Case for Conversation Optimized Tokenizers in Large Language Models? • Paper • arXiv:2506.18674 • Published 22 days ago • 8 upvotes
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models • Paper • arXiv:2506.19697 • Published 21 days ago • 44 upvotes
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test • Paper • arXiv:2506.21551 • Published 19 days ago • 28 upvotes
Gemma 3 QAT • Collection of Quantization-Aware Trained (QAT) Gemma 3 checkpoints; the models preserve quality close to half precision while using 3x less memory • 15 items • Updated 5 days ago • 204 upvotes
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning • Paper • arXiv:2506.10521 • Published Jun 12 • 71 upvotes
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • Paper • arXiv:2506.13585 • Published 29 days ago • 253 upvotes
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training • Paper • arXiv:2506.10952 • Published Jun 12 • 23 upvotes
Through the Valley: Path to Effective Long CoT Training for Small Language Models • Paper • arXiv:2506.07712 • Published Jun 9 • 18 upvotes