Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason! By Writer and 1 other • 6 days ago • 44
mem-agent: Persistent, Human Readable Memory Agent Trained with Online RL By driaforall and 1 other • 6 days ago • 12
Fine-tune Any LLM from the Hugging Face Hub with Together AI By togethercomputer and 3 others • 7 days ago • 7
"Anemll-style" Root-Mean-Square (RMS) Normalization on the Apple Neural Engine: A Simple Hack By anemll • about 15 hours ago • 7
AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models By imomayiz and 4 others • about 20 hours ago • 6
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 216
Efficient Deep Learning: A Comprehensive Overview of Optimization Techniques 👐 📚 By Isayoften • Aug 26, 2024 • 74
makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7, 2024 • 98