view article Article Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance By tiiuae and 5 others • May 21 • 34
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 79
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published Jun 13 • 21
view article Article CodeAgents + Structure: A Better Way to Execute Actions By akseljoonas and 1 other • May 28 • 72
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published Apr 10 • 29
view article Article Open R1: How to use OlympicCoder locally for coding? By burtenshaw and 4 others • Mar 20 • 62
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 90
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 13
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 55
view article Article Llama can now see and run on your device - welcome Llama 3.2 By merve and 6 others • Sep 25, 2024 • 191
view article Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy By medmekk and 5 others • Sep 18, 2024 • 265
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 13
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12, 2024 • 70