Article: Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub • By drbh and 6 others • 16 days ago • 101
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation Paper • 2506.04956 • Published 23 days ago • 3
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs Paper • 2504.07866 • Published Apr 10 • 11
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 175
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale Paper • 2505.03005 • Published May 5 • 34
MMLU-Pro Leaderboard 🥇 More advanced and challenging multi-task evaluation • Running on CPU Upgrade • 211