Submitted by akhaliq 56 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs · 14 authors 11
Submitted by ArthurDouillard 27 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch · 14 authors 7
Submitted by lindsay-qu 21 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding · 9 authors 2
Submitted by davanstrien 19 WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training · 2 authors 4
Submitted by WeiChow 18 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding · 6 authors 3
Submitted by Yuyang-z 17 SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer · 13 authors 2
Submitted by oaishi 7 CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation · 7 authors 2