Submitted by kuznetsoffandrey 107 Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models · 5 authors 19
Submitted by wujie10 65 Seedance 1.0: Exploring the Boundaries of Video Generation Models · 44 authors 2
Submitted by imryanxu 46 ComfyUI-R1: Exploring Reasoning Models for Workflow Generation · 8 authors 4
Submitted by akhaliq 46 Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation · 9 authors 2
Submitted by Hanyuezhuohua 45 Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation · 5 authors 2
Submitted by hassid 28 Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation · 3 authors 2
Submitted by LongMountain 24 SeerAttention-R: Sparse Attention Adaptation for Long Reasoning · 15 authors 2
Submitted by jy-yuan 16 Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning · 10 authors 2
Submitted by Lemoncoke 16 SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner · 9 authors 3
Submitted by zhenzhiwang 13 InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions · 8 authors 2
Submitted by niveck 13 Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games · 3 authors 2
Submitted by WaltonFuture 10 Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning · 7 authors 2
Submitted by guqiao 9 SAFE: Multitask Failure Detection for Vision-Language-Action Models · 7 authors 2
Submitted by taesiri 8 Hidden in plain sight: VLMs overlook their visual representations · 4 authors 1
Submitted by ashawkey 7 Efficient Part-level 3D Object Generation via Dual Volume Packing · 10 authors 2
Submitted by NikV09 6 UFM: A Simple Path towards Unified Dense Correspondence with Flow · 12 authors 2
Submitted by Zory 4 Can Vision Language Models Infer Human Gaze Direction? A Controlled Study · 10 authors 2
Submitted by sungwon95 3 Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models · 5 authors 2
Submitted by j-morano 2 MIRAGE: Multimodal foundation model and benchmark for comprehensive retinal OCT image analysis · 10 authors 2
Submitted by wy1iu 2 Reparameterized LLM Training via Orthogonal Equivalence Transformation · 6 authors 2
Submitted by SushantGautam 1 Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy · 3 authors 2
Submitted by fangwu97 1 When to Trust Context: Self-Reflective Debates for Context Reliability · 8 authors 2
Submitted by Prakamya - TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games · 6 authors 2
Submitted by TreeForest - A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy · 13 authors 2