Submitted by akhaliq · 54 · VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing · 4 authors · 4
Submitted by gallilmaimon · 45 · Slamming: Training a Speech Language Model on One GPU in a Day · 3 authors · 2
Submitted by Canyu · 45 · DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks · 8 authors · 3
Submitted by CheeryLJH · 21 · CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models · 18 authors · 3
Submitted by Facico · 20 · Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment · 7 authors · 3
Submitted by amphora · 19 · Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning · 4 authors · 2
Submitted by akhaliq · 15 · RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers · 6 authors · 3
Submitted by xw-eric · 15 · Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models · 8 authors · 2
Submitted by TianjinHuang · 12 · Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam · 11 authors · 2
Submitted by xhyandwyy · 10 · Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration · 7 authors · 2
Submitted by jianlanluo · 9 · Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation · 6 authors · 2
Submitted by irenesolaiman · 8 · Beyond Release: Access Considerations for Generative AI Systems · 7 authors · 2
Submitted by callanwu · 7 · Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties · 5 authors · 4
Submitted by dalime · 6 · Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models · 6 authors · 2
Submitted by GPaolo · 6 · TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning · 5 authors · 2
Submitted by peterji · 2 · Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation · 10 authors · 2
Submitted by zouharvi · 2 · Early-Exit and Instant Confidence Translation Quality Estimation · 5 authors · 2
Submitted by codezakh · 1 · MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use · 6 authors · 2
Submitted by WillHeld · 1 · Mind the Gap! Static and Interactive Evaluations of Large Audio Models · 7 authors · 2
Submitted by ludolara · 1 · Diagnosing COVID-19 Severity from Chest X-Ray Images Using ViT and CNN Architectures · 4 authors · 2
Submitted by nielsr · 1 · M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment · 6 authors · 2