Submitted by Xueqing 91 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation · 44 authors 3
Submitted by nicolaus625 54 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following · 5 authors 12 2
Submitted by LiuXR 44 LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs · 6 authors 22 3
Submitted by shun-zheng 39 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs · 12 authors 8
Submitted by mparvez 38 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team · 4 authors 2
Submitted by koustuvs 28 V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning · 30 authors 2
Submitted by zhangshaolei 27 Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model · 5 authors 319 2
Submitted by amsabour 20 Align Your Flow: Scaling Continuous-Time Flow Map Distillation · 3 authors 7
Submitted by yilunzhao 17 Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure · 4 authors 6 3
Submitted by cetosignis 15 From Bytes to Ideas: Language Modeling with Autoregressive U-Nets · 6 authors 3
Submitted by ahmedheakl 11 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees · 5 authors 3 2
Submitted by zichenwen 10 EfficientVLA: Training-Free Acceleration and Compression for Vision-Language-Action Models · 8 authors 2
Submitted by CostaliyA 10 CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios · 9 authors 10 2
Submitted by akhaliq 9 Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs · 46 authors 2
Submitted by Liuff23 9 xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations · 33 authors 60 2
Submitted by giannisdaras 9 Ambient Diffusion Omni: Training Good Models with Bad Data · 5 authors 23 2
Submitted by Xuandong 7 AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents · 4 authors 3
Submitted by XaiverZ 7 Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning · 3 authors 30 2
Submitted by Siyuc 6 Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders · 5 authors 2
Submitted by dsouzadaniel 4 Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers · 5 authors 2
Submitted by JJ-TMT 4 CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation · 4 authors 2
Submitted by hsichelin 4 EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction · 4 authors 7 2
Submitted by amanchadha 3 Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations · 15 authors 2
Submitted by BeileiCui 3 TR2M: Transferring Monocular Relative Depth to Metric Depth with Language Descriptions and Scale-Oriented Contrast · 4 authors 2
Submitted by ChetKao 3 Graph Counselor: Adaptive Graph Exploration via Multi-Agent Synergy to Enhance LLM Reasoning · 7 authors 2
Submitted by FaiyazAbdullah114708 2 VisText-Mosquito: A Multimodal Dataset and Benchmark for AI-Based Mosquito Breeding Site Detection and Reasoning · 7 authors 4 2
Submitted by MaxDu 1 DynaGuide: Steering Diffusion Polices with Active Dynamic Guidance · 2 authors 21 2