GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning • Paper • 2507.01006 • Published 15 days ago • 184
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy • Paper • 2507.01352 • Published 15 days ago • 50
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation • Paper • 2506.14028 • Published about 1 month ago • 91
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following • Paper • 2506.12285 • Published Jun 14 • 54
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention • Paper • 2506.13585 • Published about 1 month ago • 253
Audio-Aware Large Language Models as Judges for Speaking Styles • Paper • 2506.05984 • Published Jun 6 • 15
ChARM: Character-based Act-adaptive Reward Modeling for Advanced Role-Playing Language Agents • Paper • 2505.23923 • Published May 29 • 7
Reverse Preference Optimization for Complex Instruction Following • Paper • 2505.22172 • Published May 28 • 6