Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback • Paper • 2506.03106 • Published Jun 3 • 6 • 2
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward • Paper • 2505.17018 • Published May 22 • 15 • 2
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? • Paper • 2412.02611 • Published Dec 3, 2024 • 24 • 2
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? • Paper • 2410.01623 • Published Oct 2, 2024 • 3 • 1