Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published 7 days ago • 22
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published 25 days ago • 88
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning Paper • 2503.18769 • Published Mar 24 • 10
Open LMM Reasoning Leaderboard 🥇 A leaderboard that demonstrates LMM reasoning capabilities • Running • 36
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 57
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Paper • 2502.20395 • Published Feb 27 • 47