SOD: Step-wise On-policy Distillation for Small Language Model Agents • Paper • 2605.07725 • Published 20 days ago • 13
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows • Paper • 2605.14678 • Published 9 days ago • 99
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information • Paper • 2605.11609 • Published 16 days ago • 191
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence • Paper • 2605.12882 • Published 15 days ago • 268
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification • Paper • 2605.09269 • Published 18 days ago • 6
trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 • Text Generation • 2.43M • Updated Dec 19, 2025 • 5.88M • 6
SciLT: Long-Tailed Classification in Scientific Image Domains • Paper • 2604.03687 • Published Apr 4 • 8