ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published 2 days ago • 25
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Paper • 2506.18349 • Published 3 days ago • 8
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published 3 days ago • 45