RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling Paper • 2506.08672 • Published 29 days ago • 31
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection Paper • 2505.16475 • Published May 22 • 2
ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection Paper • 2505.16475 • Published May 22 • 2
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space Paper • 2505.13308 • Published May 19 • 26
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts Paper • 2503.22952 • Published Mar 29 • 18
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 30
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) By natolambert and 3 others • Dec 9, 2022 • 294