RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning Paper • 2507.07451 • Published about 1 month ago
FlexOlmo: Open Language Models for Flexible Data Use Paper • 2507.07024 • Published about 1 month ago
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization Paper • 2508.00222 • Published 8 days ago
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning Paper • 2507.12508 • Published 24 days ago
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published Jul 7
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner Paper • 2507.13332 • Published 23 days ago
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Paper • 2507.07996 • Published 30 days ago
Welcome GPT OSS, the new open-source model family from OpenAI! Article • By reach-vb and 11 others • 4 days ago
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks Paper • 2507.23751 • Published 9 days ago
Persona Vectors: Monitoring and Controlling Character Traits in Language Models Paper • 2507.21509 • Published 11 days ago
Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models Paper • 2507.07484 • Published about 1 month ago
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once Paper • 2507.10541 • Published 26 days ago
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published 12 days ago
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards Paper • 2507.09104 • Published 28 days ago
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization Paper • 2507.12142 • Published 24 days ago
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7
Checklists Are Better Than Reward Models For Aligning Language Models Paper • 2507.18624 • Published 16 days ago