The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles • Paper • 2502.01081 • Published 3 days ago
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information • Paper • 2502.02095 • Published 2 days ago
Improving Transformer World Models for Data-Efficient RL • Paper • 2502.01591 • Published 2 days ago
Fast Encoder-Based 3D from Casual Videos via Point Track Processing • Paper • 2404.07097 • Published Apr 10, 2024
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning • Paper • 2411.04983 • Published Nov 7, 2024
Reward-Guided Speculative Decoding for Efficient LLM Reasoning • Paper • 2501.19324 • Published 5 days ago
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training • Paper • 2501.18511 • Published 6 days ago
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation • Paper • 2501.16609 • Published 9 days ago
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding • Paper • 2501.16411 • Published 9 days ago
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 6 days ago
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 8 days ago
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 9 days ago
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning • Paper • 2501.12948 • Published 15 days ago
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model • Paper • 2501.12368 • Published 15 days ago
Process Reinforcement through Implicit Rewards • Article • By ganqu and 1 other • Jan 3