- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

Collections including paper arxiv:2501.19393
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning
  Paper • 2501.19324 • Published • 30
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 76
- Scalable-Softmax Is Superior for Attention
  Paper • 2501.19399 • Published • 17
- The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
  Paper • 2501.18965 • Published • 5

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 22
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 24
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 100
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4

- OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
  Paper • 2501.09751 • Published • 47
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
  Paper • 2501.09686 • Published • 36
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 294
- s1: Simple test-time scaling
  Paper • 2501.19393 • Published • 76

- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 37
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 45
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 35
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 46

- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 99
- Are Vision-Language Models Truly Understanding Multi-vision Sensor?
  Paper • 2412.20750 • Published • 20
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
  Paper • 2412.21187 • Published • 37
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 98

- MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
  Paper • 2501.02955 • Published • 40
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 99
- MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
  Paper • 2501.12380 • Published • 81
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
  Paper • 2501.09781 • Published • 24

- STaR: Bootstrapping Reasoning With Reasoning
  Paper • 2203.14465 • Published • 8
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 77
- Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
  Paper • 2411.14405 • Published • 58

- On Memorization of Large Language Models in Logical Reasoning
  Paper • 2410.23123 • Published • 18
- LLMs Do Not Think Step-by-step In Implicit Reasoning
  Paper • 2411.15862 • Published • 8
- Training Large Language Models to Reason in a Continuous Latent Space
  Paper • 2412.06769 • Published • 77
- Deliberation in Latent Space via Differentiable Cache Augmentation
  Paper • 2412.17747 • Published • 30