-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 37 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 45 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 35 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 45
Collections
Collections including paper arxiv:2412.21187
-
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
Paper • 2501.03916 • Published • 14 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper • 2501.04682 • Published • 89 -
Agent Laboratory: Using LLM Agents as Research Assistants
Paper • 2501.04227 • Published • 81 -
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 80
-
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 98 -
Are Vision-Language Models Truly Understanding Multi-vision Sensor?
Paper • 2412.20750 • Published • 20 -
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Paper • 2412.21187 • Published • 36 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 96
-
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 78 -
Maya: An Instruction Finetuned Multilingual Multimodal Model
Paper • 2412.07112 • Published • 27 -
OpenAI o1 System Card
Paper • 2412.16720 • Published • 31 -
Diving into Self-Evolving Training for Multimodal Reasoning
Paper • 2412.17451 • Published • 42
-
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 47 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 45 -
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Paper • 2412.21187 • Published • 36 -
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving
Paper • 2412.20735 • Published • 11
-
Meta-Learning a Dynamical Language Model
Paper • 1803.10631 • Published -
TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
Paper • 2003.11963 • Published -
BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model
Paper • 2212.04960 • Published • 1 -
Continuous Learning in a Hierarchical Multiscale Neural Network
Paper • 1805.05758 • Published • 1
-
On Memorization of Large Language Models in Logical Reasoning
Paper • 2410.23123 • Published • 18 -
LLMs Do Not Think Step-by-step In Implicit Reasoning
Paper • 2411.15862 • Published • 8 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 75 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 29
-
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
Paper • 2410.13639 • Published • 17 -
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
Paper • 2410.18693 • Published • 40 -
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
Paper • 2412.03205 • Published • 16 -
Free Process Rewards without Process Labels
Paper • 2412.01981 • Published • 31
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 53 -
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper • 2306.01693 • Published • 3 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 147 -
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 27