Testerpce's Collections
Reasoning
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 55
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 35
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Paper • 2412.15797 • Published • 18
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 100
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 113
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Paper • 2502.03275 • Published • 17
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 132
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper • 2502.10454 • Published • 7
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper • 2502.10458 • Published • 34
Entropy-Regularized Process Reward Model
Paper • 2412.11006 • Published
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
Paper • 2412.15904 • Published
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 25
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 47
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Paper • 2503.08525 • Published • 15
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 21
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 42
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 41