sheikhjubair's Collections
reasoning-agentic
Paper
•
2412.16720
•
Published
•
34
LearnLM: Improving Gemini for Learning
Paper
•
2412.16429
•
Published
•
22
NILE: Internal Consistency Alignment in Large Language Models
Paper
•
2412.16686
•
Published
•
8
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper
•
2412.16145
•
Published
•
39
Paper
•
2412.15115
•
Published
•
368
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Paper
•
2412.15084
•
Published
•
13
Xmodel-2 Technical Report
Paper
•
2412.19638
•
Published
•
27
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Paper
•
2503.16419
•
Published
•
75
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper
•
2503.16219
•
Published
•
51
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Paper
•
2504.16891
•
Published
•
22
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
Paper
•
2504.16078
•
Published
•
20
ToolRL: Reward is All Tool Learning Needs
Paper
•
2504.13958
•
Published
•
44
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper
•
2504.11536
•
Published
•
60
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
Paper
•
2504.11456
•
Published
•
13
Reasoning Models Can Be Effective Without Thinking
Paper
•
2504.09858
•
Published
•
12
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
Paper
•
2505.08311
•
Published
•
17
Are Reasoning Models More Prone to Hallucination?
Paper
•
2505.23646
•
Published
•
24
ATLAS: Learning to Optimally Memorize the Context at Test Time
Paper
•
2505.23735
•
Published
•
23
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
Paper
•
2505.20561
•
Published
•
7