ugryumnik's Collections
Backlog
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper
•
2503.01785
•
Published
•
78
When an LLM is apprehensive about its answers -- and when its uncertainty is justified
Paper
•
2503.01688
•
Published
•
21
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
Paper
•
2503.00808
•
Published
•
57
Chain of Draft: Thinking Faster by Writing Less
Paper
•
2502.18600
•
Published
•
48
Multi-Turn Code Generation Through Single-Step Rewards
Paper
•
2502.20380
•
Published
•
31
Self-rewarding correction for mathematical reasoning
Paper
•
2502.19613
•
Published
•
84
MPO: Boosting LLM Agents with Meta Plan Optimization
Paper
•
2503.02682
•
Published
•
27
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper
•
2501.04519
•
Published
•
277
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
•
2501.17161
•
Published
•
120
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
114
AgentTuning: Enabling Generalized Agent Abilities for LLMs
Paper
•
2310.12823
•
Published
•
35
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
102
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper
•
2501.07301
•
Published
•
99
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper
•
2501.03262
•
Published
•
99
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
Paper
•
2501.04682
•
Published
•
98
Agent Laboratory: Using LLM Agents as Research Assistants
Paper
•
2501.04227
•
Published
•
91
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper
•
2412.19723
•
Published
•
89
GuardReasoner: Towards Reasoning-based LLM Safeguards
Paper
•
2501.18492
•
Published
•
87
Towards Best Practices for Open Datasets for LLM Training
Paper
•
2501.08365
•
Published
•
61
Paper
•
2412.15115
•
Published
•
365
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper
•
2412.14922
•
Published
•
89
Training Large Language Models to Reason in a Continuous Latent Space
Paper
•
2412.06769
•
Published
•
84
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
•
2411.04905
•
Published
•
124
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
140
Survey on Evaluation of LLM-based Agents
Paper
•
2503.16416
•
Published
•
89
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper
•
2503.21460
•
Published
•
77
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper
•
2503.21614
•
Published
•
39