RL_Papers in general - a Nazzaroth2 Collection

Nazzaroth2 's Collections

Reward Modeling

models to test out

RL_Papers in general

OCR

VLM RL Reasoning

LLM-External_information

llm_compression

LLM_Reasoning-ErrorCorrection

Loras

3D (nerfs, gaussians, generation etc.)

t2i consistency works

videogames_roleplay

small_or_multimodal_llm

manga_translation

RL_Papers in general

updated 15 days ago

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Paper • 2504.08672 • Published Apr 11 • 55
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Paper • 2504.12322 • Published Apr 11 • 28
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 85
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22 • 117
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 178
Reasoning Models Better Express Their Confidence

Paper • 2505.14489 • Published May 20 • 20
VeriThinker: Learning to Verify Makes Reasoning Model Efficient

Paper • 2505.17941 • Published May 23 • 25
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 132
Reinforcement Pre-Training

Paper • 2506.08007 • Published about 1 month ago • 242
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

Paper • 2506.16141 • Published 21 days ago • 27