VoladorLuYu's Collections

LLM+Self-Play RL

updated 22 days ago

  • Training Language Models to Self-Correct via Reinforcement Learning

    Paper • 2409.12917 • Published Sep 19, 2024 • 139

  • Recursive Introspection: Teaching Language Model Agents How to Self-Improve

    Paper • 2407.18219 • Published Jul 25, 2024 • 3

  • Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

    Paper • 2408.16293 • Published Aug 29, 2024 • 27

  • Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

    Paper • 2409.04787 • Published Sep 7, 2024 • 1

  • Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

    Paper • 2401.02009 • Published Jan 4, 2024 • 1

  • Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

    Paper • 2503.07572 • Published Mar 10 • 45

  • SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

    Paper • 2504.19162 • Published Apr 27 • 15