daqc's Collections

Reinforcement

updated Jan 27

  • OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

    Paper • 2412.16849 • Published Dec 22, 2024 • 9

  • Offline Reinforcement Learning for LLM Multi-Step Reasoning

    Paper • 2412.16145 • Published Dec 20, 2024 • 39

  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Paper • 2501.12948 • Published Jan 22, 2025 • 404