SRFT / README.md
metadata
base_model:
  - open-r1/Qwen2.5-Math-7B-RoPE-300k
  - Qwen/Qwen2.5-Math-7B
datasets:
  - Elliott/Openr1-Math-46k-8192
license: mit
pipeline_tag: text-generation
library_name: transformers
arxiv: 2506.19767

📄 Introduction

Supervised Reinforcement Fine-Tuning (SRFT) is a single-stage method that unifies the two fine-tuning paradigms, supervised fine-tuning (SFT) and reinforcement learning (RL), through entropy-aware weighting mechanisms.
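To make "entropy-aware weighting" concrete, here is a minimal sketch in plain Python. It assumes one simple form of the idea: a token's supervised loss is scaled by `exp(-H)`, where `H` is the entropy of the model's next-token distribution, so confident predictions receive a larger weight than uncertain ones. The function names and the exact weighting formula are illustrative assumptions, not taken from this README; the paper's formulation may differ.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_aware_weight(logits):
    """Hypothetical weighting: exp(-H), which lies in (0, 1] and shrinks as entropy grows.

    Assumption: in an autodiff framework this weight would be treated as a
    constant (no gradient flows through it).
    """
    return math.exp(-entropy(softmax(logits)))


def weighted_sft_loss(logits, target_index):
    """Negative log-likelihood of the target token, scaled by the entropy-aware weight."""
    probs = softmax(logits)
    nll = -math.log(probs[target_index])
    return entropy_aware_weight(logits) * nll


if __name__ == "__main__":
    uncertain = [0.0, 0.0, 0.0, 0.0]   # uniform distribution, maximum entropy
    confident = [10.0, 0.0, 0.0, 0.0]  # sharply peaked distribution, low entropy
    print(entropy_aware_weight(uncertain))
    print(entropy_aware_weight(confident))
```

With this choice, a uniform distribution over four tokens gets a weight of exactly 1/4, while a sharply peaked distribution gets a weight close to 1.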

Paper: arXiv:2506.19767 (https://arxiv.org/abs/2506.19767)

Project Website: SRFT