Supervised Reinforcement Fine-Tuning (SRFT) is a single-stage method that unifies two fine-tuning paradigms, supervised fine-tuning (SFT) and reinforcement learning (RL), through entropy-aware weighting mechanisms.
Paper: arXiv
Project Website: SRFT
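
To give a rough feel for what entropy-aware weighting of the two objectives could look like, here is a minimal sketch in plain Python. It is not taken from the SRFT code: the function names, the `exp(-entropy)` form of the weight, and the `sft_scale` parameter are all assumptions chosen for illustration. The idea shown is only that the supervised term is down-weighted when the policy's next-token distribution has high entropy, and both terms are optimized in one stage.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(distributions):
    """Average entropy over a sequence of next-token distributions."""
    return sum(token_entropy(d) for d in distributions) / len(distributions)


def entropy_weight(entropy, scale=1.0):
    """Hypothetical weight in (0, scale]; it shrinks as entropy grows.

    The exponential form is an assumption made for this sketch.
    """
    return scale * math.exp(-entropy)


def combined_loss(sft_loss, rl_loss, entropy, sft_scale=0.5):
    """Single-stage objective: entropy-weighted SFT term plus the RL term.

    In a real training loop the weight would be treated as a constant
    (no gradient flowing through the entropy); here everything is a float.
    """
    return entropy_weight(entropy, sft_scale) * sft_loss + rl_loss


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]   # high-entropy policy
    peaked = [0.97, 0.01, 0.01, 0.01]    # low-entropy policy
    for name, dist in (("uniform", uniform), ("peaked", peaked)):
        h = token_entropy(dist)
        print(name, round(h, 3), round(combined_loss(2.0, 1.0, h), 3))
```

In this sketch a near-deterministic policy keeps most of the supervised signal, while a high-entropy policy leans more on the RL term. Whether SRFT uses this exact functional form should be checked against the paper.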