RLHFlow's Collections
Minimal-RL
Online-DPO-R1
Decision-Tree Reward Models
RLHFlow MATH Process Reward Model
Standard-format-preference-dataset
Mixture-of-preference-reward-modeling
RM-Bradley-Terry
PM-pair
Online RLHF
RLHFLow Reward Models
SFT Models
Minimal-RL (updated 5 days ago)
RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp • Text Generation • Updated 7 days ago • 6 • 1
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej • Text Generation • Updated 7 days ago • 3 • 1