RLHFlow MATH Process Reward Model
Updated Nov 9, 2024
This collection gathers datasets and models for process reward modeling.
RLHFlow/Mistral-PRM-Data • Viewer • Updated Nov 9, 2024 • 273k • 400 • 12
RLHFlow/Mistral-GSM8K-Test • Viewer • Updated Nov 2, 2024 • 1.32k • 33
RLHFlow/Mistral-MATH500-Test • Viewer • Updated Nov 9, 2024 • 500 • 68
RLHFlow/Llama3.1-8B-PRM-Mistral-Data • Text Generation • Updated Nov 9, 2024 • 896 • 10
RLHFlow/Deepseek-MATH500-Test • Viewer • Updated Nov 9, 2024 • 500 • 127
RLHFlow/Mistral-ORM-Data • Viewer • Updated Nov 9, 2024 • 273k • 150 • 2
RLHFlow/Deepseek-GSM8K-Test • Viewer • Updated Nov 3, 2024 • 1.32k • 77
RLHFlow/Deepseek-ORM-Data-Pairwise • Viewer • Updated Nov 4, 2024 • 36k • 43 • 1
RLHFlow/Mistral-ORM-Data-Pairwise • Viewer • Updated Nov 3, 2024 • 37.9k • 19
RLHFlow/Deepseek-PRM-Data • Viewer • Updated Nov 9, 2024 • 253k • 123 • 13
RLHFlow/Llama3.1-8B-ORM-Deepseek-Data • Text Generation • Updated Nov 9, 2024 • 263 • 1
RLHFlow/Llama3.1-8B-ORM-Mistral-Data • Text Generation • Updated Nov 9, 2024 • 131
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data • Text Generation • Updated Nov 9, 2024 • 37.7k • 34
RLHFlow/Deepseek-ORM-Data • Viewer • Updated Nov 9, 2024 • 253k • 47 • 3