Training & test sets and finetuned models
AI & ML interests
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/
models 35
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-hard
RLHFlow/Qwen2.5-Math-1-5B-Reinforce-Ada-balance-easy • 2B • 13
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-easy • 8B • 11
RLHFlow/Qwen2.5-Math-7B-Reinforce-Ada-balance-hard • 8B • 7
RLHFlow/Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard • 4B • 8
RLHFlow/Llama-3.2-3B-Instruct-Reinforce-Ada-balance-hard • 4B • 5
RLHFlow/Qwen2.5-Math-7B-Zero-RAFTpp • Text Generation • 8B • 2 • 1
RLHFlow/Qwen2.5-Math-7B-Zero-Reinforce-Rej • Text Generation • 8B • 1 • 1
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data • Text Generation • 8B • 3.03k • 36
RLHFlow/Qwen2.5-7B-SFT • 8B • 1
datasets 88
RLHFlow/reinforce_ada_simple_prompt_1-5b • 25k • 10
RLHFlow/reinforce_ada_hard_prompt_1-5b • 13.3k • 9
RLHFlow/reinforce_ada_hard_prompt_llama • 15k • 11
RLHFlow/reinforce_ada_easy_prompt • 24.3k • 12
RLHFlow/reinforce_ada_hard_prompt • 15.7k • 80
RLHFlow/self_rewarding_turn2_example • 6
RLHFlow/self_rewarding_turn1_with_rewards_example • 11
RLHFlow/self_rewarding_rl_prompt • 11
RLHFlow/self_rewarding_sft_prompt • 40k • 10
RLHFlow/self_rewarding_ift_example_raw_data1 • 16.3k • 6