RL with outcome reward + format reward. https://arxiv.org/abs/2505.15117
Bowen
PeterJinGo
Recent Activity
updated a dataset about 16 hours ago: Archive-models/musique
published a dataset about 16 hours ago: Archive-models/musique
updated a dataset about 16 hours ago: Archive-models/nq_hotpotqa_train_search_sample10
Collections (3)
models (42)
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-ppo-v0.3 • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.3
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.3 • 2
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-grpo-v0.3
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo-v0.3 • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo-v0.3 • 2
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.3 • 21
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-32b-em-grpo-v0.3 • 155
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-it-em-grpo-v0.3 • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-grpo-v0.3 • 1
datasets (13)
PeterJinGo/wiki-18-e5-index-HNSW64 • 142
PeterJinGo/wiki-18-bm25-index • 84
PeterJinGo/nq_hotpotqa_train • 221k • 366 • 2
PeterJinGo/wiki-18-e5-index • 1.97k
PeterJinGo/wiki-18-corpus • 1.47k
PeterJinGo/ultrafeedback_first_5000 • 5k • 3
PeterJinGo/gsm8k-chat • 7.47k • 12
PeterJinGo/math-zeroshot-chat • 7.5k • 10
PeterJinGo/math-zeroshot • 7.5k • 17
PeterJinGo/math2 • 7.5k • 8