Exploration with a more stable RL pipeline with outcome-only reward and scaled-up LLMs.
Bowen
PeterJinGo
Recent Activity
updated a model 4 days ago: rubricrm/qwen25-7B-LR1e-6-filtered-math10k-code8k-distill-claude0419
published a model 4 days ago: rubricrm/qwen25-7B-LR1e-6-filtered-math10k-code8k-distill-claude0419
updated a model 9 days ago: rubricrm/rubric_rm_qwen2.5_7B_LR1.0e-6_filtered_sky_code_8k_math_10k_rubric_evidence_classify_4k4k_PPO
Collections (2)
Preliminary checkpoints with outcome-only RL.
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning • Paper • 2503.09516 • Published • 28
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo • Updated • 151
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo • Updated • 2
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo • Updated • 27
Models (31)
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.2 • Updated • 3
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-it-em-ppo-v0.2 • Updated • 2
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.2 • Updated • 9
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.2 • Updated • 23
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.2 • Updated
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-14b-it-em-ppo-v0.2 • Updated • 4
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-7b-it-em-ppo-v0.2 • Updated • 3
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-7b-em-ppo-v0.2 • Updated • 3
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.2 • Updated • 20
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-grpo-v0.2 • Updated • 8
Datasets (13)
PeterJinGo/wiki-18-e5-index-HNSW64 • Updated • 69
PeterJinGo/wiki-18-bm25-index • Updated • 119
PeterJinGo/nq_hotpotqa_train • Viewer • Updated • 221k • 457 • 2
PeterJinGo/wiki-18-e5-index • Updated • 2.92k
PeterJinGo/wiki-18-corpus • Updated • 1.88k
PeterJinGo/ultrafeedback_first_5000 • Viewer • Updated • 5k • 10
PeterJinGo/gsm8k-chat • Viewer • Updated • 7.47k • 14
PeterJinGo/math-zeroshot-chat • Viewer • Updated • 7.5k • 18
PeterJinGo/math-zeroshot • Viewer • Updated • 7.5k • 18
PeterJinGo/math2 • Viewer • Updated • 7.5k • 17