a-F1/math-qwen3-0.6b-reinforce-moa-3x1-unshared-actor_lr7.5e-7-epoch2-modenull Updated about 1 hour ago
a-F1/math-qwen3-0.6b-reinforce-moa-3x1-unshared-actor_lr7.5e-7-epoch2-modequality Updated about 20 hours ago
a-F1/aime_2024-DeepSeek-R1-Distill-Qwen-1.5B-beam_search-prm-completions Updated May 10 • 4 • 41
a-F1/aime_2024-DeepSeek-R1-Distill-Qwen-1.5B-best_of_n-prm-completions Updated May 10 • 4 • 14
a-F1/DeepSeek-R1-Distill-Qwen-1.5B-Llama3.1-8B-PRM-Deepseek-Data-best_of_n-prm-completions Updated May 9 • 7
a-F1/DeepSeek-R1-Distill-Qwen-1.5B-Llama3.1-8B-PRM-Deepseek-Data-beam_search-prm-completions Updated May 8 • 8 • 19
a-F1/DeepSeek-R1-Distill-Qwen-7B-Llama3.1-8B-PRM-Deepseek-Data-best_of_n-prm-completions Updated May 7 • 7 • 12