HanningZhang/Qwen-7B-grpo-plusplus-nocliphigher-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter9 Text Generation • Updated about 5 hours ago
HanningZhang/Qwen-7B-grpo-plusplus-nocliphigher-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter8 Text Generation • Updated about 16 hours ago • 80
HanningZhang/Qwen-7B-grpo-plusplus-nocliphigher-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter7 Text Generation • Updated about 16 hours ago • 4
HanningZhang/Qwen-7B-grpo-plusplus-nocliphigher-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter6 Text Generation • Updated about 16 hours ago • 4
HanningZhang/Qwen-7B-grpo-plusplus-nocliphigher-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter5 Text Generation • Updated about 16 hours ago • 4
HanningZhang/Qwen2.5-Math-7B-grpo-plusplus_em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter8 Text Generation • Updated 1 day ago • 5
HanningZhang/Qwen2.5-Math-7B-grpo-plusplus_em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter7 Text Generation • Updated 1 day ago • 5
HanningZhang/Qwen2.5-Math-7B-grpo-plusplus_em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter6 Text Generation • Updated 1 day ago • 5
HanningZhang/Qwen2.5-Math-7B-grpo-plusplus_em-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter5 Text Generation • Updated 1 day ago • 7
HanningZhang/Qwen-7B-grpo-plusplus-nocliphigher-sample1n8-sample8-filter1.0-insufficient0.0-a0.001-b2.0-iter4 Text Generation • Updated 2 days ago • 83
HanningZhang/scalebio_reasoning_nonthink_50k_with_system_and_cot Viewer • Updated 1 day ago • 50k • 6
HanningZhang/scalebio_reasoning_nonthink_20k_with_system_and_cot Viewer • Updated 1 day ago • 20k • 12