stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated 30 days ago • 28
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated 30 days ago • 34
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated 30 days ago • 81
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated 30 days ago • 25
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated 30 days ago • 26
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated 30 days ago • 50
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated 29 days ago • 23
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated 29 days ago • 23
stellalisy/rethink_rlvr_reproduce-format-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated 29 days ago • 34
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step50 Text Generation • 8B • Updated 29 days ago • 23
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated 29 days ago • 32
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated 29 days ago • 37
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step100 Text Generation • 8B • Updated 30 days ago • 30
stellalisy/rethink_rlvr_reproduce-random-qwen2.5_math_7b-lr5e-7-kl0.00-step150 Text Generation • 8B • Updated 30 days ago • 58