SoheylM/DeepSeek-R1-Distill-Qwen-14B-GRPO • Text Generation • 15B • Updated about 11 hours ago • 16 • 1
AmberYifan/Qwen2.5-7B-Instruct-userfeedback-SFT-SPIN-iter1 • Text Generation • 8B • Updated 7 days ago • 38 • 1
AmberYifan/Qwen2.5-7B-Instruct-userfeedback-SPIN-iter1 • Text Generation • 8B • Updated 7 days ago • 31 • 1
AmberYifan/Qwen2.5-7B-Instruct-userfeedback-SPIN-iter2 • Text Generation • 8B • Updated 7 days ago • 20 • 1
AmberYifan/Qwen2.5-7B-Instruct-userfeedback-4k-iter2 • Text Generation • 8B • Updated 3 days ago • 15 • 1