jdqqjr/DeepSeek-R1-Distill-Qwen-1.5B-FactGRPO-2reward-SubLenCheck-SingleBox-0.3E-40_30_150-kl-rebuild 2B • Updated Mar 27 • 5
jdqqjr/DeepSeek-R1-Distill-Qwen-1.5B-FactGRPO-2reward-SubLenCheck-SingleBox-0.15E-40_30_150-kl-rebuild 2B • Updated Mar 26 • 7
jdqqjr/Mistral-7B-Instruct-v0.2-8epoch-merged-rlhf Text Generation • 7B • Updated Jul 8, 2024 • 7 • 1