Base Model: ReasoningEval/DeepSeek-R1-Distill-Qwen-7B-Huatuo-SFT-all
Training Epochs: 3
Training Objective: RL
Training Data: ReasoningEval/Huatuo-RL