DeepSeek R1 Replication on Qwen 2.5 1.5B for Unstructured to Structured JSON Conversion

Discussion #12 · opened by bhaviktheslider

We reran the experiments on 8×H100 GPUs with improvements to the prompt and to the format/answer rewards (which we designed ourselves, since our problem set was unstructured-text-to-structured-JSON conversion). The model gave much better results than last time, and over the course of training it learned to earn both the formatting reward and the JSON-structuring reward.
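The post doesn't publish the actual reward code, but a minimal sketch of a combined format + JSON-structure reward for this task might look like the following. The tag names (`<think>`/`<answer>`), the equal weighting, and the function names are all assumptions, not the author's implementation.

```python
import json
import re

# Hypothetical sketch: the actual reward functions are not shared in the post.
# Expected completion shape: one <think> block, then one <answer> block.
THINK_ANSWER_RE = re.compile(
    r"\A<think>.*?</think>\s*<answer>(.*?)</answer>\s*\Z", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion is exactly one <think> block followed by one
    <answer> block with nothing before or after, else 0.0."""
    return 1.0 if THINK_ANSWER_RE.match(completion) else 0.0

def json_reward(completion: str) -> float:
    """1.0 if the <answer> block contents parse as valid JSON, else 0.0."""
    m = THINK_ANSWER_RE.match(completion)
    if not m:
        return 0.0
    try:
        json.loads(m.group(1))
        return 1.0
    except json.JSONDecodeError:
        return 0.0

def total_reward(completion: str) -> float:
    # Equal weighting is an assumption; the post does not state the weights.
    return format_reward(completion) + json_reward(completion)
```

A well-formed completion such as `<think>...</think><answer>{"a": 1}</answer>` earns both rewards, while a completion with valid tags but broken JSON earns only the formatting reward.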

Normally, Qwen 2.5 1.5B did not produce well-formatted `<think>` and `<answer>` blocks. It often emitted multiple or irrelevant things (even multiple `<think>` and `<answer>` blocks) when prompted. Even the DeepSeek R1 model distilled onto Qwen 2.5 1.5B did not produce well-formatted blocks, despite having been trained specifically in that format.
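One way to quantify those failure modes (duplicate blocks, stray text outside the tags) when sampling from a base model is a small diagnostic like the sketch below. This is illustrative only, assuming the single `<think>`-then-`<answer>` target format; it is not the author's evaluation code.

```python
import re

def diagnose_blocks(output: str) -> list[str]:
    """Return a list of formatting problems found in a model output.
    The expected format (an assumption here) is exactly one <think>
    block followed by one <answer> block and nothing else."""
    issues = []
    thinks = re.findall(r"<think>.*?</think>", output, re.DOTALL)
    answers = re.findall(r"<answer>.*?</answer>", output, re.DOTALL)
    if len(thinks) != 1:
        issues.append(f"expected 1 <think> block, found {len(thinks)}")
    if len(answers) != 1:
        issues.append(f"expected 1 <answer> block, found {len(answers)}")
    # Strip the tagged blocks and check whether anything irrelevant remains.
    leftover = re.sub(r"<think>.*?</think>|<answer>.*?</answer>", "",
                      output, flags=re.DOTALL).strip()
    if leftover:
        issues.append("stray text outside the tagged blocks")
    return issues
```

Running this over a batch of sampled completions gives a quick before/after measure of how often the model violates the expected block structure.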

Here is the model link: https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b
Model ID: MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b

It was trained on a dataset we created synthetically.

Dataset Link: https://huggingface.co/datasets/bhaviktheslider/JSON-Unstructured-Structured
Dataset ID: bhaviktheslider/JSON-Unstructured-Structured

You can find the model's training metrics in the TensorBoard section on Hugging Face.

Time taken: approximately 30 hours for 450 steps.

I think the downside of thinking models is that even for simple questions they may spend a lot of thinking tokens. We should build datasets that train LLMs to decide when to use a thinking strategy and when to simply answer the question directly, like regular LLMs do.

@El-chapoo In the case of the model I trained above using the R1 strategy, I kept the formatting-reward and answer-reward rules strict enough that the model converged to an optimal completion length after training for 450 steps. You can find the metrics in the TensorBoard section: https://huggingface.co/MasterControlAIML/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured/tensorboard
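The post attributes the convergence in completion length to strict reward rules. One common way to make that pressure explicit, sketched hypothetically here (the gating, coefficients, and `max_len` are assumptions, not the author's setup), is an all-or-nothing gate plus a mild length penalty:

```python
def strict_reward(completion: str, format_ok: bool, json_ok: bool,
                  max_len: int = 1024) -> float:
    """Hypothetical strict reward: no partial credit unless both the
    format and the JSON structure are correct, plus a mild penalty for
    overly long completions. Coefficients are illustrative only."""
    if not (format_ok and json_ok):
        # All-or-nothing gating: incorrect outputs earn nothing,
        # so padding the <think> block with extra tokens cannot help.
        return 0.0
    # Linear penalty (capped at 0.5) for words beyond max_len.
    overflow = max(0, len(completion.split()) - max_len)
    return 1.0 - min(0.5, 0.001 * overflow)
```

Under a scheme like this, longer reasoning only pays off when it actually flips the output from incorrect to correct, which pushes the policy toward the shortest completions that still satisfy both checks.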

