Model Card for reissbaker/r1-llama-70b-distill-lora

This LoRA adapter was distilled from deepseek-ai/DeepSeek-R1 and uses meta-llama/Llama-3.1-70B-Instruct as a base. Despite being a mere rank-32 LoRA adapter on top of Llama-3.1-70B-Instruct, trained on fewer than 10k prompt/completion examples, it significantly outperforms the base model and gpt-4o on MATH-500, AIME24, and GPQA Diamond, and beats Claude-3.5-Sonnet on two of the three, showing that system-2-style thinking can be trained cheaply into a very small number of parameters.

Model Details

The training data was generated with unfat by running prompts (not completions) from the following datasets through DeepSeek-R1 to produce prompt/completion pairs (a sketch of the extraction loop follows the list):

  • EleutherAI/hendrycks_math (train split)
  • PrimeIntellect/verifiable-coding-problems (train split, first 500 rows)
  • mlabonne/harmless_alpaca (train split, first 1k rows)
  • euclaise/logician (train split, first 1k rows)
  • isaiahbjork/cot-logic-reasoning (train split, first 1.4k rows)
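
Conceptually, the extraction step is a loop over source prompts, saving the teacher's completions alongside them. The sketch below is not unfat's actual API; it illustrates the idea using the `datasets` library and an OpenAI-compatible client, with a placeholder endpoint URL and a single hendrycks_math subject config as an example:

```python
# Sketch of the distillation step: pull prompts from a source dataset and
# capture DeepSeek-R1's completions. This is NOT unfat's actual API; the
# endpoint URL below is a placeholder for any OpenAI-compatible R1 provider.
import json
from datasets import load_dataset
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

# One subject config shown as an example; the real run covered the list above.
prompts = load_dataset("EleutherAI/hendrycks_math", "algebra", split="train")

with open("distilled.jsonl", "w") as f:
    for row in prompts:
        response = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-R1",
            messages=[{"role": "user", "content": row["problem"]}],
        )
        # Keep the full completion, including R1's <think>...</think> trace.
        f.write(json.dumps({
            "prompt": row["problem"],
            "completion": response.choices[0].message.content,
        }) + "\n")
```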

We then modestly cleaned the generated dataset by stripping any completions that were missing closing </think> tags, and trained for 2 epochs using Together.ai's serverless fine-tuning platform. The total cost of extracting the data from glhf.chat's API and training the model on Together was less than $450.
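
The cleaning pass amounts to a one-pass filter over the generated data; a minimal sketch, assuming the JSONL layout from the extraction sketch above:

```python
# Drop any generated example whose completion was truncated before the
# closing </think> tag, keeping everything else as-is.
import json

with open("distilled.jsonl") as src, open("cleaned.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        if "</think>" in example["completion"]:
            dst.write(line)
```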

We suspect even better performance could be achieved with larger training sets and improved data cleaning, e.g. formally verifying R1 outputs and training only on correct answers, as OpenR1-Math-220k attempts.
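
For math data, that kind of verification can be as simple as comparing the final \boxed{...} answer in R1's completion against the reference solution. The sketch below uses plain string equality, which is a simplification; a real pipeline would normalize expressions (e.g. with a symbolic-math library) before comparing:

```python
# Sketch of answer verification for math examples: keep a distilled example
# only if R1's final \boxed{...} answer matches the reference solution's.
# String equality is a simplification; real pipelines normalize expressions.
import re

def last_boxed(text: str) -> str | None:
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def is_verified(completion: str, reference_solution: str) -> bool:
    predicted = last_boxed(completion)
    gold = last_boxed(reference_solution)
    return predicted is not None and predicted == gold
```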

Model Description

  • Developed by: @reissbaker
  • Funded by: Synthetic Lab
  • License: Apache 2.0
  • Finetuned from model: Llama 3.1 70B Instruct

How to Get Started with the Model

Run the model with one click on glhf.chat by copying this repo URL and launching the model.
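
To run it locally instead, the standard transformers + peft loading pattern should work. A sketch (you'll need enough GPU memory to host the 70B base model; the repo names are the ones on this card):

```python
# Sketch: load the Llama-3.1-70B-Instruct base model and apply this LoRA
# adapter with transformers + peft. Requires substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "reissbaker/r1-llama-70b-distill-lora")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

messages = [{"role": "user", "content": "What is 7 * 8 + 12?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```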

Eval results

We used open-R1 to evaluate the models, since, unlike some other popular eval frameworks, it has successfully reproduced DeepSeek's published R1-distill evals. The results are as follows:

| Model | MATH-500 | AIME24 | GPQA Diamond |
|---|---|---|---|
| reissbaker/r1-llama-70b-distill-lora | 86.8 | 20.0 | 61.6 |
| meta-llama/Llama-3.1-70B-Instruct | 56.6 | 10.0 | 45.9 |
| gpt-4o-0513 † | 74.6 | 9.3 | 49.9 |
| Claude-3.5-Sonnet-1022 † | 78.3 | 16.0 | 65.0 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B † | 94.5 | 70.0 | 65.2 |

† Eval results reported by DeepSeek

Our LoRA significantly outperforms the non-reasoning models on most benchmarks, and is a large improvement over the base Llama-3.1-70B-Instruct model. However, DeepSeek's full-parameter finetune, trained on larger datasets, still significantly outperforms it on these tasks.

Training Hyperparameters

  • Learning rate: 4e-4
  • LoRA rank: 32
  • LoRA alpha: 16
  • LoRA dropout: 0.01
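
For anyone reproducing the run outside Together's platform, these settings map onto a peft LoraConfig roughly as follows; the target modules are an assumption, since Together's serverless defaults aren't documented on this card:

```python
# Sketch: the hyperparameters above as a peft LoraConfig. target_modules is
# an assumption; Together's serverless defaults are not specified here.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,               # LoRA rank
    lora_alpha=16,      # scaling alpha
    lora_dropout=0.01,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
# Training used a 4e-4 learning rate for 2 epochs on Together's serverless
# fine-tuning platform, as described above.
```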