Model Card for reissbaker/r1-llama-70b-distill-lora

This LoRA adapter was distilled from deepseek-ai/DeepSeek-R1 and uses meta-llama/Llama-3.1-70B-Instruct as a base. Despite being a mere rank-32 LoRA adapter on top of Llama-3.1-70B-Instruct, trained on fewer than 10k prompt/completion examples, it significantly outperforms the base model and gpt-4o on MATH-500, AIME24, and GPQA Diamond, and beats Claude-3.5-Sonnet on two of the three, showing that system-2-style thinking can be trained cheaply into a very small number of parameters.

Model Details

The training data was generated with unfat by running prompts (not completions) from the following datasets through DeepSeek-R1 to produce prompt/completion pairs (a sketch of the extraction loop follows the list):

  • EleutherAI/hendrycks_math (train split)
  • PrimeIntellect/verifiable-coding-problems (train split, first 500 rows)
  • mlabonne/harmless_alpaca (train split, first 1k rows)
  • euclaise/logician (train split, first 1k rows)
  • isaiahbjork/cot-logic-reasoning (train split, first 1.4k rows)
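
Conceptually, the extraction step is a loop over source prompts, saving the teacher's completions alongside them. The sketch below is not unfat's actual API; it illustrates the idea using the `datasets` library and an OpenAI-compatible client, with a placeholder endpoint URL and a single hendrycks_math subject config as an example:

```python
# Sketch of the distillation step: pull prompts from a source dataset and
# capture DeepSeek-R1's completions. This is NOT unfat's actual API; the
# endpoint URL below is a placeholder for any OpenAI-compatible R1 provider.
import json
from datasets import load_dataset
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

# One subject config shown as an example; the real run covered the list above.
prompts = load_dataset("EleutherAI/hendrycks_math", "algebra", split="train")

with open("distilled.jsonl", "w") as f:
    for row in prompts:
        response = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-R1",
            messages=[{"role": "user", "content": row["problem"]}],
        )
        # Keep the full completion, including R1's <think>...</think> trace.
        f.write(json.dumps({
            "prompt": row["problem"],
            "completion": response.choices[0].message.content,
        }) + "\n")
```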

We then modestly cleaned the generated dataset by stripping any completions that were missing closing </think> tags, and trained for 2 epochs using Together.ai's serverless fine-tuning platform. The total cost of extracting the data from glhf.chat's API and training the model on Together was less than $450.
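
The cleaning pass amounts to a one-pass filter over the generated data; a minimal sketch, assuming the JSONL layout from the extraction sketch above:

```python
# Drop any generated example whose completion was truncated before the
# closing </think> tag, keeping everything else as-is.
import json

with open("distilled.jsonl") as src, open("cleaned.jsonl", "w") as dst:
    for line in src:
        example = json.loads(line)
        if "</think>" in example["completion"]:
            dst.write(line)
```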

We suspect even better performance could be achieved with larger training sets and improved data cleaning, e.g. formally verifying R1 outputs and training only on correct answers, as OpenR1-Math-220k attempts.
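
For math data, that kind of verification can be as simple as comparing the final \boxed{...} answer in R1's completion against the reference solution. The sketch below uses plain string equality, which is a simplification; a real pipeline would normalize expressions (e.g. with a symbolic-math library) before comparing:

```python
# Sketch of answer verification for math examples: keep a distilled example
# only if R1's final \boxed{...} answer matches the reference solution's.
# String equality is a simplification; real pipelines normalize expressions.
import re

def last_boxed(text: str) -> str | None:
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None

def is_verified(completion: str, reference_solution: str) -> bool:
    predicted = last_boxed(completion)
    gold = last_boxed(reference_solution)
    return predicted is not None and predicted == gold
```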

Model Description

  • Developed by: @reissbaker
  • Funded by: Synthetic Lab
  • License: Apache 2.0
  • Finetuned from model: Llama 3.1 70B Instruct

How to Get Started with the Model

Run the model with one click on glhf.chat by copying this repo URL and launching the model.
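
To run it locally instead, the standard transformers + peft loading pattern should work. A sketch (you'll need enough GPU memory to host the 70B base model; the repo names are the ones on this card):

```python
# Sketch: load the Llama-3.1-70B-Instruct base model and apply this LoRA
# adapter with transformers + peft. Requires substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "reissbaker/r1-llama-70b-distill-lora")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

messages = [{"role": "user", "content": "What is 7 * 8 + 12?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```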

Eval results

We used open-R1 to evaluate the models, since, unlike some other popular eval frameworks, it has successfully reproduced DeepSeek's published R1-distill evals. The results are as follows:

| Model | MATH-500 | AIME24 | GPQA Diamond |
|---|---|---|---|
| reissbaker/r1-llama-70b-distill-lora | 86.8 | 20.0 | 61.6 |
| meta-llama/Llama-3.1-70B-Instruct | 56.6 | 10.0 | 45.9 |
| gpt-4o-0513 † | 74.6 | 9.3 | 49.9 |
| Claude-3.5-Sonnet-1022 † | 78.3 | 16.0 | 65.0 |
| deepseek-ai/DeepSeek-R1-Distill-Llama-70B † | 94.5 | 70.0 | 65.2 |

† Eval results reported by DeepSeek

Our LoRA significantly outperforms the non-reasoning models on most benchmarks, and is a large improvement over the base Llama-3.1-70B-Instruct model. However, DeepSeek's full-parameter finetune, trained on larger datasets, still significantly outperforms it on these tasks.

Training Hyperparameters

  • Learning rate: 4e-4
  • LoRA rank: 32
  • LoRA alpha: 16
  • LoRA dropout: 0.01
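
For anyone reproducing the run outside Together's platform, these settings map onto a peft LoraConfig roughly as follows; the target modules are an assumption, since Together's serverless defaults aren't documented on this card:

```python
# Sketch: the hyperparameters above as a peft LoraConfig. target_modules is
# an assumption; Together's serverless defaults are not specified here.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,               # LoRA rank
    lora_alpha=16,      # scaling alpha
    lora_dropout=0.01,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
# Training used a 4e-4 learning rate for 2 epochs on Together's serverless
# fine-tuning platform, as described above.
```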