---
license: apache-2.0
datasets:
  - knoveleng/open-s1
base_model:
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: peft
language:
  - en
  - zh
pipeline_tag: question-answering
tags:
  - reasoning
---

## Introduction

Tina (Tiny Reasoning Models via LoRA) models are LoRA adapters fine-tuned on top of the base model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The adapter in this repo was fine-tuned on the knoveleng/open-s1 dataset. Please refer to our paper *Tina: Tiny Reasoning Models via LoRA* for training details.

## Example Usage

A Tina model is meant to be used together with the base model as a standard PEFT adapter. We release every training checkpoint for each Tina model, and a specific checkpoint can be selected by passing the corresponding `subfolder`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and its tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
)

# Attach the Tina LoRA adapter; pick a checkpoint via `subfolder`.
model = PeftModel.from_pretrained(
    base_model,
    "Tina-Yi/R1-Distill-Qwen-1.5B-Open-RS1",
    subfolder="checkpoint-800"  # checkpoint-800 is the best-performing checkpoint
)
```