Uploaded model

Developed by: tcotter
License: apache-2.0
Finetuned from model : unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit

Finetuned Qwen1.5B (R1 Distilled Version) on this dataset, which comes from this dataset but with an additional "summary" produced by an in-house synthetic data generator.

This LoRA is therefore a LoRA which helps the model return a "\n\nFinal Answer: ..." after it's reasoning and initial response steps.

See this paper for more details.

This qwen2 model was trained 2x faster with Unsloth and Huggingface's TRL library.

tcotter
/

DeepSeek-R1-Qwen-1.5B-unsloth-bnb-4bit-LoRA-Adapter

Uploaded model

Model tree for tcotter/DeepSeek-R1-Qwen-1.5B-unsloth-bnb-4bit-LoRA-Adapter