|
--- |
|
base_model: |
|
- meta-llama/Llama-3.2-1B-Instruct
|
tags: |
|
- text-generation-inference |
|
- reasoning |
|
- transformers |
|
- DeepSeek R1 |
|
- llama |
|
- gguf |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** johnnietien |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
|
|
|
This is one of my first reasoning models, capable of having an “aha moment” like DeepSeek’s R1.
|
We've enhanced the entire GRPO process, making it use 80% less VRAM than Hugging Face + FA2. |
|
This allows you to reproduce R1-Zero's "aha moment" on just 7GB of VRAM using llama-3.2-1b. |
|
Note that this was not produced by fine-tuning DeepSeek’s R1 distilled models, nor by training on data distilled from R1.

Instead, a standard model was converted into a full-fledged reasoning model using GRPO.
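GRPO needs only a reward signal rather than labeled reasoning traces. As a minimal sketch (the tag names, weights, and reward logic here are illustrative assumptions, not this model's actual training setup), a rule-based reward function like the one below can be passed to TRL's `GRPOTrainer` to encourage an R1-Zero-style `<think>…</think>` reasoning format:

```python
import re

# Matches completions that put their reasoning inside <think>...</think>
# followed by a final answer inside <answer>...</answer>.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Return one float reward per completion.

    TRL's GRPOTrainer calls reward functions with a batch of sampled
    completions; higher scores push the policy toward that behavior
    relative to the group average.
    """
    return [1.0 if FORMAT_PATTERN.match(c) else 0.0 for c in completions]
```

During GRPO training, the trainer samples a group of completions per prompt, scores each one with reward functions such as this, and updates the policy toward completions that score above the group average, which is how the reasoning format emerges without any distilled R1 data.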