Update README.md
README.md CHANGED

@@ -18,7 +18,7 @@ language:
 - **License:** apache-2.0
 - **Finetuned from model :** llama-3.2-1b-instruct-bnb-4bit
 
-
+This is one of my first reasoning models that can have an "aha moment," the same as DeepSeek's R1.
 We've enhanced the entire GRPO process, making it use 80% less VRAM than Hugging Face + FA2.
 This allows you to reproduce R1-Zero's "aha moment" on just 7GB of VRAM using llama-3.2-1b.
 Please note, this isn't fine-tuning DeepSeek's R1 distilled models or using distilled data from R1 for tuning.
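The GRPO training mentioned above steers the model toward "aha moment" behavior via reward functions scored on each sampled completion. As a minimal sketch (not the model card's actual training code): GRPO-style trainers such as TRL's `GRPOTrainer` accept a list of reward callables, and a common illustrative one rewards completions that show their reasoning in `<think>...</think>` tags before answering. The function name and tag format here are assumptions for illustration.

```python
import re

# Illustrative GRPO-style reward function (hypothetical name/format):
# give 1.0 to completions that put reasoning inside <think>...</think>
# followed by a final answer, and 0.0 otherwise.
def format_reward(completions, **kwargs):
    pattern = re.compile(r"<think>.+?</think>\s*\S", re.DOTALL)
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

# The first completion shows its reasoning; the second skips straight
# to the answer, so it earns no format reward.
scores = format_reward([
    "<think>2 + 2 = 4</think> The answer is 4.",
    "The answer is 4.",
])
print(scores)  # [1.0, 0.0]
```

During GRPO, rewards like this are computed per group of sampled completions and the policy is updated toward the higher-scoring ones, which is how the reasoning format emerges without distilled R1 data.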