johnnietien committed
Commit 87fb4f8 · verified · 1 Parent(s): a163a60

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -1,5 +1,6 @@
 ---
-base_model: unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit
+base_model:
+- meta-llama/Llama-3.2-3B-Instruct
 tags:
 - text-generation-inference
 - reasoning
@@ -16,10 +17,10 @@ language:
 
 - **Developed by:** johnnietien
 - **License:** apache-2.0
-- **Finetuned from model:** llama-3.2-1b-instruct-bnb-4bit
+- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
 
 This is one of my first reasoning models; it can have an "aha moment" just like DeepSeek's R1.
 We've enhanced the entire GRPO process, making it use 80% less VRAM than Hugging Face + FA2.
 This allows you to reproduce R1-Zero's "aha moment" on just 7GB of VRAM using llama-3.2-1b.
 Please note, this isn't fine-tuning DeepSeek's R1 distilled models or using distilled data from R1 for tuning.
-This is converting a standard model into a full-fledged reasoning model using GRPO.
+This is converting a standard model into a full-fledged reasoning model using GRPO.
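The GRPO training the card describes scores groups of sampled completions with programmatic reward functions rather than distilled data. As a minimal sketch (the `<reasoning>`/`<answer>` tag format and the function name are illustrative assumptions, not taken from this model card), an R1-Zero-style format reward could look like:

```python
import re

# Hypothetical format reward: give credit to completions that wrap their
# chain of thought in <reasoning>...</reasoning> and the final answer in
# <answer>...</answer>. GRPO then compares these scores within a group of
# completions sampled from the same prompt.
FORMAT_RE = re.compile(
    r"<reasoning>.+?</reasoning>\s*<answer>.+?</answer>", re.DOTALL
)

def format_reward(completions: list[str]) -> list[float]:
    """Return 1.0 for each completion matching the expected format, else 0.0."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]
```

In TRL's `GRPOTrainer`, reward functions like this are passed in via the `reward_funcs` argument, typically alongside a task-specific correctness reward.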