|
--- |
|
base_model: |
|
- meta-llama/Llama-3.2-1B-Instruct
|
tags: |
|
- text-generation-inference |
|
- reasoning |
|
- transformers |
|
- DeepSeek R1 |
|
- llama |
|
- gguf |
|
license: apache-2.0 |
|
language: |
|
- en |
|
--- |
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** johnnietien |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** meta-llama/Llama-3.2-1B-Instruct
|
|
|
This is one of my first reasoning models, capable of having an “aha moment” like DeepSeek’s R1.
|
We've enhanced the entire GRPO process, making it use 80% less VRAM than Hugging Face + FA2. |
|
This allows you to reproduce R1-Zero's "aha moment" on just 7GB of VRAM using llama-3.2-1b. |
|
Note that this was not produced by fine-tuning DeepSeek’s R1 distilled models, nor by training on data distilled from R1.

Instead, a standard model was converted into a full-fledged reasoning model using GRPO.
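GRPO needs only a reward signal rather than labeled reasoning traces. As a minimal sketch (the tag names, weights, and reward logic here are illustrative assumptions, not this model's actual training setup), a rule-based reward function like the one below can be passed to TRL's `GRPOTrainer` to encourage an R1-Zero-style `<think>…</think>` reasoning format:

```python
import re

# Matches completions that put their reasoning inside <think>...</think>
# followed by a final answer inside <answer>...</answer>.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Return one float reward per completion.

    TRL's GRPOTrainer calls reward functions with a batch of sampled
    completions; higher scores push the policy toward that behavior
    relative to the group average.
    """
    return [1.0 if FORMAT_PATTERN.match(c) else 0.0 for c in completions]
```

During GRPO training, the trainer samples a group of completions per prompt, scores each one with reward functions such as this, and updates the policy toward completions that score above the group average, which is how the reasoning format emerges without any distilled R1 data.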