Barcenas 3b GRPO

Based on alpindale/Llama-3.2-3B-Instruct and trained with the openai/gsm8k dataset.
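
As a quick usage sketch (the prompt and generation settings below are illustrative and not part of the original card), the model can be loaded with transformers:

```python
# Minimal inference sketch; model id taken from this card, settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Danielbrdz/Barcenas-3b-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Example grade-school math prompt in the GSM8K style.
messages = [
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, "
                                "and then she sold half as many clips in May. "
                                "How many clips did Natalia sell altogether?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```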

The objective of this model is to test GRPO, the novel reinforcement learning (RL) training algorithm used in DeepSeek R1, and to use it to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
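
For context, a minimal sketch of GRPO fine-tuning on GSM8K using TRL's GRPOTrainer is shown below. The reward function and hyperparameters are illustrative assumptions, not the exact recipe used to train this model.

```python
# Sketch of GRPO training on openai/gsm8k with TRL.
# The reward function and hyperparameters are illustrative, not the card's actual recipe.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K answers end with "#### <number>"; extract that as the gold target.
def extract_gold(answer: str) -> str:
    return answer.split("####")[-1].strip().replace(",", "")

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"prompt": x["question"], "gold": extract_gold(x["answer"])}
)

# Reward 1.0 when the last number in the completion matches the gold answer, else 0.0.
def correctness_reward(completions, gold, **kwargs):
    rewards = []
    for completion, target in zip(completions, gold):
        numbers = re.findall(r"-?\d+\.?\d*", completion.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1] == target else 0.0)
    return rewards

config = GRPOConfig(
    output_dir="Barcenas-3b-GRPO",
    num_generations=8,            # completions sampled per prompt (the "group")
    max_completion_length=256,
    per_device_train_batch_size=8,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="alpindale/Llama-3.2-3B-Instruct",
    reward_funcs=correctness_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

GRPO scores each group of sampled completions with the reward function and uses the within-group average as a baseline for the advantage, which avoids training a separate value model.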

Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽
