Barcenas 3b GRPO
Based on alpindale/Llama-3.2-3B-Instruct and trained on the openai/gsm8k dataset.
The objective of this model is to test GRPO (Group Relative Policy Optimization), the reinforcement learning (RL) training method used in DeepSeek R1, as a way to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
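The card does not state the reward function or training hyperparameters, so the following is only a minimal, hypothetical sketch of two pieces a GRPO run on GSM8K typically needs. The first is a reward. GSM8K reference solutions end with `#### <answer>`, and the sketch assumes the model is prompted to finish its own completions the same way. The second is GRPO's group-relative advantage: several completions are sampled per prompt, and each completion's reward is standardized against the mean and standard deviation of its group, which stands in for a learned value critic. The helper names (`extract_final_answer`, `correctness_reward`, `group_relative_advantages`) are illustrative. This sketch omits the clipped policy-ratio objective and the KL penalty.

```python
import re
import statistics


def extract_final_answer(text):
    """Return the number after the last '####' marker (GSM8K answer format), or None."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,234" -> "1234".
    return matches[-1].replace(",", "")


def correctness_reward(completion, reference):
    """Hypothetical binary reward: 1.0 if both final answers parse and are numerically equal."""
    pred = extract_final_answer(completion)
    gold = extract_final_answer(reference)
    if pred is None or gold is None:
        return 0.0
    return 1.0 if float(pred) == float(gold) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: (reward - group mean) / (group std + eps) for one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "Natalia sold 48 clips in April and 24 in May.\n#### 72"
    # Four sampled completions for the same prompt (made-up examples).
    group = ["... #### 72", "... #### 70", "no final answer", "... #### 72"]
    rewards = [correctness_reward(c, reference) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, relative only to their own group. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.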
Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽