Barcenas 3b GRPO

Based on alpindale/Llama-3.2-3B-Instruct and trained with the openai/gsm8k dataset.
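
As a quick usage sketch (the prompt and generation settings below are illustrative and not part of the original card), the model can be loaded with transformers:

```python
# Minimal inference sketch; model id taken from this card, settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Danielbrdz/Barcenas-3b-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Example grade-school math prompt in the GSM8K style.
messages = [
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, "
                                "and then she sold half as many clips in May. "
                                "How many clips did Natalia sell altogether?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```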

The objective of this model is to test GRPO, the novel reinforcement learning (RL) training algorithm used in DeepSeek R1, and to use it to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
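
For context, a minimal sketch of GRPO fine-tuning on GSM8K using TRL's GRPOTrainer is shown below. The reward function and hyperparameters are illustrative assumptions, not the exact recipe used to train this model.

```python
# Sketch of GRPO training on openai/gsm8k with TRL.
# The reward function and hyperparameters are illustrative, not the card's actual recipe.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K answers end with "#### <number>"; extract that as the gold target.
def extract_gold(answer: str) -> str:
    return answer.split("####")[-1].strip().replace(",", "")

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(
    lambda x: {"prompt": x["question"], "gold": extract_gold(x["answer"])}
)

# Reward 1.0 when the last number in the completion matches the gold answer, else 0.0.
def correctness_reward(completions, gold, **kwargs):
    rewards = []
    for completion, target in zip(completions, gold):
        numbers = re.findall(r"-?\d+\.?\d*", completion.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1] == target else 0.0)
    return rewards

config = GRPOConfig(
    output_dir="Barcenas-3b-GRPO",
    num_generations=8,            # completions sampled per prompt (the "group")
    max_completion_length=256,
    per_device_train_batch_size=8,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="alpindale/Llama-3.2-3B-Instruct",
    reward_funcs=correctness_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

GRPO scores each group of sampled completions with the reward function and uses the within-group average as a baseline for the advantage, which avoids training a separate value model.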

Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽
