---
library_name: transformers
tags:
- trl
- grpo
license: apache-2.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- google/gemma-3-4b-it
---

# Gemma-3-4b Reasoning R1 Model Card

Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version optimized specifically for reasoning tasks.

The entire Gemma-3-4b Reasoning family is available under the permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible; an illustrative sketch of the GRPO setup appears at the end of this card.

## Model Details

### Description

Gemma-3-4b Reasoning is a reasoning-focused fine-tune designed to excel at structured, logical problem solving and mathematical reasoning. Training was performed on the GSM8K dataset with GRPO, strengthening the model's ability to reason step by step and to produce structured explanations.

### Training Dataset

- **GSM8K (English)**: a dataset of grade-school math word problems for mathematical and logical reasoning.

### Intended Use

#### Direct Use

The model is specifically designed for structured reasoning tasks, including:

- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning

#### Out-of-Scope Use

This model should not be used for unethical or malicious activities, or for any purpose that breaches legal or ethical standards.

## How to Use

The model uses structured XML templates for dialogue and reasoning tasks:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"
prompt = (
    "A cyclist travels 60 km in 3 hours at a constant speed. "
    "If he maintains the same speed, how many kilometers will he travel in 5 hours?"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Build the chat prompt with the tokenizer's chat template
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Move inputs to the model's device (safe with device_map="auto")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Performance

The **Gemma-3-4b Reasoning** model exhibits robust internal **Chain-of-Thought (CoT)** behavior, consistently producing detailed, structured step-by-step explanations across reasoning tasks.

## Limitations

The model is optimized primarily for **numeric and structured reasoning** and may produce less accurate or unexpected results when applied to unrelated tasks.

## Citations

- *Gemma 3* multimodal model by Google
- GRPO implementation from the Hugging Face *TRL* library

## Author

**Eric Risco**
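
## Training Sketch

The card states that training used TRL's GRPO implementation on GSM8K. The sketch below illustrates what such a setup can look like; it is an assumption-laden illustration, not the author's published script. The prompt format, the `<reasoning>`/`<answer>` tag convention, the exact-match reward, and every hyperparameter are hypothetical.

```python
# Hypothetical GRPO training sketch (not the author's actual configuration).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

FORMAT_PROMPT = (
    "Answer in this format:\n"
    "<reasoning>step-by-step reasoning</reasoning>\n"
    "<answer>final numeric answer</answer>\n\n"
)

def to_grpo_example(example):
    # GSM8K stores the gold answer after the '####' marker.
    gold = example["answer"].split("####")[-1].strip()
    return {
        "prompt": [{"role": "user", "content": FORMAT_PROMPT + example["question"]}],
        "gold": gold,
    }

train_dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_grpo_example)

def correctness_reward(completions, gold, **kwargs):
    # With conversational prompts, each completion is a list of message dicts;
    # extra dataset columns (here, "gold") are passed through as keyword args.
    rewards = []
    for completion, target in zip(completions, gold):
        text = completion[0]["content"]
        predicted = text.split("<answer>")[-1].split("</answer>")[0].strip()
        rewards.append(1.0 if predicted == target else 0.0)
    return rewards

args = GRPOConfig(
    output_dir="gemma-3-4b-reasoning",  # hypothetical hyperparameters throughout
    num_generations=8,                  # completions sampled per prompt (the "group")
    per_device_train_batch_size=8,
    max_completion_length=512,
    bf16=True,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and normalizes their rewards within the group to form advantages, which is what removes the need for a separate value model.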