---
library_name: transformers
tags:
- trl
- grpo
license: apache-2.0
datasets:
- openai/gsm8k
language:
- en
base_model:
- google/gemma-3-4b-it
---

# Gemma-3-4b Reasoning R1 Model Card

Gemma-3-4b Reasoning is a transformer-based language model fine-tuned with GRPO (Group Relative Policy Optimization), following the DeepSeek-R1 methodology. This model card describes the instruction-tuned version optimized specifically for reasoning tasks.

The entire Gemma-3-4b Reasoning family is available under the permissive Apache 2.0 license. All training scripts and configurations used are publicly accessible; an illustrative sketch of the GRPO setup appears at the end of this card.

## Model Details

### Description

Gemma-3-4b Reasoning is a reasoning-focused fine-tune designed to excel at structured, logical problem solving and mathematical reasoning. Training was performed on the GSM8K dataset with GRPO, strengthening the model's ability to reason step by step and to produce structured explanations.

### Training Dataset

- **GSM8K (English)**: a dataset of grade-school math word problems for mathematical and logical reasoning.

### Intended Use

#### Direct Use

The model is specifically designed for structured reasoning tasks, including:

- Mathematical and logical reasoning
- Multi-step problem solving
- Instruction-based reasoning

#### Out-of-Scope Use

This model should not be used for unethical or malicious activities, or for any purpose that breaches legal or ethical standards.

## How to Use

The model uses structured XML templates for dialogue and reasoning tasks:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ericrisco/gemma-3-4b-reasoning"
prompt = (
    "A cyclist travels 60 km in 3 hours at a constant speed. "
    "If he maintains the same speed, how many kilometers will he travel in 5 hours?"
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Build the chat prompt with the tokenizer's chat template
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Move inputs to the model's device (safe with device_map="auto")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Performance

The **Gemma-3-4b Reasoning** model exhibits robust internal **Chain-of-Thought (CoT)** behavior, consistently producing detailed, structured step-by-step explanations across reasoning tasks.

## Limitations

The model is optimized primarily for **numeric and structured reasoning** and may produce less accurate or unexpected results when applied to unrelated tasks.

## Citations

- *Gemma 3* multimodal model by Google
- GRPO implementation from the Hugging Face *TRL* library

## Author

**Eric Risco**
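
## Training Sketch

The card states that training used TRL's GRPO implementation on GSM8K. The sketch below illustrates what such a setup can look like; it is an assumption-laden illustration, not the author's published script. The prompt format, the `<reasoning>`/`<answer>` tag convention, the exact-match reward, and every hyperparameter are hypothetical.

```python
# Hypothetical GRPO training sketch (not the author's actual configuration).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

FORMAT_PROMPT = (
    "Answer in this format:\n"
    "<reasoning>step-by-step reasoning</reasoning>\n"
    "<answer>final numeric answer</answer>\n\n"
)

def to_grpo_example(example):
    # GSM8K stores the gold answer after the '####' marker.
    gold = example["answer"].split("####")[-1].strip()
    return {
        "prompt": [{"role": "user", "content": FORMAT_PROMPT + example["question"]}],
        "gold": gold,
    }

train_dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_grpo_example)

def correctness_reward(completions, gold, **kwargs):
    # With conversational prompts, each completion is a list of message dicts;
    # extra dataset columns (here, "gold") are passed through as keyword args.
    rewards = []
    for completion, target in zip(completions, gold):
        text = completion[0]["content"]
        predicted = text.split("<answer>")[-1].split("</answer>")[0].strip()
        rewards.append(1.0 if predicted == target else 0.0)
    return rewards

args = GRPOConfig(
    output_dir="gemma-3-4b-reasoning",  # hypothetical hyperparameters throughout
    num_generations=8,                  # completions sampled per prompt (the "group")
    per_device_train_batch_size=8,
    max_completion_length=512,
    bf16=True,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and normalizes their rewards within the group to form advantages, which is what removes the need for a separate value model.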