---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
  parameters:
    temperature: 0.7
    top_p: 0.95
    top_k: 64
    max_new_tokens: 512
---
# Gemma 3 27B GRPO Reasoning Model

## Model Description
- Developed by: klei aliaj
- Model type: Gemma 3 27B fine-tuned with GRPO for reasoning tasks
- License: apache-2.0
- Finetuned from model: Google's Gemma 3 27B instruction-tuned model
- Framework: Hugging Face Transformers
This model is a fine-tuned version of Google's Gemma 3 27B instruction-tuned model, enhanced using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities.
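A minimal text-generation sketch is shown below. It assumes the repository id from the citation at the bottom of this card (`klei1/gemma-3-27b-grpo`) hosts weights loadable directly with `transformers`; the sampling parameters mirror the inference defaults in the card metadata.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "klei1/gemma-3-27b-grpo"  # assumed repo id (from the citation below)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 13 * 17? Show your working."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,  # matches the inference defaults above
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```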
## Capabilities & Training

### Fine-tuning Approach

This model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that samples groups of completions, scores them with reward functions, and updates the model toward the higher-scoring ones (an illustrative reward sketch follows the list below). The model was trained to:
- Follow a specific reasoning format with dedicated sections for workings and solutions
- Produce correct mathematical solutions
- Show clear step-by-step reasoning processes
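The exact reward functions used in training are not published on this card. The sketch below is an illustrative reimplementation of two of the reward signals named above (format adherence and answer accuracy), assuming TRL-style reward functions that receive string completions plus any dataset columns (here a hypothetical `answer` column) as keyword arguments and return one float per completion.

```python
import re


def format_reward(completions, **kwargs):
    """Reward completions that contain the required tags in the required order."""
    pattern = r"<start_working_out>.*?<end_working_out>.*?<SOLUTION>.*?</SOLUTION>"
    return [1.0 if re.search(pattern, c, flags=re.DOTALL) else 0.0 for c in completions]


def accuracy_reward(completions, answer, **kwargs):
    """Reward completions whose <SOLUTION> block matches the reference answer."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, flags=re.DOTALL)
        extracted = match.group(1).strip() if match else None
        rewards.append(1.0 if extracted == str(reference).strip() else 0.0)
    return rewards
```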
### Special Formatting

The model has been trained to follow a specific reasoning format (a parsing helper is sketched after this list):
- Working-out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags
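A small helper like the one below (a sketch, not part of the released model) can split a response into its two sections:

```python
import re


def parse_response(text: str) -> tuple[str | None, str | None]:
    """Extract the working-out and solution sections from a model response."""
    working = re.search(r"<start_working_out>(.*?)<end_working_out>", text, re.DOTALL)
    solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return (
        working.group(1).strip() if working else None,
        solution.group(1).strip() if solution else None,
    )
```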
### Training Configuration

- Framework: Hugging Face's TRL library
- Optimization: LoRA fine-tuning (r=8, alpha=8)
- Reward Functions: format adherence, answer accuracy, and reasoning quality (see the sketch after this list)
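This card does not include the full training script. The sketch below shows how such a run might be wired up with TRL's `GRPOTrainer` and a `peft` `LoraConfig` matching the stated r=8, alpha=8, reusing the reward functions sketched earlier; the dataset choice and all other hyperparameters are illustrative assumptions, not the actual training setup.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Hypothetical dataset choice; the actual training data is not documented here.
# GRPOTrainer expects a "prompt" column, so the GSM8K "question" column is renamed.
train_dataset = load_dataset("openai/gsm8k", "main", split="train").rename_column(
    "question", "prompt"
)

peft_config = LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM")
training_args = GRPOConfig(output_dir="gemma-3-27b-grpo")

trainer = GRPOTrainer(
    model="google/gemma-3-27b-it",
    reward_funcs=[format_reward, accuracy_reward],  # defined in the sketch above
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```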
## Technical Specifications

### Available Formats

This model is available in two formats (a GGUF loading sketch follows this list):
- Standard adapter format (`adapter_model.safetensors`)
- GGUF 8-bit quantized format (`bleta-meditor-27b-finetune.Q8_0.gguf`) for use with llama.cpp
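For the GGUF file, a minimal `llama-cpp-python` sketch might look like this; the local path assumes the file has already been downloaded from the repository.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="bleta-meditor-27b-finetune.Q8_0.gguf",  # local path, download first
    n_ctx=8192,  # illustrative; Gemma 3 supports up to 128K context
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Solve for x: 2x + 6 = 20"}],
    temperature=0.7,
    top_p=0.95,
    top_k=64,
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```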
### Gemma 3 Architecture Benefits

- 27B parameters, trained on 14 trillion tokens
- 128K context window (extended from 32K)
- QK normalization (replacing attention soft-capping)
- Interleaved attention: 5 sliding-window layers per global attention layer
- 1024-token sliding window for the local attention layers
## Limitations
- While this model excels at reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
- The model's performance might vary depending on problem complexity and wording.
- Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.
## Acknowledgments
- Google for developing the Gemma 3 model family
- Hugging Face for their TRL library and GRPO implementation
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{klei1_gemma3_grpo,
  author       = {klei1},
  title        = {Gemma 3 27B GRPO Reasoning Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/gemma-3-27b-grpo}}
}
```