---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- albanian
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
parameters:
temperature: 0.7
top_p: 0.95
top_k: 64
max_new_tokens: 512
---
# Bleta-Meditor 27B GRPO Albanian Reasoning Model
## Model Description
- **Developed by:** klei aliaj
- **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
- **License:** apache-2.0
- **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
- **Language:** Albanian
- **Framework:** Hugging Face Transformers

This model is a fine-tuned version of Bleta-Meditor 27B, optimized for the Albanian language with Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.
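
The model is published as a LoRA adapter (see Technical Specifications below), so a typical Transformers workflow loads the base model and applies the adapter with PEFT. The sketch below is illustrative only: the base-model path is a placeholder, and the sampling values mirror the inference defaults in the metadata above.

```python
# Minimal inference sketch, assuming the adapter in this repo is applied to the
# base Bleta-Meditor 27B checkpoint with PEFT. The base-model path is a
# placeholder; point it at wherever that checkpoint lives for you.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "path/to/bleta-meditor-27b"          # placeholder
adapter_id = "klei1/bleta-meditor-27b-finetune"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Zgjidh ekuacionin: 2x + 6 = 14"      # "Solve the equation: 2x + 6 = 14"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=512, do_sample=True,
    temperature=0.7, top_p=0.95, top_k=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```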
## Capabilities & Training
### Fine-tuning Approach
This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning technique that optimizes the model against a set of reward functions (an illustrative sketch follows the list below). The model was trained to:
1. Follow a specific reasoning format with dedicated sections for workings and solutions
2. Produce correct mathematical solutions in Albanian
3. Show clear step-by-step reasoning processes
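
The sketch below shows what GRPO reward functions of this kind can look like with TRL. It is illustrative only: the exact reward functions and training data used for this model are not published, so the scoring values and the `answer` column name are assumptions. The tags it checks for are described under Special Formatting below.

```python
# Illustrative GRPO reward functions (not the exact ones used for this model).
# With TRL's GRPOTrainer, each reward function receives the sampled completions
# plus any extra dataset columns and returns one float score per completion.
import re

TAGGED = r"<start_working_out>.*?<end_working_out>\s*<SOLUTION>.*?</SOLUTION>"

def format_reward(completions, **kwargs):
    # Reward completions that wrap reasoning and answer in the expected tags.
    return [1.0 if re.search(TAGGED, c, flags=re.DOTALL) else 0.0 for c in completions]

def accuracy_reward(completions, answer, **kwargs):
    # Reward completions whose extracted solution matches the reference answer
    # ("answer" is an assumed dataset column name).
    scores = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, flags=re.DOTALL)
        scores.append(2.0 if match and match.group(1).strip() == str(ref).strip() else 0.0)
    return scores
```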
### Special Formatting
The model has been trained to follow a specific reasoning format (a small parsing helper follows this list):
- Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags
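
At inference time the final answer can be pulled out of the generated text with a simple pattern match; the helper below is a small sketch using the tag names listed above.

```python
import re

def extract_solution(text: str) -> str | None:
    # Return the content of the <SOLUTION> block, or None if the model
    # did not follow the expected format.
    match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None

sample = (
    "<start_working_out>2x + 6 = 14, pra 2x = 8 dhe x = 4.<end_working_out>"
    "<SOLUTION>x = 4</SOLUTION>"
)
print(extract_solution(sample))  # -> x = 4
```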
### Training Configuration
- **Framework:** Hugging Face's TRL library
- **Optimization:** LoRA fine-tuning (r=8, alpha=8); a training-setup sketch follows this list
- **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
- **Language Focus:** Optimized for Albanian
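
Put together, a training setup along these lines can be written with PEFT and TRL's GRPO trainer. Only the LoRA rank and alpha come from this card; the dataset path, learning rate, and generation counts below are illustrative placeholders, and `format_reward`/`accuracy_reward` refer to the sketch in the Fine-tuning Approach section.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# LoRA settings stated on this card; task_type marks a causal-LM adapter.
peft_config = LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM")

# Hypothetical prompt dataset with "prompt" and "answer" columns.
train_dataset = load_dataset("json", data_files="albanian_math_prompts.json", split="train")

training_args = GRPOConfig(
    output_dir="bleta-meditor-27b-grpo",
    num_generations=4,          # completions sampled per prompt (placeholder)
    max_completion_length=512,  # placeholder
    learning_rate=5e-6,         # placeholder
)

trainer = GRPOTrainer(
    model="path/to/bleta-meditor-27b",              # placeholder base checkpoint
    reward_funcs=[format_reward, accuracy_reward],  # from the sketch above
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```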
## Technical Specifications
### Available Formats
This model is available in two formats:
- Standard adapter format (adapter_model.safetensors)
- GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp; a loading example follows below
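
For the GGUF build, a call through llama-cpp-python (Python bindings for llama.cpp) along the following lines should work; the sampling values mirror the inference defaults in the metadata, and the context size is an illustrative choice below the architecture's 128K maximum.

```python
from llama_cpp import Llama

# Load the 8-bit GGUF build; adjust model_path to where the file was downloaded.
llm = Llama(
    model_path="bleta-meditor-27b-finetune.Q8_0.gguf",
    n_ctx=8192,  # context window for this session; the model supports up to 128K
)

output = llm(
    "Zgjidh ekuacionin: 3x - 5 = 10",  # "Solve the equation: 3x - 5 = 10"
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(output["choices"][0]["text"])
```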
### Bleta-Meditor Architecture Benefits
- 27B parameters
- 128K context window
- QK normalization
- Attention pattern of 5 local (sliding-window) layers per global layer
- 1,024-token sliding window for local attention layers
- Albanian language optimization
## Limitations
- The model is tuned for Albanian reasoning tasks, particularly mathematical problems, but it may still produce incorrect solutions, especially for complex problems.
- The model's performance might vary depending on problem complexity and wording.
- Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.
## Acknowledgments
- Google for developing the Gemma 3 architecture
- Hugging Face for their TRL library and GRPO implementation
## Citation
If you use this model in your research, please cite:
```
@misc{klei_aliaj_bleta_meditor,
  author       = {Klei Aliaj},
  title        = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}
```