|
---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- albanian
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
  parameters:
    temperature: 0.7
    top_p: 0.95
    top_k: 64
    max_new_tokens: 512
---
|
|
|
# Bleta-Meditor 27B GRPO Albanian Reasoning Model
|
|
|
## Model Description

- **Developed by:** klei aliaj
- **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
- **License:** apache-2.0
- **Finetuned from model:** Bleta-Meditor 27B (based on the Gemma 3 architecture)
- **Language:** Albanian
- **Framework:** Hugging Face Transformers
|
|
|
This model is a fine-tuned version of Bleta-Meditor 27B, optimized for the Albanian language with Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.
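
The following is a minimal generation sketch with Transformers. The repository id is taken from the citation at the end of this card, and the sampling parameters mirror the metadata block above; the dtype and device settings are assumptions to adjust for your hardware, and if the repo ships only the LoRA adapter you would load it with PEFT on top of the base model instead.

```python
# Minimal generation sketch; assumes full (or merged) weights in the repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "klei1/bleta-meditor-27b-finetune"  # from the citation below

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable hardware
    device_map="auto",           # assumption: accelerate is installed
)

prompt = "Sa është 12 * 7 + 5?"  # "What is 12 * 7 + 5?" in Albanian
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```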
|
|
|
## Capabilities & Training

### Fine-tuning Approach

This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes a model against explicit reward functions. The model was trained to:

1. Follow a specific reasoning format with dedicated sections for workings and solutions (a reward sketch for this follows the list)
2. Produce correct mathematical solutions in Albanian
3. Show clear step-by-step reasoning processes
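
As an illustration of the first point, a format-adherence reward might look like the sketch below. It is written against the shape of TRL's `GRPOTrainer` reward-function interface (a callable taking completions and returning one score per sample); the actual reward functions used in training are not published in this card, and the tag names come from the Special Formatting section below.

```python
# Hypothetical format-adherence reward in the shape expected by
# TRL's GRPOTrainer: take completions, return one float per sample.
import re

FORMAT_RE = re.compile(
    r"<start_working_out>.*?<end_working_out>.*?<SOLUTION>.*?</SOLUTION>",
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Score 1.0 when a completion follows the expected tag layout, else 0.0."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]
```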
|
|
|
### Special Formatting

The model has been trained to follow a specific reasoning format:

- Working-out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags
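
On the consumer side, a small helper like the one below can pull the final answer out of a formatted response; the example string is illustrative, not actual model output.

```python
# Extract the final answer from a formatted response.
import re

def extract_solution(text):
    """Return the text between <SOLUTION> tags, or None if absent."""
    match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", text, re.DOTALL)
    return match.group(1).strip() if match else None

# Illustrative response shape (not real model output):
response = (
    "<start_working_out>12 * 7 = 84, pastaj 84 + 5 = 89.<end_working_out>"
    "<SOLUTION>89</SOLUTION>"
)
print(extract_solution(response))  # -> "89"
```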
|
|
|
### Training Configuration

- **Framework:** Hugging Face's TRL library
- **Optimization:** LoRA fine-tuning (r=8, alpha=8)
- **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
- **Language Focus:** Optimized for Albanian
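
The configuration above maps roughly onto TRL and PEFT as sketched below. Only the LoRA rank and alpha come from this card; the dataset, base-model id, output directory, and remaining hyperparameters are placeholders, not the actual training setup.

```python
# Rough GRPO + LoRA configuration sketch mirroring the list above.
# Everything except the LoRA rank/alpha is a placeholder.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy placeholder dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict({"prompt": ["Sa është 2 + 2?"]})

def format_reward(completions, **kwargs):  # see the sketch in Fine-tuning Approach
    return [1.0 if "<SOLUTION>" in c else 0.0 for c in completions]

lora_config = LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM")

training_args = GRPOConfig(
    output_dir="bleta-meditor-grpo",  # placeholder
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="bleta-meditor-27b",        # placeholder base model id
    reward_funcs=[format_reward],
    args=training_args,
    train_dataset=train_dataset,
    peft_config=lora_config,
)
trainer.train()
```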
|
|
|
## Technical Specifications

### Available Formats

This model is available in two formats:

- LoRA adapter format (`adapter_model.safetensors`)
- GGUF 8-bit quantized format (`bleta-meditor-27b-finetune.Q8_0.gguf`) for use with llama.cpp
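
For the GGUF file, a minimal llama-cpp-python sketch follows; the local file path and context size are assumptions to adjust for your setup.

```python
# Minimal llama-cpp-python sketch for the Q8_0 GGUF file.
from llama_cpp import Llama

llm = Llama(
    model_path="bleta-meditor-27b-finetune.Q8_0.gguf",  # assumed local path
    n_ctx=8192,  # the architecture supports up to 128K; pick what fits in RAM
)

out = llm(
    "Sa është 12 * 7 + 5?",
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(out["choices"][0]["text"])
```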
|
|
|
### Bleta-Meditor Architecture Benefits

- 27B parameters
- 128K-token context window
- QK normalization
- Interleaved attention pattern: 5 sliding-window layers per global attention layer
- 1024-token sliding attention window
- Albanian language optimization
|
|
|
## Limitations

- While this model excels at Albanian reasoning tasks, particularly mathematical problems, it may still occasionally produce incorrect solutions for complex problems.
- The model's performance may vary with problem complexity and wording.
- Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.
|
|
|
## Acknowledgments

- Google for developing the Gemma 3 architecture
- Hugging Face for their TRL library and GRPO implementation
|
|
|
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{klei_aliaj_bleta_meditor,
  author       = {Klei Aliaj},
  title        = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}
```