---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- albanian
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
parameters:
temperature: 0.7
top_p: 0.95
top_k: 64
max_new_tokens: 512
---
# Bleta-Meditor 27B GRPO Albanian Reasoning Model
## Model Description
- **Developed by:** Klei Aliaj
- **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
- **License:** apache-2.0
- **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
- **Language:** Albanian
- **Framework:** Hugging Face Transformers
This model is a fine-tuned version of the Bleta-Meditor 27B model, optimized for the Albanian language with Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.
## Capabilities & Training
### Fine-tuning Approach
This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that scores groups of sampled completions against reward functions and reinforces the completions that score above their group's average. The model was trained to:
1. Follow a specific reasoning format with dedicated sections for workings and solutions
2. Produce correct mathematical solutions in Albanian
3. Show clear step-by-step reasoning processes
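These objectives map naturally onto reward functions. The sketch below is illustrative, not the published training code: it assumes plain-string completions and a dataset with an `answer` column, both of which are assumptions.

```python
import re

def format_reward(completions, **kwargs):
    """Reward completions that contain both tag pairs in the trained order."""
    pattern = r"<start_working_out>.*?<end_working_out>\s*<SOLUTION>.*?</SOLUTION>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

def accuracy_reward(completions, answer, **kwargs):
    """Reward completions whose extracted solution matches the reference answer."""
    scores = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, re.DOTALL)
        ok = match is not None and match.group(1).strip() == str(reference).strip()
        scores.append(1.0 if ok else 0.0)
    return scores
```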
### Special Formatting
The model has been trained to follow a specific reasoning format:
- Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags
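For example, a downstream consumer can separate the reasoning from the final answer with two non-greedy regular expressions. The sample output below is illustrative text, not an actual generation:

```python
import re

# Illustrative output in the trained format ("pra" is Albanian for "so").
output = (
    "<start_working_out>2x + 4 = 10, pra 2x = 6 dhe x = 3.<end_working_out>"
    "<SOLUTION>x = 3</SOLUTION>"
)

working = re.search(r"<start_working_out>(.*?)<end_working_out>", output, re.DOTALL)
solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", output, re.DOTALL)

print("Working out:", working.group(1).strip() if working else "(missing)")
print("Solution:", solution.group(1).strip() if solution else "(missing)")
```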
### Training Configuration
- **Framework:** Hugging Face's TRL library
- **Optimization:** LoRA fine-tuning (r=8, alpha=8)
- **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
- **Language Focus:** Optimized for Albanian
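A minimal sketch of how these pieces fit together with TRL's `GRPOTrainer`, assuming the LoRA settings above; the base model id, dataset contents, reward function, and output directory are illustrative assumptions rather than the actual training script:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # Stand-in for the format/accuracy/quality rewards described above.
    return [1.0 if "<SOLUTION>" in c else 0.0 for c in completions]

# GRPO expects a dataset with a "prompt" column; one illustrative example here.
train_dataset = Dataset.from_list([{"prompt": "Zgjidh ekuacionin: 2x + 4 = 10"}])

trainer = GRPOTrainer(
    model="bleta-meditor-27b",  # base checkpoint from the card metadata (assumption)
    reward_funcs=[format_reward],
    args=GRPOConfig(output_dir="bleta-grpo"),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
)
trainer.train()
```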
## Technical Specifications
### Available Formats
This model is available in two formats:
- Standard adapter format (adapter_model.safetensors)
- GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp
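As a minimal sketch, the GGUF file can be run locally with llama-cpp-python using the sampling parameters from the card metadata; the file path, context size, and stop sequence here are assumptions:

```python
from llama_cpp import Llama

# Path assumes the GGUF has been downloaded from this repository.
llm = Llama(model_path="bleta-meditor-27b-finetune.Q8_0.gguf", n_ctx=8192)

result = llm(
    "Zgjidh ekuacionin: 2x + 4 = 10",  # "Solve the equation: 2x + 4 = 10"
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
    stop=["</SOLUTION>"],  # stop once the final-answer tag closes
)
print(result["choices"][0]["text"])
```

The adapter format can instead be applied on top of the base model with PEFT's `PeftModel.from_pretrained`.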
### Bleta-Meditor Architecture Benefits
- 27B parameters
- 128K context window
- QK normalization
- 5:1 interleaved attention pattern (5 sliding-window layers per global layer)
- 1024-token sliding attention window
- Albanian language optimization
## Limitations
- The model is tuned for Albanian reasoning tasks, particularly mathematical problems, but it may still produce incorrect solutions to complex problems.
- Performance can vary with problem complexity and wording.
- Like all language models, it may hallucinate or give incorrect information outside its training domain.
## Acknowledgments
- Google for developing the Gemma 3 architecture
- Hugging Face for their TRL library and GRPO implementation
## Citation
If you use this model in your research, please cite:
```
@misc{klei_aliaj_bleta_meditor,
  author       = {Klei Aliaj},
  title        = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}
```