---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- albanian
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
  parameters:
    temperature: 0.7
    top_p: 0.95
    top_k: 64
    max_new_tokens: 512
---

# Bleta-Meditor 27B GRPO Albanian Reasoning Model

## Model Description
- **Developed by:** klei aliaj
- **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
- **License:** apache-2.0
- **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
- **Language:** Albanian
- **Framework:** Hugging Face Transformers

This model is a fine-tuned version of Bleta-Meditor 27B, optimized for the Albanian language with Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.
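
The snippet below is a minimal inference sketch with Transformers, not an official usage script. It assumes the repository id from the citation (`klei1/bleta-meditor-27b-finetune`) resolves to a loadable merged checkpoint; if only the LoRA adapter is published, load it on top of the base model with PEFT instead. The Albanian prompt is hypothetical, and the sampling values mirror the defaults in the metadata above.

```python
# Minimal sketch: assumes a merged, loadable checkpoint at the repo id below
# (taken from the citation URL); adapter-only releases need PEFT instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "klei1/bleta-meditor-27b-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Hypothetical Albanian math prompt ("A train covers 120 km in 2 hours. What is its speed?").
messages = [{"role": "user", "content": "Një tren përshkon 120 km për 2 orë. Sa është shpejtësia e tij?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```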

## Capabilities & Training

### Fine-tuning Approach
This Albanian-language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes the model against a set of reward functions. The model was trained to:

1. Follow a specific reasoning format with dedicated sections for workings and solutions
2. Produce correct mathematical solutions in Albanian
3. Show clear step-by-step reasoning processes

### Special Formatting
The model has been trained to follow a specific reasoning format (an example of the expected shape follows this list):
- Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags
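
For example, a well-formed response is expected to have roughly the shape below (placeholder text, not actual model output):

```
<start_working_out>
... step-by-step reasoning in Albanian ...
<end_working_out>
<SOLUTION>
... final answer ...
</SOLUTION>
```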

### Training Configuration
- **Framework:** Hugging Face's TRL library (a minimal wiring sketch follows this list)
- **Optimization:** LoRA fine-tuning (r=8, alpha=8)
- **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
- **Language Focus:** Optimized for Albanian
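
The actual training script is not included in this repository; the sketch below only illustrates how such a run could be wired with TRL's `GRPOTrainer` and a PEFT LoRA config. The dataset file, base-model id, reward logic, and all hyperparameters other than r=8 / alpha=8 are illustrative assumptions.

```python
# Illustrative sketch only: dataset file, base-model id, reward logic, and most
# hyperparameters are assumptions, not the actual training configuration.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    """Reward completions that contain the expected reasoning/solution tags."""
    required = ("<start_working_out>", "<end_working_out>", "<SOLUTION>", "</SOLUTION>")
    return [1.0 if all(tag in completion for tag in required) else 0.0 for completion in completions]

# Hypothetical dataset of Albanian math problems; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("json", data_files="albanian_math_prompts.jsonl", split="train")

trainer = GRPOTrainer(
    model="bleta-meditor-27b",            # placeholder base checkpoint id
    reward_funcs=[format_reward],         # accuracy and reasoning-quality rewards would be added similarly
    args=GRPOConfig(output_dir="bleta-meditor-27b-grpo", per_device_train_batch_size=1),
    train_dataset=dataset,
    peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
)
trainer.train()
```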

## Technical Specifications

### Available Formats
This model is available in two formats:
- Standard adapter format (`adapter_model.safetensors`)
- GGUF 8-bit quantized format (`bleta-meditor-27b-finetune.Q8_0.gguf`) for use with llama.cpp (a loading sketch follows this list)
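
The sketch below loads the quantized file with the `llama-cpp-python` bindings. The file name comes from the list above; the context size and prompt are assumptions, and the sampling values mirror the card's defaults.

```python
# Sketch: assumes the Q8_0 GGUF file has been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="bleta-meditor-27b-finetune.Q8_0.gguf",
    n_ctx=8192,  # well below the model's 128K maximum to keep memory use modest
)

# Hypothetical Albanian prompt ("Solve step by step: what is 17 * 23?").
output = llm(
    "Zgjidh hap pas hapi: sa është 17 * 23?",
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(output["choices"][0]["text"])
```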

### Bleta-Meditor Architecture Benefits
- 27B parameters
- 128K context window
- QK normalization
- Alternating attention pattern: 5 sliding-window layers for every global attention layer
- 1,024-token sliding attention window
- Albanian language optimization

## Limitations
- The model is tuned for Albanian reasoning tasks, particularly mathematical problems, but it may still produce incorrect solutions, especially for complex problems.
- The model's performance might vary depending on problem complexity and wording.
- Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

## Acknowledgments
- Google for developing the Gemma 3 architecture
- Hugging Face for their TRL library and GRPO implementation

## Citation
If you use this model in your research, please cite:
```
@misc{klei_aliaj_bleta_meditor,
  author = {Klei Aliaj},
  title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}
```