---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- albanian
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
parameters:
temperature: 0.7
top_p: 0.95
top_k: 64
max_new_tokens: 512
---
# Bleta-Meditor 27B GRPO Albanian Reasoning Model
## Model Description
- **Developed by:** Klei Aliaj
- **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
- **License:** apache-2.0
- **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
- **Language:** Albanian
- **Framework:** Hugging Face Transformers
This model is a fine-tuned version of the Bleta-Meditor 27B model, optimized for the Albanian language with Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.
## Capabilities & Training
### Fine-tuning Approach
This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that scores groups of sampled completions against reward functions and reinforces the completions that score above their group's average. The model was trained to:
1. Follow a specific reasoning format with dedicated sections for workings and solutions
2. Produce correct mathematical solutions in Albanian
3. Show clear step-by-step reasoning processes
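These objectives map naturally onto reward functions. The sketch below is illustrative, not the published training code: it assumes plain-string completions and a dataset with an `answer` column, both of which are assumptions.

```python
import re

def format_reward(completions, **kwargs):
    """Reward completions that contain both tag pairs in the trained order."""
    pattern = r"<start_working_out>.*?<end_working_out>\s*<SOLUTION>.*?</SOLUTION>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

def accuracy_reward(completions, answer, **kwargs):
    """Reward completions whose extracted solution matches the reference answer."""
    scores = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, re.DOTALL)
        ok = match is not None and match.group(1).strip() == str(reference).strip()
        scores.append(1.0 if ok else 0.0)
    return scores
```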
### Special Formatting
The model has been trained to follow a specific reasoning format:
- Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags
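For example, a downstream consumer can separate the reasoning from the final answer with two non-greedy regular expressions. The sample output below is illustrative text, not an actual generation:

```python
import re

# Illustrative output in the trained format ("pra" is Albanian for "so").
output = (
    "<start_working_out>2x + 4 = 10, pra 2x = 6 dhe x = 3.<end_working_out>"
    "<SOLUTION>x = 3</SOLUTION>"
)

working = re.search(r"<start_working_out>(.*?)<end_working_out>", output, re.DOTALL)
solution = re.search(r"<SOLUTION>(.*?)</SOLUTION>", output, re.DOTALL)

print("Working out:", working.group(1).strip() if working else "(missing)")
print("Solution:", solution.group(1).strip() if solution else "(missing)")
```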
### Training Configuration
- **Framework:** Hugging Face's TRL library
- **Optimization:** LoRA fine-tuning (r=8, alpha=8)
- **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
- **Language Focus:** Optimized for Albanian
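A minimal sketch of how these pieces fit together with TRL's `GRPOTrainer`, assuming the LoRA settings above; the base model id, dataset contents, reward function, and output directory are illustrative assumptions rather than the actual training script:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # Stand-in for the format/accuracy/quality rewards described above.
    return [1.0 if "<SOLUTION>" in c else 0.0 for c in completions]

# GRPO expects a dataset with a "prompt" column; one illustrative example here.
train_dataset = Dataset.from_list([{"prompt": "Zgjidh ekuacionin: 2x + 4 = 10"}])

trainer = GRPOTrainer(
    model="bleta-meditor-27b",  # base checkpoint from the card metadata (assumption)
    reward_funcs=[format_reward],
    args=GRPOConfig(output_dir="bleta-grpo"),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
)
trainer.train()
```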
## Technical Specifications
### Available Formats
This model is available in two formats:
- Standard adapter format (adapter_model.safetensors)
- GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp
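As a minimal sketch, the GGUF file can be run locally with llama-cpp-python using the sampling parameters from the card metadata; the file path, context size, and stop sequence here are assumptions:

```python
from llama_cpp import Llama

# Path assumes the GGUF has been downloaded from this repository.
llm = Llama(model_path="bleta-meditor-27b-finetune.Q8_0.gguf", n_ctx=8192)

result = llm(
    "Zgjidh ekuacionin: 2x + 4 = 10",  # "Solve the equation: 2x + 4 = 10"
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
    stop=["</SOLUTION>"],  # stop once the final-answer tag closes
)
print(result["choices"][0]["text"])
```

The adapter format can instead be applied on top of the base model with PEFT's `PeftModel.from_pretrained`.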
### Bleta-Meditor Architecture Benefits
- 27B parameters
- 128K context window
- QK normalization
- 5:1 interleaved attention pattern (5 sliding-window layers per global layer)
- 1024-token sliding attention window
- Albanian language optimization
## Limitations
- The model is tuned for Albanian reasoning tasks, particularly mathematical problems, but it may still produce incorrect solutions to complex problems.
- Performance can vary with problem complexity and wording.
- Like all language models, it may hallucinate or give incorrect information outside its training domain.
## Acknowledgments
- Google for developing the Gemma 3 architecture
- Hugging Face for their TRL library and GRPO implementation
## Citation
If you use this model in your research, please cite:
```
@misc{klei_aliaj_bleta_meditor,
  author       = {Klei Aliaj},
  title        = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}
```