---
base_model: bleta-meditor-27b
tags:
- text-generation-inference
- transformers
- albanian
- gemma3
- reasoning
- mathematics
- grpo
license: apache-2.0
language:
- sq
inference:
  parameters:
    temperature: 0.7
    top_p: 0.95
    top_k: 64
    max_new_tokens: 512
---

# Bleta-Meditor 27B GRPO Albanian Reasoning Model

## Model Description

- **Developed by:** klei aliaj
- **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
- **License:** apache-2.0
- **Finetuned from model:** Bleta-Meditor 27B (based on the Gemma 3 architecture)
- **Language:** Albanian
- **Framework:** Hugging Face Transformers

This model is a fine-tuned version of Bleta-Meditor 27B, optimized for the Albanian language using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.

## Capabilities & Training

### Fine-tuning Approach

This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes a model against a set of reward functions. The model was trained to:

1. Follow a specific reasoning format with dedicated sections for working out and solutions
2. Produce correct mathematical solutions in Albanian
3. Show clear step-by-step reasoning

### Special Formatting

The model has been trained to follow a specific reasoning format:

- Working-out/reasoning sections are enclosed within dedicated opening and closing reasoning tags
- Final solutions are enclosed within dedicated solution tags

### Training Configuration

- **Framework:** Hugging Face's TRL library
- **Optimization:** LoRA fine-tuning (r=8, alpha=8)
- **Reward functions:** format adherence, answer accuracy, and reasoning quality
- **Language focus:** Albanian

## Technical Specifications

### Available Formats

This model is available in two formats:

- Standard adapter format (`adapter_model.safetensors`)
- GGUF 8-bit quantized format (`bleta-meditor-27b-finetune.Q8_0.gguf`) for use with llama.cpp

### Bleta-Meditor Architecture Benefits

- 27B parameters
- 128K context window
- QK normalization
- Attention pattern of 5 sliding-window layers per global layer
- 1024-token sliding attention window
- Albanian language optimization

## Limitations

- While the model performs strongly on Albanian reasoning tasks, particularly mathematical problems, it may still produce incorrect solutions for complex problems.
- Performance can vary with problem complexity and wording.
- Like all language models, it may hallucinate or provide incorrect information outside its training domain.

## Acknowledgments

- Google for developing the Gemma 3 architecture
- Hugging Face for the TRL library and its GRPO implementation

## Citation

If you use this model in your research, please cite:

```
@misc{klei_aliaj_bleta_meditor,
  author = {Klei Aliaj},
  title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
}
```
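
## Usage Examples

A minimal inference sketch with Hugging Face Transformers is shown below. The repository id is assumed from the citation URL, and the sampling parameters mirror the inference settings in the card header. If the repository ships only the LoRA adapter (`adapter_model.safetensors`), load the adapter on top of the base checkpoint with PEFT instead of loading the id directly.

```python
# Minimal Transformers inference sketch; repo id assumed from the citation URL.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "klei1/bleta-meditor-27b-finetune"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Albanian math prompt: "Solve the equation: 2x + 3 = 11."
messages = [{"role": "user", "content": "Zgjidh ekuacionin: 2x + 3 = 11."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling parameters taken from the card header's inference settings.
output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```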
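
For the 8-bit GGUF file listed under Available Formats, any llama.cpp-compatible runtime should work. The sketch below uses the `llama-cpp-python` bindings as one such runtime (an assumption; the card names only llama.cpp itself), with the same sampling settings.

```python
# GGUF inference sketch via llama-cpp-python; the context size here is a
# chosen value, not a requirement of the model.
from llama_cpp import Llama

llm = Llama(model_path="bleta-meditor-27b-finetune.Q8_0.gguf", n_ctx=8192)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Zgjidh ekuacionin: 2x + 3 = 11."}],
    max_tokens=512,
    temperature=0.7,
    top_p=0.95,
    top_k=64,
)
print(result["choices"][0]["message"]["content"])
```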
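
## GRPO Training Sketch

The sketch below is a hedged reconstruction of the setup described under Training Configuration: TRL's `GRPOTrainer` with a LoRA adapter (r=8, alpha=8) and a format-adherence reward. The tag names, dataset, base checkpoint id, and hyperparameters are placeholders rather than the exact training recipe, and the answer-accuracy and reasoning-quality rewards are omitted for brevity.

```python
# Illustrative GRPO setup (TRL + LoRA r=8, alpha=8); not the exact recipe.
import re

from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    """Score 1.0 when a completion holds both a reasoning block and a solution
    block. The tag names are placeholders; the card's exact tags may differ."""
    pattern = r"<reasoning>.*?</reasoning>.*?<solution>.*?</solution>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

# Tiny placeholder dataset of Albanian math prompts ("What is 7 * 8?").
train_dataset = Dataset.from_list([{"prompt": "Sa është 7 * 8?"}])

trainer = GRPOTrainer(
    model="bleta-meditor-27b",     # placeholder base checkpoint id
    reward_funcs=[format_reward],  # accuracy/quality rewards omitted
    args=GRPOConfig(output_dir="bleta-grpo", max_completion_length=512),
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
)
trainer.train()
```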