klei1 committed · Commit d3dbff4 · verified · 1 parent: 5630eb5

Update README.md

Files changed (1): README.md (+20 -16)
README.md CHANGED
@@ -3,6 +3,7 @@ base_model: bleta-meditor-27b
  tags:
  - text-generation-inference
  - transformers
  - gemma3
  - reasoning
  - mathematics
@@ -18,24 +19,25 @@ inference:
  max_new_tokens: 512
  ---

- # Gemma 3 27B GRPO Reasoning Model

  ## Model Description
  - **Developed by:** klei aliaj
- - **Model type:** Gemma 3 27B fine-tuned with GRPO for reasoning tasks
  - **License:** apache-2.0
- - **Finetuned from model:** Google's Gemma 3 27B instruction-tuned model
  - **Framework:** Hugging Face Transformers

- This model is a fine-tuned version of Google's Gemma 3 27B instruction-tuned model, enhanced using Generative Rejection Policy Optimization (GRPO) to improve its reasoning capabilities.

  ## Capabilities & Training

  ### Fine-tuning Approach
- This model was fine-tuned using GRPO (Generative Rejection Policy Optimization), a reinforcement learning technique that trains models to optimize for specific reward functions. The model was trained to:

  1. Follow a specific reasoning format with dedicated sections for workings and solutions
- 2. Produce correct mathematical solutions
  3. Show clear step-by-step reasoning processes

  ### Special Formatting
@@ -47,6 +49,7 @@ The model has been trained to follow a specific reasoning format:
  - **Framework:** Hugging Face's TRL library
  - **Optimization:** LoRA fine-tuning (r=8, alpha=8)
  - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality

  ## Technical Specifications

@@ -55,30 +58,31 @@ This model is available in two formats:
  - Standard adapter format (adapter_model.safetensors)
  - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp

- ### Gemma 3 Architecture Benefits
- - 27B parameters, trained on 14 trillion tokens
- - 128K context window (extended from 32K)
- - QK normalization (replaced attention softcapping)
  - 5 sliding + 1 global attention pattern
  - 1024 sliding window attention

  ## Limitations
- - While this model excels at reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
  - The model's performance might vary depending on problem complexity and wording.
  - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

  ## Acknowledgments
- - Google for developing the Gemma 3 model family
  - Hugging Face for their TRL library and GRPO implementation

  ## Citation
  If you use this model in your research, please cite:
  ```
- @misc{klei1_gemma3_grpo,
-   author = {klei1},
-   title = {Gemma 3 27B GRPO Reasoning Model},
    year = {2025},
    publisher = {Hugging Face},
-   howpublished = {\url{https://huggingface.co/klei1/gemma-3-27b-grpo}}
  }
  ```
 
  tags:
  - text-generation-inference
  - transformers
+ - albanian
  - gemma3
  - reasoning
  - mathematics

  max_new_tokens: 512
  ---

+ # Bleta-Meditor 27B GRPO Albanian Reasoning Model

  ## Model Description
  - **Developed by:** klei aliaj
+ - **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  - **License:** apache-2.0
+ - **Finetuned from model:** Bleta-Meditor 27B (based on the Gemma 3 architecture)
+ - **Language:** Albanian
  - **Framework:** Hugging Face Transformers

+ This model is a fine-tuned version of the Bleta-Meditor 27B model, optimized for Albanian using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.

  ## Capabilities & Training

  ### Fine-tuning Approach
+ This Albanian-language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that trains a model to optimize specific reward functions (a sketch of such reward functions follows the list below). The model was trained to:

  1. Follow a specific reasoning format with dedicated sections for workings and solutions
+ 2. Produce correct mathematical solutions in Albanian
  3. Show clear step-by-step reasoning processes
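
The exact reward functions and formatting markers used for training are not published in this card. Purely as an illustration of how GRPO-style rewards for format adherence and answer accuracy can be written against TRL's reward-function interface, here is a minimal, hypothetical sketch; the `<workings>`/`<solution>` tag names are assumptions, not the model's actual markers:

```python
import re

# Hypothetical markers: the card mentions dedicated "workings" and "solutions"
# sections, but the exact tags used during training are not shown in this diff.
FORMAT_RE = re.compile(
    r"<workings>.*?</workings>\s*<solution>(.*?)</solution>", re.DOTALL
)

def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow the expected two-section layout."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]

def accuracy_reward(completions, answer, **kwargs):
    """Return 1.0 when the extracted solution matches the reference answer."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = FORMAT_RE.search(completion)
        predicted = match.group(1).strip() if match else ""
        rewards.append(1.0 if predicted == str(reference).strip() else 0.0)
    return rewards
```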

  ### Special Formatting

  - **Framework:** Hugging Face's TRL library
  - **Optimization:** LoRA fine-tuning (r=8, alpha=8)
  - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
+ - **Language Focus:** Optimized for Albanian
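
The bullets above name the main training ingredients; the following is a hedged sketch of how they could be wired together with TRL's GRPOTrainer and a LoRA configuration matching r=8, alpha=8. The base-model path, toy dataset, and any hyperparameters not stated in this card are placeholders, and the reward functions are the hypothetical ones sketched earlier, not the original training script:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Placeholder base checkpoint and toy dataset; the real training data is not part of this card.
BASE_MODEL = "path/to/bleta-meditor-27b-base"
train_dataset = Dataset.from_dict({
    "prompt": ["Sa është 12 × 8?"],  # Albanian: "What is 12 × 8?"
    "answer": ["96"],
})

peft_config = LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM")  # the r=8, alpha=8 noted above

training_args = GRPOConfig(
    output_dir="bleta-meditor-27b-grpo",
    max_completion_length=512,  # mirrors the card's max_new_tokens setting
    num_generations=4,          # assumed group size per prompt
)

trainer = GRPOTrainer(
    model=BASE_MODEL,
    reward_funcs=[format_reward, accuracy_reward],  # the reward sketches above
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```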

  ## Technical Specifications

  - Standard adapter format (adapter_model.safetensors)
  - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp
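
As a usage sketch that is not part of the original card: the safetensors adapter would typically be applied on top of its base checkpoint with PEFT, while the Q8_0 GGUF file is intended for llama.cpp (for example, `llama-cli -m bleta-meditor-27b-finetune.Q8_0.gguf -p "..."`). The base-model path below is a placeholder; the adapter id is taken from this repository's citation URL:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "path/to/bleta-meditor-27b-base"     # placeholder: the adapter's base checkpoint
ADAPTER_ID = "klei1/bleta-meditor-27b-finetune"   # this repository

# Assumes the tokenizer is shipped with the adapter repo; otherwise load it from the base model.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)

prompt = "Zgjidh: sa është 15% e 240?"  # Albanian: "Solve: what is 15% of 240?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```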

+ ### Bleta-Meditor Architecture Benefits
+ - 27B parameters
+ - 128K context window
+ - QK normalization
  - 5 sliding + 1 global attention pattern
  - 1024 sliding window attention
+ - Albanian language optimization

  ## Limitations
+ - While this model excels at Albanian reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
  - The model's performance might vary depending on problem complexity and wording.
  - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

  ## Acknowledgments
+ - Google for developing the Gemma 3 architecture
  - Hugging Face for their TRL library and GRPO implementation

  ## Citation
  If you use this model in your research, please cite:
  ```
+ @misc{klei_aliaj_bleta_meditor,
+   author = {Klei Aliaj},
+   title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
    year = {2025},
    publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
  }
  ```