klei1 committed (verified)
Commit 493b043 · 1 Parent(s): 4e2dd15

Update README.md

Files changed (1): README.md +75 -8
README.md CHANGED
@@ -1,21 +1,88 @@
  ---
- base_model: unsloth/gemma-3-27b-it-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
- - unsloth
  - gemma3
  license: apache-2.0
  language:
- - en
  ---

- # Uploaded finetuned model

- - **Developed by:** klei1
  - **License:** apache-2.0
- - **Finetuned from model :** unsloth/gemma-3-27b-it-unsloth-bnb-4bit

- This gemma3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model: bleta-meditor-27b
  tags:
  - text-generation-inference
  - transformers
+ - albanian
  - gemma3
+ - reasoning
+ - mathematics
+ - grpo
  license: apache-2.0
  language:
+ - sq
+ inference:
+   parameters:
+     temperature: 0.7
+     top_p: 0.95
+     top_k: 64
+     max_new_tokens: 512
  ---

+ # Bleta-Meditor 27B GRPO Albanian Reasoning Model

+ ## Model Description
+ - **Developed by:** klei aliaj
+ - **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  - **License:** apache-2.0
+ - **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
+ - **Language:** Albanian
+ - **Framework:** Hugging Face Transformers

+ This model is a fine-tuned version of the Bleta-Meditor 27B model, specifically optimized for the Albanian language using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation based on Google's Gemma 3 architecture.
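+
+ As a quick orientation, the snippet below is a minimal generation sketch using the sampling parameters from the metadata above. The Albanian prompt is an invented example, and since the repository ships a LoRA adapter, you may need to attach it to the base model with PEFT rather than loading it directly:
+
+ ```python
+ # Minimal sketch: generate with the sampling parameters from the model card.
+ from transformers import pipeline
+
+ generator = pipeline(
+     "text-generation",
+     model="klei1/bleta-meditor-27b-finetune",  # repo id from the citation below
+     device_map="auto",
+ )
+
+ prompt = "Sa është 12 * 7?"  # Albanian: "What is 12 * 7?" (invented example)
+ output = generator(
+     prompt,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.95,
+     top_k=64,
+ )
+ print(output[0]["generated_text"])
+ ```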

+ ## Capabilities & Training
+
+ ### Fine-tuning Approach
+ This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes a model against explicit reward functions. The model was trained to:
+
+ 1. Follow a specific reasoning format with dedicated sections for workings and solutions
+ 2. Produce correct mathematical solutions in Albanian
+ 3. Show clear step-by-step reasoning processes
+
+ ### Special Formatting
+ The model has been trained to follow a specific reasoning format:
+ - Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
+ - Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags (a parsing sketch follows this list)
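+
+ A small parsing sketch for this format (the completion text is invented; only the four tags come from the training setup described here):
+
+ ```python
+ import re
+
+ # Example completion in the trained format (invented for illustration).
+ completion = (
+     "<start_working_out>12 * 7 = 84<end_working_out>"
+     "<SOLUTION>84</SOLUTION>"
+ )
+
+ # Pull the final answer out of the <SOLUTION>...</SOLUTION> block.
+ match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, re.DOTALL)
+ if match:
+     print(match.group(1).strip())  # -> 84
+ ```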
+
+ ### Training Configuration
+ - **Framework:** Hugging Face's TRL library
+ - **Optimization:** LoRA fine-tuning (r=8, alpha=8); a configuration sketch follows this list
+ - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
+ - **Language Focus:** Optimized for Albanian
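+
+ For concreteness, here is a hedged sketch of what such a GRPO + LoRA run can look like with TRL. The reward function, prompt, base-model id, output directory, and completion length are illustrative assumptions; only r=8 / alpha=8 and the use of GRPO come from this card:
+
+ ```python
+ # Sketch of a GRPO + LoRA training run with TRL (assumptions noted inline).
+ from datasets import Dataset
+ from peft import LoraConfig
+ from trl import GRPOConfig, GRPOTrainer
+
+ def format_reward(completions, **kwargs):
+     # Reward completions that follow the tag format described above.
+     return [
+         1.0 if "<SOLUTION>" in c and "</SOLUTION>" in c else 0.0
+         for c in completions
+     ]
+
+ trainer = GRPOTrainer(
+     model="google/gemma-3-27b-it",  # stand-in base model id (assumption)
+     reward_funcs=[format_reward],
+     args=GRPOConfig(output_dir="grpo-out", max_completion_length=512),
+     train_dataset=Dataset.from_dict({"prompt": ["Sa është 12 * 7?"]}),
+     peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
+ )
+ trainer.train()
+ ```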
+
+ ## Technical Specifications
+
+ ### Available Formats
+ This model is available in two formats:
+ - Standard adapter format (adapter_model.safetensors)
+ - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp (a loading sketch follows this list)
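+
+ A minimal loading sketch for the GGUF file using the llama-cpp-python bindings (the local path and context size are assumptions):
+
+ ```python
+ # Load the Q8_0 GGUF with llama-cpp-python (pip install llama-cpp-python).
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="./bleta-meditor-27b-finetune.Q8_0.gguf",  # assumed local path
+     n_ctx=8192,  # illustrative context size; the architecture supports up to 128K
+ )
+
+ out = llm(
+     "Sa është 12 * 7?",  # invented Albanian prompt
+     max_tokens=512,
+     temperature=0.7,
+     top_p=0.95,
+ )
+ print(out["choices"][0]["text"])
+ ```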
+
+ ### Bleta-Meditor Architecture Benefits
+ - 27B parameters
+ - 128K context window
+ - QK normalization
+ - Interleaved attention pattern of 5 sliding-window layers per global layer
+ - 1,024-token sliding attention window
+ - Albanian language optimization
+
+ ## Limitations
+ - While this model is tuned for Albanian reasoning tasks, particularly mathematical problems, it may still produce incorrect solutions for complex problems.
+ - The model's performance might vary depending on problem complexity and wording.
+ - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.
+
+ ## Acknowledgments
+ - Google for developing the Gemma 3 architecture
+ - Hugging Face for their TRL library and GRPO implementation
+
+ ## Citation
+ If you use this model in your research, please cite:
+ ```
+ @misc{klei_aliaj_bleta_meditor,
+   author = {Klei Aliaj},
+   title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
+ }
+ ```