klei1 committed (verified)
Commit 493b043 · 1 Parent(s): 4e2dd15

Update README.md

Files changed (1): README.md +75 -8
README.md CHANGED
@@ -1,21 +1,88 @@
  ---
- base_model: unsloth/gemma-3-27b-it-unsloth-bnb-4bit
  tags:
  - text-generation-inference
  - transformers
- - unsloth
  - gemma3
  license: apache-2.0
  language:
- - en
  ---

- # Uploaded finetuned model

- - **Developed by:** klei1
  - **License:** apache-2.0
- - **Finetuned from model :** unsloth/gemma-3-27b-it-unsloth-bnb-4bit

- This gemma3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

  ---
+ base_model: bleta-meditor-27b
  tags:
  - text-generation-inference
  - transformers
+ - albanian
  - gemma3
+ - reasoning
+ - mathematics
+ - grpo
  license: apache-2.0
  language:
+ - sq
+ inference:
+   parameters:
+     temperature: 0.7
+     top_p: 0.95
+     top_k: 64
+     max_new_tokens: 512
  ---

+ # Bleta-Meditor 27B GRPO Albanian Reasoning Model

+ ## Model Description
+ - **Developed by:** klei aliaj
+ - **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  - **License:** apache-2.0
+ - **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
+ - **Language:** Albanian
+ - **Framework:** Hugging Face Transformers

+ This model is a fine-tuned version of the Bleta-Meditor 27B model, specifically optimized for the Albanian language using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation based on Google's Gemma 3 architecture.
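+
+ As a quick orientation, the snippet below is a minimal generation sketch using the sampling parameters from the metadata above. The Albanian prompt is an invented example, and since the repository ships a LoRA adapter, you may need to attach it to the base model with PEFT rather than loading it directly:
+
+ ```python
+ # Minimal sketch: generate with the sampling parameters from the model card.
+ from transformers import pipeline
+
+ generator = pipeline(
+     "text-generation",
+     model="klei1/bleta-meditor-27b-finetune",  # repo id from the citation below
+     device_map="auto",
+ )
+
+ prompt = "Sa është 12 * 7?"  # Albanian: "What is 12 * 7?" (invented example)
+ output = generator(
+     prompt,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.95,
+     top_k=64,
+ )
+ print(output[0]["generated_text"])
+ ```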

+ ## Capabilities & Training
+
+ ### Fine-tuning Approach
+ This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that optimizes a model against explicit reward functions. The model was trained to:
+
+ 1. Follow a specific reasoning format with dedicated sections for workings and solutions
+ 2. Produce correct mathematical solutions in Albanian
+ 3. Show clear step-by-step reasoning processes
+
+ ### Special Formatting
+ The model has been trained to follow a specific reasoning format:
+ - Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
+ - Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags (a parsing sketch follows this list)
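+
+ A small parsing sketch for this format (the completion text is invented; only the four tags come from the training setup described here):
+
+ ```python
+ import re
+
+ # Example completion in the trained format (invented for illustration).
+ completion = (
+     "<start_working_out>12 * 7 = 84<end_working_out>"
+     "<SOLUTION>84</SOLUTION>"
+ )
+
+ # Pull the final answer out of the <SOLUTION>...</SOLUTION> block.
+ match = re.search(r"<SOLUTION>(.*?)</SOLUTION>", completion, re.DOTALL)
+ if match:
+     print(match.group(1).strip())  # -> 84
+ ```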
+
+ ### Training Configuration
+ - **Framework:** Hugging Face's TRL library
+ - **Optimization:** LoRA fine-tuning (r=8, alpha=8); a configuration sketch follows this list
+ - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
+ - **Language Focus:** Optimized for Albanian
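+
+ For concreteness, here is a hedged sketch of what such a GRPO + LoRA run can look like with TRL. The reward function, prompt, base-model id, output directory, and completion length are illustrative assumptions; only r=8 / alpha=8 and the use of GRPO come from this card:
+
+ ```python
+ # Sketch of a GRPO + LoRA training run with TRL (assumptions noted inline).
+ from datasets import Dataset
+ from peft import LoraConfig
+ from trl import GRPOConfig, GRPOTrainer
+
+ def format_reward(completions, **kwargs):
+     # Reward completions that follow the tag format described above.
+     return [
+         1.0 if "<SOLUTION>" in c and "</SOLUTION>" in c else 0.0
+         for c in completions
+     ]
+
+ trainer = GRPOTrainer(
+     model="google/gemma-3-27b-it",  # stand-in base model id (assumption)
+     reward_funcs=[format_reward],
+     args=GRPOConfig(output_dir="grpo-out", max_completion_length=512),
+     train_dataset=Dataset.from_dict({"prompt": ["Sa është 12 * 7?"]}),
+     peft_config=LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM"),
+ )
+ trainer.train()
+ ```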
+
+ ## Technical Specifications
+
+ ### Available Formats
+ This model is available in two formats:
+ - Standard adapter format (adapter_model.safetensors)
+ - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp (a loading sketch follows this list)
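+
+ A minimal loading sketch for the GGUF file using the llama-cpp-python bindings (the local path and context size are assumptions):
+
+ ```python
+ # Load the Q8_0 GGUF with llama-cpp-python (pip install llama-cpp-python).
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="./bleta-meditor-27b-finetune.Q8_0.gguf",  # assumed local path
+     n_ctx=8192,  # illustrative context size; the architecture supports up to 128K
+ )
+
+ out = llm(
+     "Sa është 12 * 7?",  # invented Albanian prompt
+     max_tokens=512,
+     temperature=0.7,
+     top_p=0.95,
+ )
+ print(out["choices"][0]["text"])
+ ```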
+
+ ### Bleta-Meditor Architecture Benefits
+ - 27B parameters
+ - 128K context window
+ - QK normalization
+ - Interleaved attention pattern of 5 sliding-window layers per global layer
+ - 1,024-token sliding attention window
+ - Albanian language optimization
+
+ ## Limitations
+ - While this model is tuned for Albanian reasoning tasks, particularly mathematical problems, it may still produce incorrect solutions for complex problems.
+ - The model's performance might vary depending on problem complexity and wording.
+ - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.
+
+ ## Acknowledgments
+ - Google for developing the Gemma 3 architecture
+ - Hugging Face for their TRL library and GRPO implementation
+
+ ## Citation
+ If you use this model in your research, please cite:
+ ```
+ @misc{klei_aliaj_bleta_meditor,
+   author = {Klei Aliaj},
+   title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
+ }
+ ```