klei1 committed · Commit d3dbff4 · verified · 1 parent: 5630eb5

Update README.md

Files changed (1): README.md (+20 -16)
README.md CHANGED
@@ -3,6 +3,7 @@ base_model: bleta-meditor-27b
  tags:
  - text-generation-inference
  - transformers
  - gemma3
  - reasoning
  - mathematics
@@ -18,24 +19,25 @@ inference:
  max_new_tokens: 512
  ---

- # Gemma 3 27B GRPO Reasoning Model

  ## Model Description
  - **Developed by:** klei aliaj
- - **Model type:** Gemma 3 27B fine-tuned with GRPO for reasoning tasks
  - **License:** apache-2.0
- - **Finetuned from model:** Google's Gemma 3 27B instruction-tuned model
  - **Framework:** Hugging Face Transformers

- This model is a fine-tuned version of Google's Gemma 3 27B instruction-tuned model, enhanced using Generative Rejection Policy Optimization (GRPO) to improve its reasoning capabilities.

  ## Capabilities & Training

  ### Fine-tuning Approach
- This model was fine-tuned using GRPO (Generative Rejection Policy Optimization), a reinforcement learning technique that trains models to optimize for specific reward functions. The model was trained to:

  1. Follow a specific reasoning format with dedicated sections for workings and solutions
- 2. Produce correct mathematical solutions
  3. Show clear step-by-step reasoning processes

  ### Special Formatting
@@ -47,6 +49,7 @@ The model has been trained to follow a specific reasoning format:
  - **Framework:** Hugging Face's TRL library
  - **Optimization:** LoRA fine-tuning (r=8, alpha=8)
  - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality

  ## Technical Specifications

@@ -55,30 +58,31 @@ This model is available in two formats:
  - Standard adapter format (adapter_model.safetensors)
  - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp

- ### Gemma 3 Architecture Benefits
- - 27B parameters, trained on 14 trillion tokens
- - 128K context window (extended from 32K)
- - QK normalization (replaced attention softcapping)
  - 5 sliding + 1 global attention pattern
  - 1024 sliding window attention

  ## Limitations
- - While this model excels at reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
  - The model's performance might vary depending on problem complexity and wording.
  - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

  ## Acknowledgments
- - Google for developing the Gemma 3 model family
  - Hugging Face for their TRL library and GRPO implementation

  ## Citation
  If you use this model in your research, please cite:
  ```
- @misc{klei1_gemma3_grpo,
-   author = {klei1},
-   title = {Gemma 3 27B GRPO Reasoning Model},
    year = {2025},
    publisher = {Hugging Face},
-   howpublished = {\url{https://huggingface.co/klei1/gemma-3-27b-grpo}}
  }
  ```
 
  tags:
  - text-generation-inference
  - transformers
+ - albanian
  - gemma3
  - reasoning
  - mathematics

  max_new_tokens: 512
  ---

+ # Bleta-Meditor 27B GRPO Albanian Reasoning Model

  ## Model Description
  - **Developed by:** klei aliaj
+ - **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  - **License:** apache-2.0
+ - **Finetuned from model:** Bleta-Meditor 27B (based on the Gemma 3 architecture)
+ - **Language:** Albanian
  - **Framework:** Hugging Face Transformers

+ This model is a fine-tuned version of the Bleta-Meditor 27B model, optimized for Albanian using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation of Google's Gemma 3 architecture.

  ## Capabilities & Training

  ### Fine-tuning Approach
+ This Albanian-language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that trains a model to optimize specific reward functions (a sketch of such reward functions follows the list below). The model was trained to:

  1. Follow a specific reasoning format with dedicated sections for workings and solutions
+ 2. Produce correct mathematical solutions in Albanian
  3. Show clear step-by-step reasoning processes
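
The exact reward functions and formatting markers used for training are not published in this card. Purely as an illustration of how GRPO-style rewards for format adherence and answer accuracy can be written against TRL's reward-function interface, here is a minimal, hypothetical sketch; the `<workings>`/`<solution>` tag names are assumptions, not the model's actual markers:

```python
import re

# Hypothetical markers: the card mentions dedicated "workings" and "solutions"
# sections, but the exact tags used during training are not shown in this diff.
FORMAT_RE = re.compile(
    r"<workings>.*?</workings>\s*<solution>(.*?)</solution>", re.DOTALL
)

def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow the expected two-section layout."""
    return [1.0 if FORMAT_RE.search(c) else 0.0 for c in completions]

def accuracy_reward(completions, answer, **kwargs):
    """Return 1.0 when the extracted solution matches the reference answer."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = FORMAT_RE.search(completion)
        predicted = match.group(1).strip() if match else ""
        rewards.append(1.0 if predicted == str(reference).strip() else 0.0)
    return rewards
```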

  ### Special Formatting

  - **Framework:** Hugging Face's TRL library
  - **Optimization:** LoRA fine-tuning (r=8, alpha=8)
  - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
+ - **Language Focus:** Optimized for Albanian
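
The bullets above name the main training ingredients; the following is a hedged sketch of how they could be wired together with TRL's GRPOTrainer and a LoRA configuration matching r=8, alpha=8. The base-model path, toy dataset, and any hyperparameters not stated in this card are placeholders, and the reward functions are the hypothetical ones sketched earlier, not the original training script:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Placeholder base checkpoint and toy dataset; the real training data is not part of this card.
BASE_MODEL = "path/to/bleta-meditor-27b-base"
train_dataset = Dataset.from_dict({
    "prompt": ["Sa është 12 × 8?"],  # Albanian: "What is 12 × 8?"
    "answer": ["96"],
})

peft_config = LoraConfig(r=8, lora_alpha=8, task_type="CAUSAL_LM")  # the r=8, alpha=8 noted above

training_args = GRPOConfig(
    output_dir="bleta-meditor-27b-grpo",
    max_completion_length=512,  # mirrors the card's max_new_tokens setting
    num_generations=4,          # assumed group size per prompt
)

trainer = GRPOTrainer(
    model=BASE_MODEL,
    reward_funcs=[format_reward, accuracy_reward],  # the reward sketches above
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```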

  ## Technical Specifications

  - Standard adapter format (adapter_model.safetensors)
  - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp
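
As a usage sketch that is not part of the original card: the safetensors adapter would typically be applied on top of its base checkpoint with PEFT, while the Q8_0 GGUF file is intended for llama.cpp (for example, `llama-cli -m bleta-meditor-27b-finetune.Q8_0.gguf -p "..."`). The base-model path below is a placeholder; the adapter id is taken from this repository's citation URL:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "path/to/bleta-meditor-27b-base"     # placeholder: the adapter's base checkpoint
ADAPTER_ID = "klei1/bleta-meditor-27b-finetune"   # this repository

# Assumes the tokenizer is shipped with the adapter repo; otherwise load it from the base model.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base, ADAPTER_ID)

prompt = "Zgjidh: sa është 15% e 240?"  # Albanian: "Solve: what is 15% of 240?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```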

+ ### Bleta-Meditor Architecture Benefits
+ - 27B parameters
+ - 128K context window
+ - QK normalization
  - 5 sliding + 1 global attention pattern
  - 1024 sliding window attention
+ - Albanian language optimization

  ## Limitations
+ - While this model excels at Albanian reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
  - The model's performance might vary depending on problem complexity and wording.
  - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

  ## Acknowledgments
+ - Google for developing the Gemma 3 architecture
  - Hugging Face for their TRL library and GRPO implementation

  ## Citation
  If you use this model in your research, please cite:
  ```
+ @misc{klei_aliaj_bleta_meditor,
+   author = {Klei Aliaj},
+   title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
    year = {2025},
    publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
  }
  ```