klei1 committed · verified · Commit 817e0c5 · Parent: 493b043

Update README.md

Files changed (1):
  1. README.md +72 -44
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- base_model: bleta-meditor-27b
  tags:
  - text-generation-inference
  - transformers
@@ -8,6 +8,8 @@ tags:
  - reasoning
  - mathematics
  - grpo
  license: apache-2.0
  language:
  - al
@@ -19,70 +21,96 @@ inference:
  max_new_tokens: 512
  ---

- # Bleta-Meditor 27B GRPO Albanian Reasoning Model

  ## Model Description
  - **Developed by:** klei aliaj
- - **Model type:** Bleta-Meditor 27B fine-tuned with GRPO for Albanian reasoning tasks
  - **License:** apache-2.0
- - **Finetuned from model:** Bleta-Meditor 27B (based on Gemma 3 architecture)
  - **Language:** Albanian
- - **Framework:** Hugging Face Transformers

- This model is a fine-tuned version of the Bleta-Meditor 27B model, specifically optimized for the Albanian language using Group Relative Policy Optimization (GRPO) to improve its reasoning capabilities. Bleta is an Albanian adaptation based on Google's Gemma 3 architecture.

- ## Capabilities & Training

- ### Fine-tuning Approach
- This Albanian language model was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning technique that trains models to optimize for specific reward functions. The model was trained to:

- 1. Follow a specific reasoning format with dedicated sections for workings and solutions
- 2. Produce correct mathematical solutions in Albanian
- 3. Show clear step-by-step reasoning processes

- ### Special Formatting
- The model has been trained to follow a specific reasoning format:
- - Working out/reasoning sections are enclosed within `<start_working_out>` and `<end_working_out>` tags
- - Final solutions are provided between `<SOLUTION>` and `</SOLUTION>` tags

- ### Training Configuration
- - **Framework:** Hugging Face's TRL library
- - **Optimization:** LoRA fine-tuning (r=8, alpha=8)
- - **Reward Functions:** Format adherence, answer accuracy, and reasoning quality
- - **Language Focus:** Optimized for Albanian

- ## Technical Specifications

- ### Available Formats
- This model is available in two formats:
- - Standard adapter format (adapter_model.safetensors)
- - GGUF 8-bit quantized format (bleta-meditor-27b-finetune.Q8_0.gguf) for use with llama.cpp

- ### Bleta-Meditor Architecture Benefits
  - 27B parameters
  - 128K context window
  - QK normalization
  - 5 sliding + 1 global attention pattern
  - 1024 sliding window attention
- - Albanian language optimization

  ## Limitations
- - While this model excels at Albanian reasoning tasks, particularly mathematical problems, it may still occasionally provide incorrect solutions for complex problems.
- - The model's performance might vary depending on problem complexity and wording.
- - Like all language models, it may occasionally hallucinate or provide incorrect information outside its training domain.

  ## Acknowledgments
  - Google for developing the Gemma 3 architecture
- - Hugging Face for their TRL library and GRPO implementation
-
- ## Citation
- If you use this model in your research, please cite:
- ```
- @misc{klei_aliaj_bleta_meditor,
-   author = {Klei Aliaj},
-   title = {Bleta-Meditor 27B GRPO Albanian Reasoning Model},
-   year = {2025},
-   publisher = {Hugging Face},
-   howpublished = {\url{https://huggingface.co/klei1/bleta-meditor-27b-finetune}}
- }
- ```

  ---
+ base_model: bleta-logjike-27b
  tags:
  - text-generation-inference
  - transformers

  - reasoning
  - mathematics
  - grpo
+ - gsm8k
+ - conversational
  license: apache-2.0
  language:
  - al

  max_new_tokens: 512
  ---

+ # Bleta-Logjike 27B Albanian Logical Reasoning Model

  ## Model Description
  - **Developed by:** klei aliaj
+ - **Model type:** Bleta-Logjike 27B optimized for Albanian logical reasoning
  - **License:** apache-2.0
+ - **Format:** Full-precision model (Hugging Face Transformers format)
  - **Language:** Albanian
+ - **Base architecture:** Gemma 3 27B

+ This is the full-precision version of Bleta-Logjike 27B, optimized for logical reasoning tasks in the Albanian language. Bleta is an Albanian adaptation of Google's Gemma 3 architecture, and this version focuses on logical reasoning and problem solving for Albanian speakers.

+ ## Capabilities & Features

+ ### Logical Reasoning Focus
+ This Albanian language model excels at:

+ 1. Logical analysis and deduction in Albanian
+ 2. Step-by-step problem solving
+ 3. Structured reasoning for complex problems
+ 4. Understanding logical relationships and dependencies
+ 5. Mathematical reasoning for grade-school level problems
+ 6. Conversational reasoning and explanations

+ ### Albanian Language Optimization
+ - Native support for Albanian grammar and vocabulary
+ - Understanding of Albanian cultural context
+ - Handling of Albanian-specific logical expressions and constructs
+ - Natural conversational abilities in Albanian

+ ## Training Methodology

+ ### GRPO Approach
+ This model was fine-tuned using Group Relative Policy Optimization (GRPO), a reinforcement learning technique that trains models to optimize for specific reward functions. GRPO allows the model to learn from feedback on its generated responses, improving reasoning quality over time by the following steps (a minimal, illustrative training sketch is shown after the list):
+
+ 1. Generating multiple candidate responses
+ 2. Evaluating responses against specific reward criteria
+ 3. Learning to prefer high-quality reasoning patterns
+ 4. Optimizing for step-by-step problem solving
+
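+ As a rough illustration only, the sketch below shows how a GRPO run of this kind could be wired up with TRL's `GRPOTrainer`. The reward function, hyperparameters, and the toy Albanian prompt/answer pairs are assumptions for demonstration, not the exact recipe used to train this model.
+
+ ```python
+ # Illustrative GRPO setup with TRL (assumes a recent trl release that provides GRPOTrainer).
+ from datasets import Dataset
+ from trl import GRPOConfig, GRPOTrainer
+
+ # Toy prompt/answer pairs; the real training data were Albanian GSM8K-style problems.
+ # First prompt: "If Ana has 3 apples and buys 5 more, how many apples does she have in total?"
+ train_dataset = Dataset.from_dict({
+     "prompt": [
+         "Nëse Ana ka 3 mollë dhe blen 5 të tjera, sa mollë ka gjithsej?",
+         "Një libër kushton 12 euro. Sa kushtojnë 4 libra?",
+     ],
+     "answer": ["8", "48"],
+ })
+
+ def correctness_reward(completions, answer, **kwargs):
+     """Reward 1.0 when the reference answer appears in the generated completion."""
+     return [1.0 if ref in completion else 0.0
+             for completion, ref in zip(completions, answer)]
+
+ training_args = GRPOConfig(
+     output_dir="bleta-grpo",
+     per_device_train_batch_size=4,  # must be divisible by num_generations
+     num_generations=4,              # candidate responses sampled per prompt
+     max_completion_length=256,
+     learning_rate=5e-6,
+ )
+
+ trainer = GRPOTrainer(
+     model="klei1/bleta-logjike-27b",
+     reward_funcs=correctness_reward,
+     args=training_args,
+     train_dataset=train_dataset,
+ )
+ trainer.train()
+ ```
+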
+ ### GSM8K Dataset
+ The training utilized the GSM8K (Grade School Math 8K) dataset, which contains over 8,000 high-quality grade school math problems that require step-by-step reasoning to solve. The dataset provides:

+ - Diverse mathematical problem types
+ - Multi-step reasoning challenges
+ - Clear step-by-step solutions
+ - Grade-school level complexity

+ This dataset was adapted for Albanian-language training so that the model can handle mathematical reasoning tasks in Albanian.
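+
+ For reference, the original English GSM8K data can be pulled from the Hugging Face Hub as sketched below; the Albanian adaptation described above is assumed to be a translated copy of these fields and is not reproduced here.
+
+ ```python
+ from datasets import load_dataset
+
+ # Original English GSM8K: each row has a "question" and an "answer"
+ # whose final line starts with "#### <numeric result>".
+ gsm8k = load_dataset("openai/gsm8k", "main", split="train")
+
+ example = gsm8k[0]
+ print(example["question"])
+ final_answer = example["answer"].split("####")[-1].strip()
+ print("Final answer:", final_answer)
+ ```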
+
+ ## Technical Specifications
+
+ ### Model Architecture
  - 27B parameters
+ - Based on Gemma 3 architecture with Albanian adaptations
  - 128K context window
  - QK normalization
  - 5 sliding + 1 global attention pattern
  - 1024 sliding window attention
+
+ ### Usage Requirements
+ - Full-precision (16-bit) inference needs roughly 55-60 GB of GPU VRAM; the 27B parameters alone occupy about 54 GB at 16 bits
+ - Compatible with the Hugging Face Transformers library
+ - Can be loaded with 4-bit or 8-bit quantization for lower-resource environments (see the sketch below)
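+
+ As a rough sketch of the lower-memory path, the snippet below loads the model in 4-bit via Transformers' `BitsAndBytesConfig` (requires the bitsandbytes package); the exact settings are illustrative assumptions, and actual memory use depends on hardware and sequence length.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model_name = "klei1/bleta-logjike-27b"
+
+ # 4-bit NF4 quantization: weights shrink to roughly a quarter of their 16-bit size
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     quantization_config=bnb_config,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ ```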
+
+ ## Usage with Transformers
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model_name = "klei1/bleta-logjike-27b"
+
+ # Load in 8-bit to reduce memory usage (requires the bitsandbytes package)
+ quantization_config = BitsAndBytesConfig(load_in_8bit=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     quantization_config=quantization_config,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # "How is the area of a triangle calculated?"
+ messages = [
+     {"role": "user", "content": "Si llogaritet sipërfaqja e një trekëndëshi?"}
+ ]
+
+ # Build the chat-formatted prompt and tokenize it
+ text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ # do_sample=True is needed for temperature/top_p to take effect
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```

  ## Limitations
+
+ This is the full-precision version of the model and requires significant computational resources. For deployment on consumer hardware, consider the 8-bit quantized GGUF version available at klei1/bleta-logjike-27b-finetune.
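+
+ A hedged sketch of running such a GGUF build with llama-cpp-python is shown below; the GGUF filename is a placeholder assumption and should be checked against the actual file listing of the klei1/bleta-logjike-27b-finetune repository.
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ # NOTE: placeholder filename; check the repository's file listing for the real GGUF name.
+ gguf_path = hf_hub_download(
+     repo_id="klei1/bleta-logjike-27b-finetune",
+     filename="bleta-logjike-27b-finetune.Q8_0.gguf",  # hypothetical
+ )
+
+ llm = Llama(model_path=gguf_path, n_ctx=4096)
+ result = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Si llogaritet sipërfaqja e një trekëndëshi?"}],
+     max_tokens=512,
+ )
+ print(result["choices"][0]["message"]["content"])
+ ```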
 

  ## Acknowledgments
  - Google for developing the Gemma 3 architecture
+ - OpenAI for the GSM8K dataset
+ - Hugging Face for their TRL library and GRPO implementation