Update README.md

README.md (changed)
@@ -12,53 +12,242 @@ tags:

**Removed:** the previous GGUF conversion card, including the `gguf-my-repo` tag and `base_model: beetlware/Bee1reason-arabic-Qwen-14B` in the front matter, the conversion note, and the llama.cpp usage instructions:

> This model was converted to GGUF format from [`beetlware/Bee1reason-arabic-Qwen-14B`](https://huggingface.co/beetlware/Bee1reason-arabic-Qwen-14B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space. Refer to the [original model card](https://huggingface.co/beetlware/Bee1reason-arabic-Qwen-14B) for more details on the model.

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

```bash
llama-cli --hf-repo loaiabdalslam/Bee1reason-arabic-Qwen-14B-Q4_K_M-GGUF --hf-file bee1reason-arabic-qwen-14b-q4_k_m.gguf -p "The meaning to life and the universe is"
```

```bash
llama-server --hf-repo loaiabdalslam/Bee1reason-arabic-Qwen-14B-Q4_K_M-GGUF --hf-file bee1reason-arabic-qwen-14b-q4_k_m.gguf -c 2048
```

**Added:** the updated model card:
tags:
- instruction-following
- text-generation
- merged_16bit
base_model: unsloth/Qwen3-14B
datasets:
- beetlware/arabic-reasoning-dataset-logic
---

# Bee1reason-arabic-Qwen-14B: A Qwen3 14B Model Fine-tuned for Arabic Logical Reasoning
## Model Overview

**Bee1reason-arabic-Qwen-14B** is a Large Language Model (LLM) fine-tuned from the `unsloth/Qwen3-14B` base model (which is itself based on `Qwen/Qwen3-14B`). The model is specifically tailored to strengthen logical and deductive reasoning in Arabic while maintaining its general conversational abilities. Fine-tuning used LoRA (Low-Rank Adaptation) with the [Unsloth](https://github.com/unslothai/unsloth) library for high training efficiency, and the LoRA weights were then merged with the base model to produce this standalone 16-bit (float16) precision model.
**Key Features:**

* **Built on `unsloth/Qwen3-14B`:** Leverages the power and performance of the Qwen3 14-billion-parameter base model.
* **Fine-tuned for Arabic Logical Reasoning:** Trained on a dataset of Arabic logical reasoning tasks.
* **Conversational Format:** The model expects user and assistant roles. It was trained on data that may include "thinking steps" (often within `<think>...</think>` tags) before the final answer, which helps on tasks requiring explanation or complex inference.
* **Unsloth Efficiency:** The Unsloth library was used for fine-tuning, enabling faster training and lower GPU memory consumption.
* **Merged 16-bit Model:** The final weights form a full float16-precision model, ready for direct use without applying LoRA adapters to a separate base model.
## Training Data

The model was primarily fine-tuned on a custom Arabic logical reasoning dataset, `beetlware/arabic-reasoning-dataset-logic`, available on the Hugging Face Hub. The dataset covers various types of reasoning (deduction, induction, abduction), with each task comprising the question text, a proposed answer, and a detailed solution that includes thinking steps.

This data was converted into a conversational format for training (see the sketch after this list), typically with:

1. **User Role:** Containing the problem/question text.
2. **Assistant Role:** Containing the detailed solution, including thinking steps (often within `<think>...</think>` tags) followed by the final answer.
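As an illustration, the mapping might look like the following minimal sketch; the field names `question`, `reasoning`, and `answer` are assumptions for illustration, not the dataset's documented schema:

```python
from datasets import load_dataset

# Hypothetical sketch: the column names below are assumptions, not the
# dataset's documented schema.
dataset = load_dataset("beetlware/arabic-reasoning-dataset-logic", split="train")

def to_conversation(example):
    # Wrap the thinking steps in <think> tags, then append the final answer.
    assistant_text = f"<think>{example['reasoning']}</think>\n{example['answer']}"
    return {
        "conversations": [
            {"role": "user", "content": example["question"]},
            {"role": "assistant", "content": assistant_text},
        ]
    }

dataset = dataset.map(to_conversation)
```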
## Fine-tuning Details

* **Base Model:** `unsloth/Qwen3-14B`
* **Fine-tuning Technique:** LoRA (Low-Rank Adaptation), see the sketch below
  * `r` (rank): 32
  * `lora_alpha`: 32
  * `target_modules`: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`
  * `lora_dropout`: 0
  * `bias`: "none"
* **Libraries Used:** Unsloth (for efficient model loading and PEFT application) and Hugging Face TRL (`SFTTrainer`)
* **Max Sequence Length (`max_seq_length`):** 2048 tokens
* **Training Parameters (example from the notebook):**
  * `per_device_train_batch_size`: 2
  * `gradient_accumulation_steps`: 4 (simulating a total batch size of 8)
  * `warmup_steps`: 5
  * `max_steps`: 30 (in the notebook; adjustable for a full run)
  * `learning_rate`: 2e-4 (recommended to reduce to 2e-5 for longer training runs)
  * `optim`: "adamw_8bit"
* **Final Save:** LoRA weights were merged with the base model and saved in `merged_16bit` (float16) precision.
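For orientation, the hyperparameters above correspond roughly to an Unsloth setup like this minimal sketch; it is reconstructed from the listed values, not taken from the authors' actual notebook:

```python
from unsloth import FastLanguageModel

# Minimal sketch reconstructed from the hyperparameters listed above;
# not the authors' original notebook.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",
    max_seq_length=2048,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
)
```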
## How to Use (with Transformers)

Since this is a merged 16-bit model, you can load and use it directly with the `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

model_id = "beetlware/Bee1reason-arabic-Qwen-14B"

# Load the Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or torch.float16 if bfloat16 is not supported
    device_map="auto",           # distributes the model across available devices (GPU/CPU)
)

# Ensure the model is in evaluation mode for inference
model.eval()
```
### Example: Inference with Thinking Steps (if the model was trained to produce them)

Qwen3 models use special `<think>...</think>` tags for thinking. To enable thinking mode during inference (if supported by the fine-tuned model), you may need to craft the prompt to ask the model to think; Unsloth-trained Qwen3 models often also respond to the `enable_thinking` argument of `tokenizer.apply_chat_template`. For a merged model, whether `<think>` blocks appear depends on the training data.
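If the saved tokenizer kept Qwen3's chat template, that flag can be passed directly. A minimal sketch, assuming the template supports `enable_thinking` (it may be ignored, or raise an error, otherwise):

```python
# Assumes the saved tokenizer kept Qwen3's chat template, which accepts an
# enable_thinking flag; whether it applies to this merged model is untested here.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "سؤال منطقي..."}],  # "A logic question..."
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # ask the template to open a <think> block
)
```

The example below instead asks for step-by-step thinking directly in the prompt: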
```python
# Example: inference with an explicit thinking request
user_prompt_with_thinking_request = "استخدم التفكير المنطقي خطوة بخطوة: إذا كان لدي 4 تفاحات والشجرة فيها 20 تفاحة، فكم تفاحة لدي إجمالاً؟"  # "Use step-by-step logical thinking: If I have 4 apples and the tree has 20 apples, how many apples do I have in total?"

messages_with_thinking = [
    {"role": "user", "content": user_prompt_with_thinking_request}
]

# Apply the chat template. Qwen3 uses a specific chat template, and
# tokenizer.apply_chat_template is the correct way to format it.
chat_prompt_with_thinking = tokenizer.apply_chat_template(
    messages_with_thinking,
    tokenize=False,
    add_generation_prompt=True,  # important: appends the assistant's generation prompt
)

inputs_with_thinking = tokenizer(chat_prompt_with_thinking, return_tensors="pt").to(model.device)

print("\n--- Inference with Thinking Request (Example) ---")
streamer_think = TextStreamer(tokenizer, skip_prompt=True)
with torch.no_grad():  # important: disable gradients during inference
    outputs_think = model.generate(
        **inputs_with_thinking,
        max_new_tokens=512,
        temperature=0.6,  # settings recommended by the Qwen team for reasoning
        top_p=0.95,
        top_k=20,
        pad_token_id=tokenizer.eos_token_id,
        streamer=streamer_think,
    )
```
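Because the decoded output may contain a `<think>...</think>` block before the final answer, it can be convenient to separate the two. A minimal helper sketch (the tag format follows the training-data convention described above):

```python
import re

def split_thinking(text: str):
    """Split generated text into (thinking, answer); thinking is None if absent."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), text[match.end():].strip()

# Usage (after the generation above):
# decoded = tokenizer.decode(outputs_think[0], skip_special_tokens=True)
# thinking, answer = split_thinking(decoded)
```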
```python
# Example: normal inference (conversation without an explicit thinking request)
user_prompt_normal = "ما هي عاصمة مصر؟"  # "What is the capital of Egypt?"
messages_normal = [
    {"role": "user", "content": user_prompt_normal}
]

chat_prompt_normal = tokenizer.apply_chat_template(
    messages_normal,
    tokenize=False,
    add_generation_prompt=True,
)
inputs_normal = tokenizer(chat_prompt_normal, return_tensors="pt").to(model.device)

print("\n\n--- Normal Inference (Example) ---")
streamer_normal = TextStreamer(tokenizer, skip_prompt=True)
with torch.no_grad():
    outputs_normal = model.generate(
        **inputs_normal,
        max_new_tokens=100,
        temperature=0.7,  # settings recommended for normal chat
        top_p=0.8,
        top_k=20,
        pad_token_id=tokenizer.eos_token_id,
        streamer=streamer_normal,
    )
```
## Usage with vLLM (for High-Throughput Scaled Inference)

vLLM is a library for fast LLM inference. Since the model is saved as merged 16-bit weights, it can be used with vLLM directly.

1. Install vLLM:

```bash
pip install vllm
```

(vLLM installation may have specific CUDA and PyTorch version requirements; refer to the vLLM documentation for the latest installation prerequisites.)
2. Run the vLLM OpenAI-Compatible Server:

You can serve the model using vLLM's OpenAI-compatible API server, making it easy to integrate into existing applications.

```bash
python -m vllm.entrypoints.openai.api_server \
    --model beetlware/Bee1reason-arabic-Qwen-14B \
    --tokenizer beetlware/Bee1reason-arabic-Qwen-14B \
    --dtype bfloat16 \
    --max-model-len 2048
# Optional flags:
#   --tensor-parallel-size N      # if you have multiple GPUs
#   --gpu-memory-utilization 0.9  # to adjust GPU memory usage
```

- Replace `--dtype bfloat16` with `--dtype float16` if needed.
- `--max-model-len` should match the `max_seq_length` used for fine-tuning (2048).
3. Send Requests to the vLLM Server:

Once the server is running (typically on http://localhost:8000), you can send requests using any OpenAI-compatible client, such as the `openai` library:

```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM server address
    api_key="dummy_key",                  # vLLM doesn't require a real API key by default
)

completion = client.chat.completions.create(
    model="beetlware/Bee1reason-arabic-Qwen-14B",  # model name as registered in vLLM
    messages=[
        {"role": "user", "content": "اشرح نظرية النسبية العامة بكلمات بسيطة."}  # "Explain the theory of general relativity in simple terms."
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,  # enable streaming
)

print("Streaming response from vLLM:")
full_response = ""
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        token = chunk.choices[0].delta.content
        print(token, end="", flush=True)
        full_response += token
print("\n--- End of stream ---")
```
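For a simple non-streaming call, the same client works with `stream=False`; a minimal sketch using the standard OpenAI client API:

```python
# Non-streaming variant of the request above (reuses `client` from that example).
completion = client.chat.completions.create(
    model="beetlware/Bee1reason-arabic-Qwen-14B",
    messages=[{"role": "user", "content": "ما هي عاصمة مصر؟"}],  # "What is the capital of Egypt?"
    max_tokens=128,
    stream=False,
)
print(completion.choices[0].message.content)
```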
## Limitations and Potential Biases

- The model's performance is highly dependent on the quality and diversity of the training data, and it may exhibit biases present in that data.
- Despite fine-tuning for logical reasoning, the model can still make errors on very complex or unfamiliar reasoning tasks.
- The model may "hallucinate" or produce incorrect information, especially for topics not well covered in its training data.
- Since fine-tuning focused on Arabic, capabilities in other languages may be limited.
## Additional Information

- **Developed by:** loai abdalslam (Beetleware)
- **Upload/Release Date:** 21-5-2025
- **Contact / Issue Reporting:** [email protected]
## Beetleware

We are a software house and digital transformation service provider, founded six years ago and based in Saudi Arabia.

All rights reserved © 2025.

**Our Offices**

- KSA Office: (+966) 54 597 3282
- Egypt Office: (+2) 010 67 256 306
- Oman Office: (+968) 9522 8632
## Uploaded model

- **Developed by:** beetlware AI Team
- **License:** apache-2.0
- **Finetuned from model:** unsloth/qwen3-14b-unsloth-bnb-4bit

This Qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)