|
--- |
|
language: ar |
|
license: apache-2.0 |
|
tags: |
|
- unsloth |
|
- qwen3 |
|
- qwen2 |
|
- 14b |
|
- arabic |
|
- logical-reasoning |
|
- conversational |
|
- instruction-following |
|
- text-generation |
|
- merged_16bit |
|
base_model: beetlware/Bee1reason-arabic-Qwen-14B |
|
datasets: |
|
- beetlware/arabic-reasoning-dataset-logic |
|
--- |
|
|
|
# Bee1reason-arabic-Qwen-14B: A Qwen3 14B Model Fine-tuned for Arabic Logical Reasoning |
|
|
|
## Model Overview |
|
|
|
**Bee1reason-arabic-Qwen-14B** is a large language model (LLM) fine-tuned from the `unsloth/Qwen3-14B` base model (Unsloth's distribution of `Qwen/Qwen3-14B`). It has been specifically tailored to enhance logical and deductive reasoning in Arabic while maintaining its general conversational abilities. The fine-tuning process used LoRA (Low-Rank Adaptation) with the [Unsloth](https://github.com/unslothai/unsloth) library for high training efficiency. The LoRA weights were then merged with the base model to produce this standalone 16-bit (float16) precision model.
|
|
|
**Key Features:** |
|
* **Built on `unsloth/Qwen3-14B`:** Leverages the power and performance of the Qwen3 14-billion parameter base model. |
|
* **Fine-tuned for Arabic Logical Reasoning:** Trained on a dataset containing Arabic logical reasoning tasks. |
|
* **Conversational Format:** The model follows a conversational format, expecting user and assistant roles. It was trained on data that may include "thinking steps" (often within `<think>...</think>` tags) before providing the final answer, which is beneficial for tasks requiring explanation or complex inference. |
|
* **Unsloth Efficiency:** The Unsloth library was used for the fine-tuning process, enabling faster training and reduced GPU memory consumption. |
|
* **Merged 16-bit Model:** The final weights are a full float16 precision model, ready for direct use without needing to apply LoRA adapters to a separate base model. |
|
|
|
## Training Data |
|
|
|
The model was primarily fine-tuned on a custom Arabic logical reasoning dataset, `beetlware/arabic-reasoning-dataset-logic`, available on the Hugging Face Hub. This dataset covers various types of reasoning (deduction, induction, abduction), with each task comprising the question text, a proposed answer, and a detailed solution that includes thinking steps.
|
|
|
This data was converted into a conversational format for training, typically with: |
|
1. **User Role:** Containing the problem/question text. |
|
2. **Assistant Role:** Containing the detailed solution, including thinking steps (often within `<think>...</think>` tags) followed by the final answer. |
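
For illustration, a single converted example might look like the following (a minimal sketch; the exact wording and field layout of the dataset records may differ):

```python
# A hypothetical training example in the conversational format described above.
# The <think>...</think> span carries the intermediate reasoning steps.
example_conversation = [
    {
        # "If all humans are mortal and Socrates is a human, is Socrates mortal?"
        "role": "user",
        "content": "إذا كان كل البشر فانين وسقراط إنسان، فهل سقراط فانٍ؟",
    },
    {
        # "<think>All humans are mortal. Socrates is a human. Therefore
        # Socrates is mortal.</think> Yes, Socrates is mortal."
        "role": "assistant",
        "content": "<think>كل البشر فانون. سقراط إنسان. إذن سقراط فانٍ.</think>\nنعم، سقراط فانٍ.",
    },
]
```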
|
|
|
## Fine-tuning Details |
|
|
|
* **Base Model:** `unsloth/Qwen3-14B` |
|
* **Fine-tuning Technique:** LoRA (Low-Rank Adaptation) |
|
* `r` (rank): 32 |
|
* `lora_alpha`: 32 |
|
* `target_modules`: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]` |
|
* `lora_dropout`: 0 |
|
* `bias`: "none" |
|
* **Libraries Used:** Unsloth (for efficient model loading and PEFT application) and Hugging Face TRL (`SFTTrainer`) |
|
* **Max Sequence Length (`max_seq_length`):** 2048 tokens |
|
* **Training Parameters (example from notebook):** |
|
* `per_device_train_batch_size`: 2 |
|
  * `gradient_accumulation_steps`: 4 (for an effective batch size of 8)
|
* `warmup_steps`: 5 |
|
  * `max_steps`: 30 (a short demonstration run in the notebook; increase for a full fine-tune)
|
* `learning_rate`: 2e-4 (recommended to reduce to 2e-5 for longer training runs) |
|
* `optim`: "adamw_8bit" |
|
* **Final Save:** LoRA weights were merged with the base model and saved in `merged_16bit` (float16) precision. |
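
Putting these settings together, the fine-tuning setup likely resembled the sketch below (illustrative only, not the exact training script; the dataset preparation and `output_dir` are assumptions):

```python
# Sketch of the LoRA fine-tuning pipeline with Unsloth + TRL (assumptions noted inline).
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model at the training sequence length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",
    max_seq_length=2048,
)

# Attach LoRA adapters with the hyperparameters listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # assumption: chat-formatted dataset prepared beforehand
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        warmup_steps=5,
        max_steps=30,
        learning_rate=2e-4,
        optim="adamw_8bit",
        output_dir="outputs",  # assumption
    ),
)
trainer.train()

# Merge the LoRA weights into the base model and save at float16 precision.
model.save_pretrained_merged(
    "Bee1reason-arabic-Qwen-14B", tokenizer, save_method="merged_16bit"
)
```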
|
|
|
## How to Use (with Transformers) |
|
|
|
Since this is a merged 16-bit model, you can load and use it directly with the `transformers` library: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer |
|
import torch |
|
|
|
model_id = "beetlware/Bee1reason-arabic-Qwen-14B" |
|
|
|
# Load the Tokenizer |
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
|
# Load the Model |
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_id, |
|
torch_dtype=torch.bfloat16, # or torch.float16 if bfloat16 is not supported |
|
device_map="auto", # Distributes the model on available devices (GPU/CPU) |
|
) |
|
|
|
# Ensure the model is in evaluation mode for inference |
|
model.eval() |
|
``` |
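
For sizing purposes: at 16-bit precision the 14B parameters alone occupy roughly 14 × 10⁹ × 2 bytes ≈ 28 GB, so `device_map="auto"` will offload layers to CPU (at some speed cost) if your GPU has less memory than that.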
|
|
|
```python |
|
user_prompt_with_thinking_request = "استخدم التفكير المنطقي خطوة بخطوة: إذا كان لدي 4 تفاحات والشجرة فيها 20 تفاحة، فكم تفاحة لدي إجمالاً؟" # "Use step-by-step logical thinking: If I have 4 apples and the tree has 20 apples, how many apples do I have in total?" |
|
|
|
messages_with_thinking = [ |
|
{"role": "user", "content": user_prompt_with_thinking_request} |
|
] |
|
|
|
# Apply the chat template |
|
# Qwen3 uses a specific chat template. tokenizer.apply_chat_template is the correct way to format it. |
|
chat_prompt_with_thinking = tokenizer.apply_chat_template( |
|
messages_with_thinking, |
|
tokenize=False, |
|
add_generation_prompt=True # Important for adding the assistant's generation prompt |
|
) |
|
|
|
inputs_with_thinking = tokenizer(chat_prompt_with_thinking, return_tensors="pt").to(model.device) |
|
|
|
print("\n--- Inference with Thinking Request (Example) ---") |
|
streamer_think = TextStreamer(tokenizer, skip_prompt=True) |
|
with torch.no_grad(): # Important to disable gradients during inference |
|
outputs_think = model.generate( |
|
**inputs_with_thinking, |
|
max_new_tokens=512, |
|
temperature=0.6, # Recommended settings for reasoning by Qwen team |
|
top_p=0.95, |
|
top_k=20, |
|
pad_token_id=tokenizer.eos_token_id, |
|
streamer=streamer_think |
|
) |
|
``` |
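
Because the generated text may wrap its reasoning in `<think>...</think>` tags, you may want to separate the reasoning from the final answer after generation. A minimal sketch (assuming the template emits a closing `</think>` tag; in some tokenizer versions the tags are special tokens, so we decode with `skip_special_tokens=False`):

```python
# Decode only the newly generated tokens, keeping the <think> markers intact.
prompt_len = inputs_with_thinking["input_ids"].shape[1]
generated_text = tokenizer.decode(outputs_think[0][prompt_len:], skip_special_tokens=False)

# Split the reasoning from the final answer, then drop any trailing EOS marker.
reasoning, sep, answer = generated_text.partition("</think>")
final_answer = (answer if sep else generated_text).replace(tokenizer.eos_token, "").strip()
print("Final answer:", final_answer)
```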
|
|
|
```python |
|
# --- Example for Normal Inference (Conversation without explicit thinking request) --- |
|
user_prompt_normal = "ما هي عاصمة مصر؟" # "What is the capital of Egypt?" |
|
messages_normal = [ |
|
{"role": "user", "content": user_prompt_normal} |
|
] |
|
|
|
chat_prompt_normal = tokenizer.apply_chat_template( |
|
messages_normal, |
|
tokenize=False, |
|
add_generation_prompt=True |
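    # enable_thinking=False,  # assumption: recent Qwen3 chat templates accept this flag to suppress <think> output; verify against your tokenizer version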
|
) |
|
inputs_normal = tokenizer(chat_prompt_normal, return_tensors="pt").to(model.device) |
|
|
|
print("\n\n--- Normal Inference (Example) ---") |
|
streamer_normal = TextStreamer(tokenizer, skip_prompt=True) |
|
with torch.no_grad(): |
|
outputs_normal = model.generate( |
|
**inputs_normal, |
|
max_new_tokens=100, |
|
temperature=0.7, # Recommended settings for normal chat |
|
top_p=0.8, |
|
top_k=20, |
|
pad_token_id=tokenizer.eos_token_id, |
|
streamer=streamer_normal |
|
) |
|
``` |
|
|
|
|
|
## Usage with VLLM (High-Throughput Inference at Scale)
|
VLLM is a library for fast LLM inference. Since the model is published as a merged 16-bit checkpoint, it can be served directly with VLLM.
|
|
|
1. Install VLLM: |
|
|
|
```bash |
|
|
|
pip install vllm |
|
``` |
|
(VLLM installation might have specific CUDA and PyTorch version requirements. Refer to the VLLM documentation for the latest installation prerequisites.) |
|
|
|
2. Run the VLLM OpenAI-Compatible Server: |
|
You can serve the model using VLLM's OpenAI-compatible API server, making it easy to integrate into existing applications. |
|
|
|
```bash
python -m vllm.entrypoints.openai.api_server \
    --model beetlware/Bee1reason-arabic-Qwen-14B \
    --tokenizer beetlware/Bee1reason-arabic-Qwen-14B \
    --dtype bfloat16 \
    --max-model-len 2048
    # Optional flags:
    #   --tensor-parallel-size N      # if you have multiple GPUs
    #   --gpu-memory-utilization 0.9  # to adjust GPU memory usage
```
|
- Replace `--dtype bfloat16` with `--dtype float16` if your hardware does not support bfloat16.
- `--max-model-len` should match the `max_seq_length` used during fine-tuning (2048).
|
|
|
3. Send Requests to the VLLM Server: |
|
Once the server is running (typically on `http://localhost:8000`), you can send requests using any OpenAI-compatible client, such as the `openai` Python library:
|
```python |
|
|
|
import openai |
|
|
|
client = openai.OpenAI( |
|
base_url="http://localhost:8000/v1", # VLLM server address |
|
api_key="dummy_key" # VLLM doesn't require an actual API key by default |
|
) |
|
|
|
completion = client.chat.completions.create( |
|
model="beetlware/Bee1reason-arabic-Qwen-14B", # Model name as specified in VLLM |
|
messages=[ |
|
{"role": "user", "content": "اشرح نظرية النسبية العامة بكلمات بسيطة."} # "Explain the theory of general relativity in simple terms." |
|
], |
|
max_tokens=256, |
|
temperature=0.7, |
|
stream=True # To enable streaming |
|
) |
|
|
|
print("Streaming response from VLLM:") |
|
full_response = "" |
|
for chunk in completion: |
|
if chunk.choices[0].delta.content is not None: |
|
token = chunk.choices[0].delta.content |
|
print(token, end="", flush=True) |
|
full_response += token |
|
print("\n--- End of stream ---") |
|
|
|
``` |
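
Alternatively, you can skip the HTTP server and run VLLM offline through its Python API. A minimal sketch (argument names follow current vLLM releases; adjust `dtype` and `max_model_len` as above):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "beetlware/Bee1reason-arabic-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the merged 16-bit model directly into vLLM.
llm = LLM(model=model_id, dtype="bfloat16", max_model_len=2048)

# Build the prompt with the same chat template used for training.
messages = [{"role": "user", "content": "ما هي عاصمة مصر؟"}]  # "What is the capital of Egypt?"
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

sampling = SamplingParams(temperature=0.7, top_p=0.8, top_k=20, max_tokens=128)
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```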
|
|
|
|
|
# Limitations and Potential Biases

- The model's performance depends heavily on the quality and diversity of its training data, and it may reflect biases present in that data.
- Despite the fine-tuning for logical reasoning, the model can still make errors on very complex or unfamiliar reasoning tasks.
- The model may "hallucinate" or produce incorrect information, especially on topics not well covered in its training data.
- Because fine-tuning focused on Arabic, capabilities in other languages may be weaker than the base model's.
|
|
|
|
|
# Additional Information

- **Developed by:** Loai Abdalslam (Beetleware)
- **Release date:** 2025-05-21
- **Contact / issue reporting:** [email protected]
|
|
|
# Beetleware
|
|
|
|
|
Beetleware is a software house and digital transformation service provider, founded six years ago and headquartered in Saudi Arabia.
|
|
|
All rights reserved © 2025
|
|
|
## Our Offices

**KSA Office**
(+966) 54 597 3282
[email protected]

**Egypt Office**
(+2) 010 67 256 306
[email protected]

**Oman Office**
(+968) 9522 8632
|
|
|
|
|
|
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** beetlware AI Team |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** unsloth/qwen3-14b-unsloth-bnb-4bit
|
|
|
This Qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |