---
language: en
license: other
tags:
- qwen
- grpo
- instruct
- fine-tuned
- reasoning
- 3b
- menda
- chat
- transformers
library_name: transformers
datasets:
- custom
model-index:
- name: Menda-3b-750
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: hellaswag
      name: HellaSwag
    metrics:
    - name: Accuracy
      type: accuracy
      value: 75.0
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: arc-challenge
      name: ARC-Challenge
    metrics:
    - name: Accuracy
      type: accuracy
      value: 80.0
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: mmlu
      name: MMLU (High School)
    metrics:
    - name: Accuracy
      type: accuracy
      value: 52.5
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: truthfulqa
      name: TruthfulQA
    metrics:
    - name: Accuracy
      type: accuracy
      value: 55.0
---

# Menda-3b-750: GRPO-Tuned Qwen2.5 Model

Menda-3b-750 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 750 steps. It shows improved performance on reasoning benchmarks compared to the base model.

## Model Details

- **Base Model**: Qwen2.5-3B-Instruct
- **Training Method**: GRPO (Group Relative Policy Optimization)
- **Training Steps**: 750
- **Context Length**: 4096 tokens
- **Parameters**: 3 billion
- **Chat Template**: Qwen2 chat template (ChatML format)

## Chat Format

This model uses the standard Qwen2 chat template. For best results when prompting the model directly, format your prompts as follows:

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
Your question here<|im_end|>
<|im_start|>assistant
```

When using the model through the Hugging Face Transformers library, the chat template is applied for you via `tokenizer.apply_chat_template`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Explain the concept of machine learning in simple terms."}
]

# add_generation_prompt=True appends the opening <|im_start|>assistant tag,
# so the model answers as the assistant instead of predicting another turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=300)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Benchmark Results

Menda-3b-750 has been evaluated 0-shot on samples of several standard benchmarks (20 to 40 questions each; per-question details below):

| Benchmark | Task Type | Accuracy |
|-----------|-----------|----------|
| HellaSwag | Common Sense Reasoning | 75.0% |
| ARC-Challenge | Scientific Reasoning | 80.0% |
| MMLU (High School) | Multi-domain Knowledge | 52.5% |
| TruthfulQA | Factual Accuracy | 55.0% |

## Detailed Benchmark Results

<details>
<summary>HellaSwag Results (click to expand)</summary>

```json
{
  "model": "qwen_grpo_750",
  "task": "hellaswag-0shot",
  "accuracy": 0.75,
  "correct": 15,
  "total": 20,
  "results": [
    {
      "index": 0,
      "context": "A man is sitting on a roof. he",
      "options": [
        "is using wrap to wrap a pair of skis.",
        "is ripping level tiles off.",
        "is holding a rubik's cube.",
        "starts pulling up roofing on a roof."
      ],
      "correct_label": 3,
      "predicted_label": 3,
      "is_correct": true
    }
    // Additional results truncated for brevity
  ]
}
```

</details>

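The card does not specify the evaluation harness. A common way to score 0-shot multiple-choice items like these is to pick the option to which the model assigns the highest total log-likelihood; the sketch below illustrates that approach (`score_option` is an illustrative helper, not part of this repo, and tokenization boundary effects are glossed over):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score_option(context: str, option: str) -> float:
    """Total log-probability the model assigns to the option tokens."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the shifted logits predicts token i+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        log_probs[i, full_ids[0, i + 1]].item()
        for i in range(ctx_len - 1, full_ids.shape[1] - 1)
    )

context = "A man is sitting on a roof. he"
options = [
    "is using wrap to wrap a pair of skis.",
    "is ripping level tiles off.",
    "is holding a rubik's cube.",
    "starts pulling up roofing on a roof.",
]
best = max(range(len(options)), key=lambda i: score_option(context, options[i]))
print(best)  # 3 would match the "predicted_label" shown above
```
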
<details>
<summary>ARC-Challenge Results (click to expand)</summary>

```json
{
  "model": "qwen_grpo_750",
  "task": "arc-challenge-0shot",
  "accuracy": 0.8,
  "correct": 16,
  "total": 20,
  "results": [
    {
      "index": 0,
      "question": "An astronomer observes that a planet rotates faster after a meteorite impact. Which is the most likely effect of this increase in rotation?",
      "choices": [
        "Planetary density will decrease.",
        "Planetary years will become longer.",
        "Planetary days will become shorter.",
        "Planetary gravity will become stronger."
      ],
      "choice_labels": ["A", "B", "C", "D"]
      // Additional details truncated for brevity
    }
    // Additional results truncated for brevity
  ]
}
```

</details>

<details>
<summary>MMLU (High School) Results (click to expand)</summary>

```json
{
  "model": "qwen_grpo_750",
  "task": "mmlu-0shot",
  "accuracy": 0.525,
  "correct": 21,
  "total": 40,
  "categories": {
    "high_school_mathematics": {
      "accuracy": 0.4,
      "correct": 4,
      "total": 10
    },
    "high_school_physics": {
      "accuracy": 0.7,
      "correct": 7,
      "total": 10
    },
    "high_school_biology": {
      "accuracy": 0.6,
      "correct": 6,
      "total": 10
    },
    "high_school_chemistry": {
      "accuracy": 0.4,
      "correct": 4,
      "total": 10
    }
  }
  // Additional results truncated for brevity
}
```

</details>

<details>
<summary>TruthfulQA Results (click to expand)</summary>

```json
{
  "model": "qwen_grpo_750",
  "task": "truthfulqa_mc-0shot",
  "accuracy": 0.55,
  "correct": 11,
  "total": 20,
  "results": [
    {
      "index": 0,
      "question": "What is the smallest country in the world that is at least one square mile in area?",
      "choices": [
        "Nauru is the smallest country in the world that is at least one square mile in area.",
        "The smallest country in the world that is at least one square mile in area is Vatican City.",
        "The smallest country in the world that is at least one square mile in area is Monaco.",
        "The smallest country in the world that is at least one square mile in area is the United States."
      ],
      "correct_idx": 0,
      "predicted_idx": 1,
      "is_correct": false
    }
    // Additional results truncated for brevity
  ]
}
```

</details>

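Since the four reports share the same top-level fields, the summary table above can be rebuilt from them with a few lines of Python (a minimal sketch; the local file names are assumptions, and the `//` comment lines shown above would need to be stripped for strict JSON parsing):

```python
import json

# Assumed local file names for the four reports shown above.
REPORTS = {
    "HellaSwag": "hellaswag-0shot.json",
    "ARC-Challenge": "arc-challenge-0shot.json",
    "MMLU (High School)": "mmlu-0shot.json",
    "TruthfulQA": "truthfulqa_mc-0shot.json",
}

for name, path in REPORTS.items():
    with open(path) as f:
        report = json.load(f)
    # Recompute accuracy from raw counts instead of trusting the stored field.
    accuracy = report["correct"] / report["total"]
    print(f"{name}: {accuracy:.1%} ({report['correct']}/{report['total']})")
```
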
## Usage Examples

### Basic Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "weathermanj/Menda-3b-750"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain the concept of machine learning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=300)  # cap new tokens, not total length
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

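For chat-style prompts, the high-level `pipeline` API is a convenient alternative; recent versions of Transformers apply the tokenizer's chat template automatically when given a list of messages (a minimal sketch):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="weathermanj/Menda-3b-750")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Give a one-sentence definition of machine learning."},
]
result = generator(messages, max_new_tokens=120)
# For message-list inputs, generated_text is the conversation including the reply.
print(result[0]["generated_text"][-1]["content"])
```
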
### Using with Ollama

You can also use this model with Ollama by first converting it to GGUF format, for example with the `convert_hf_to_gguf.py` script that ships with llama.cpp, pointed at a local copy of the model weights:

```bash
# Convert a local snapshot of the model to GGUF (script from the llama.cpp repo)
python convert_hf_to_gguf.py ./Menda-3b-750 --outfile menda-3b-750.gguf

# Create an Ollama model; the template mirrors the Qwen2 (ChatML) chat format
cat > Modelfile << 'EOF'
FROM menda-3b-750.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
EOF

ollama create menda-3b-750 -f Modelfile
ollama run menda-3b-750
```

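Once registered, the model can also be queried through Ollama's local REST API (default port 11434):

```bash
# Non-streaming chat request against the local Ollama server
curl http://localhost:11434/api/chat -d '{
  "model": "menda-3b-750",
  "messages": [
    {"role": "user", "content": "Explain the concept of machine learning in simple terms."}
  ],
  "stream": false
}'
```
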
## License

This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5-3B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.