README.md · weathermanj/Menda-3B-250 at main


	---
	language: en
	license: other
	tags:
	- qwen
	- grpo
	- instruct
	- fine-tuned
	- reasoning
	- 3b
	- menda
	- chat
	- transformers
	library_name: transformers
	datasets:
	- gsm8k
	model-index:
	- name: Menda-3B-250
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      type: arc-challenge
	      name: ARC-Challenge
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 50.0
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      type: boolq
	      name: BoolQ
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 80.0
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      type: hellaswag
	      name: HellaSwag
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 40.0
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      type: mmlu
	      name: MMLU (Overall)
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 68.95
	---

	# Menda-3B-250: GRPO-Tuned Qwen2.5 Model

	Menda-3B-250 is a fine-tuned version of Qwen2.5-3B-Instruct, trained with GRPO (Group Relative Policy Optimization) for 250 steps. According to the author's evaluation, it improves on the base model on reasoning benchmarks.

	## Model Details

	- Base Model: Qwen/Qwen2.5-3B-Instruct
	- Training Method: GRPO (Group Relative Policy Optimization)
	- Training Steps: 250
	- Parameters: 3 billion
	- Context Length: 32K tokens
	- Training Data: GSM8K (mathematical reasoning)
	- Chat Template: Uses the Qwen2 chat template
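The card names GRPO without describing it, so here is a minimal sketch of the step that usually distinguishes the method: several completions are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the reward values are illustrative assumptions, not details of this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one GSM8K question:
# 1.0 if the final number is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When all answers in a group score the same, every advantage is zero and that prompt contributes no learning signal.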

	## Chat Format

	This model uses the standard Qwen2 chat template. For best results when using the model directly, format your prompts as follows:

	```
	<|im_start|>system
	You are a helpful AI assistant.<|im_end|>
	<|im_start|>user
	Your question here<|im_end|>
	<|im_start|>assistant
	```
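If you need to build this string without a tokenizer (for example, to send a raw prompt to another runtime), the format above can be sketched in a few lines. The helper name `build_chatml_prompt` is hypothetical, and the sketch covers only plain system, user, and assistant messages; the template shipped with the tokenizer may handle more cases, such as tool calls or a default system prompt.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the format shown above."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Your question here"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```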

	When using the model through the Hugging Face Transformers library, you do not need to write these tokens by hand. Calling `tokenizer.apply_chat_template` applies the chat template stored with the tokenizer:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "weathermanj/Menda-3B-250"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name)

	messages = [
	    {"role": "system", "content": "You are a helpful AI assistant."},
	    {"role": "user", "content": "Explain the concept of machine learning in simple terms."},
	]

	# add_generation_prompt=True appends the assistant header so the model starts a reply
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(prompt, return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=300)
	# The decoded text contains the prompt followed by the reply
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```

	## Benchmark Results

	Menda-3B-250 has been evaluated on several standard benchmarks:

	| Benchmark | Task Type | Accuracy |
	|-----------|-----------|----------|
	| ARC-Challenge | Scientific Reasoning | 50.0% |
	| BoolQ | Reading Comprehension | 80.0% |
	| HellaSwag | Common Sense Reasoning | 40.0% |
	| Lambada | Text Completion | 70.0% |
	| PIQA | Physical Reasoning | 90.0% |
	| Winogrande | Commonsense Reasoning | 90.0% |

	### MMLU Performance

	| MMLU Category | Score |
	|---------------|-------|
	| Overall | 68.95% |
	| Humanities | 76.92% |
	| Social Sciences | 75.83% |
	| STEM | 60.00% |
	| Other | 67.69% |

	## Key Strengths

	- Highest MMLU Score: This checkpoint achieves the highest overall MMLU score (68.95%) among all checkpoints in the training progression.
	- Strong Humanities Performance: Exceptional performance in humanities subjects (76.92%).
	- Efficient Training: Reaches these results after only 250 training steps.
	- Balanced Capabilities: Maintains strong performance across diverse tasks without significant trade-offs.

	## Usage Examples

	### Basic Usage with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "weathermanj/Menda-3B-250"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)

	prompt = "Explain the concept of machine learning in simple terms."
	# Move the inputs to the device that device_map="auto" selected for the model
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_length=300)
	response = tokenizer.decode(outputs[0], skip_special_tokens=True)
	print(response)
	```

	### Chat Usage with Transformers

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "weathermanj/Menda-3B-250"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)

	messages = [
	    {"role": "system", "content": "You are a helpful AI assistant."},
	    {"role": "user", "content": "Give me a short introduction to large language models."},
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=512,
	)
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
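The list comprehension in the example above is needed because `generate` returns the prompt tokens followed by the newly generated ones. A small sketch with plain Python lists shows what the slicing does; the helper name and the token ids are made up for illustration.

```python
def strip_prompt_tokens(input_ids_batch, output_ids_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Hypothetical token ids: a 3-token prompt followed by 3 generated tokens
prompt_ids = [[101, 102, 103]]
full_output_ids = [[101, 102, 103, 7, 8, 9]]

new_tokens = strip_prompt_tokens(prompt_ids, full_output_ids)
print(new_tokens)  # [[7, 8, 9]]
```

Without this step, decoding would return the system and user messages along with the reply.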

	### Using with Ollama

	You can also use this model with Ollama by converting it to GGUF format:

	```bash
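	# Download the weights, then convert to GGUF with the converter script
	# from a local checkout of the llama.cpp repository
	huggingface-cli download weathermanj/Menda-3B-250 --local-dir Menda-3B-250
	python llama.cpp/convert_hf_to_gguf.py Menda-3B-250 --outfile menda-3b-250.gguf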

	# Create Ollama model
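	cat > Modelfile << 'EOF'
	FROM menda-3b-250.gguf
	TEMPLATE """{{ if .System }}<|im_start|>system
	{{ .System }}<|im_end|>
	{{ end }}<|im_start|>user
	{{ .Prompt }}<|im_end|>
	<|im_start|>assistant
	"""
	PARAMETER stop "<|im_end|>"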
	PARAMETER temperature 0.7
	PARAMETER top_p 0.9
	PARAMETER top_k 40
	EOF

	ollama create menda-3b-250 -f Modelfile
	ollama run menda-3b-250
	```
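Once the model is registered, it can also be queried over Ollama's local HTTP API instead of the interactive prompt. The sketch below builds the JSON body for the `/api/chat` endpoint. It assumes an Ollama server running on the default local port and the model name `menda-3b-250` from the commands above; the helper names are hypothetical.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption)
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(user_message, model="menda-3b-250",
                       system="You are a helpful AI assistant."):
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_message},
        ],
        "stream": False,  # ask for one complete JSON response
    }


def send_chat_request(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to Ollama and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]


payload = build_chat_request("What is 17 * 24?")
print(json.dumps(payload, indent=2))
```

With the server running, `send_chat_request(payload)` returns the reply as a string.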

	## Training Configuration

	The model was trained using the GRPO methodology with the following configuration:

	- LoRA Rank: 128
	- Learning Rate: 5e-6
	- Optimizer: AdamW (8-bit)
	- Batch Size: 8 per device
	- Gradient Accumulation Steps: 4
	- Training Samples: 100 examples from GSM8K
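
To put these numbers in context, the batch size and gradient accumulation settings can be combined into a rough count of sequences per optimizer step. This is back-of-the-envelope arithmetic, not something reported by the author: the number of devices is an assumption, and whether a "sequence" is a prompt or a sampled completion depends on the trainer and is not stated here.

```python
# Values taken from this model card
per_device_batch_size = 8
gradient_accumulation_steps = 4
training_steps = 250

# Assumption: a single training device (not stated in the card)
num_devices = 1

# Sequences contributing to each optimizer step
sequences_per_step = per_device_batch_size * gradient_accumulation_steps * num_devices

# Sequences processed over the whole run, assuming 250 counts optimizer steps
total_sequences = sequences_per_step * training_steps

print(sequences_per_step)  # 32
print(total_sequences)     # 8000
```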

	## License

	This model inherits the license of the base Qwen2.5-3B-Instruct model. Please refer to the [Qwen2.5-3B-Instruct license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/blob/main/LICENSE) for details.