---
license: apache-2.0
datasets:
- amphora/QwQ-LongCoT-130K-2
- PowerInfer/QWQ-LONGCOT-500K
- PowerInfer/LONGCOT-Refine-500K
language:
- en
metrics:
- perplexity
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
library_name: transformers
---

## Model Details:

- **Base Model:** Qwen/Qwen2.5-0.5B-Instruct
- **Teacher Model:** Qwen/QwQ-32B-Preview
- **Distillation Framework:** Instruction Tuning
- **Task Type:** Conversational AI / Causal Language Modeling
- **Parameters:** 0.5B
- **Special Features:**
  - Integrated gradient checkpointing for efficient training (see the sketch below)
  - Step-by-step reasoning capabilities for better problem-solving
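
As a rough illustration of the gradient-checkpointing feature (a sketch, not the published training configuration), it can be enabled through the standard `transformers` API:

```python
from transformers import AutoModelForCausalLM

# Illustrative only: trade extra compute for lower activation memory
# during fine-tuning by recomputing activations in the backward pass.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is incompatible with checkpointing while training
```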

---

## Training:

QwQ-0.5B-Distilled was trained with supervised fine-tuning on **amphora/QwQ-LongCoT-130K-2**, **PowerInfer/QWQ-LONGCOT-500K**, and **PowerInfer/LONGCOT-Refine-500K**. It can serve as a competitive reasoning model on edge devices, as well as a draft model for Qwen/QwQ-32B-Preview.
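
As a minimal sketch of the data pipeline (split names and schema handling are assumptions; the exact preprocessing is not documented here), the three corpora can be loaded with the `datasets` library:

```python
from datasets import load_dataset, concatenate_datasets

# Assumed: each corpus exposes a "train" split.
corpora = [
    "amphora/QwQ-LongCoT-130K-2",
    "PowerInfer/QWQ-LONGCOT-500K",
    "PowerInfer/LONGCOT-Refine-500K",
]
splits = [load_dataset(name, split="train") for name in corpora]

# Note: concatenate_datasets requires matching columns across datasets;
# a real pipeline would first normalize each corpus to a shared chat schema.
train_data = concatenate_datasets(splits)
print(f"{len(train_data):,} SFT examples")
```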

### Training Progress:

[██████████] 100%

## Example Usage:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name
model_name = "kz919/QwQ-0.5B-Distilled-SFT"

# Load the model onto GPU 0 in bfloat16
print(f"Starting to load the model {model_name} into memory")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map={"": 0}
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the prompt
prompt = "How many r's are in strawberry?"
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]

# Apply the chat template and tokenize the input
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=4096
)

# Keep only the newly generated tokens by slicing off the prompt
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the response
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

---

## Applications:

1. **Conversational Assistants:**
   Suitable for AI chatbots that require reasoning and long-context understanding.

2. **Educational Tools:**
   Provides step-by-step explanations, making it ideal for learning environments.

3. **Creative Writing:**
   Assists in generating coherent, contextually aware long-form content.

4. **Technical Support:**
   Handles complex customer queries with precision and clarity.

---

## Draft model for Qwen/QwQ-32B-Preview:

This model can be used as a draft model for [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) in speculative decoding. Of every 5 tokens it drafts, we observe that on average 3 are accepted for math queries and 2.3 are accepted for general reasoning queries.
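
As a minimal sketch of that setup (assuming enough memory to host both models), `transformers` assisted generation takes the small model via the `assistant_model` argument of `generate`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: the distilled model drafts tokens, the 32B target verifies them.
target = AutoModelForCausalLM.from_pretrained(
    "Qwen/QwQ-32B-Preview", torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "kz919/QwQ-0.5B-Distilled-SFT", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B-Preview")

inputs = tokenizer("Prove that the sum of two odd numbers is even.", return_tensors="pt").to(target.device)
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Assisted generation requires the draft and target to share a tokenizer, which holds here since both are Qwen-family models.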

---

## Limitations:

- While distilled for efficiency, performance on highly complex reasoning tasks may slightly trail the teacher model.
- This model may still be undertrained; it is merely a proof of concept. Don't yell at me if it's outputting nonsense.

---

## Citation:

If you use this model in your research or applications, please cite it as:

```bibtex
@misc{qwq_0.5B_distilled,
  author    = {Kaizhao Liang},
  title     = {Mini-QwQ: A Reasoning Model for Edge Devices},
  year      = {2024},
  publisher = {Hugging Face},
  version   = {1.0}
}
```

---

This model is an example of how efficient fine-tuning and distillation methods can deliver robust conversational AI capabilities in a smaller, more manageable footprint.