SemiQwenn - Distilled Qwen2.5 0.5B

Model Description

SemiQwenn is a compact model that distills knowledge from Devstral into the efficient Qwen2.5-0.5B architecture through supervised fine-tuning (SFT) distillation. The student model is trained on Devstral's responses to the training dataset, capturing much of the teacher's capability while retaining the computational efficiency of the smaller Qwen2.5-0.5B architecture. The model was created as part of a datathon project focused on efficient language model training and deployment.
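In practice, response-based SFT distillation comes down to two steps: the teacher answers every training prompt, and the resulting instruction-response pairs become the student's fine-tuning data. The sketch below illustrates the first step only; the teacher checkpoint id, prompts, and output file are placeholder assumptions, not the project's actual pipeline.

# Minimal sketch of collecting teacher responses for SFT distillation.
# The teacher checkpoint id, prompts, and file name are placeholders.
import json
from transformers import pipeline

teacher = pipeline(
    "text-generation",
    model="<devstral-teacher-checkpoint>",  # placeholder: fill in the actual teacher model id
    device_map="auto",
)

prompts = [
    "Write a Python function that reverses a string.",
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?",
]

with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        # Let the teacher answer the training prompt.
        answer = teacher(prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
        # Save the pair in the instruction-response format used for SFT.
        f.write(json.dumps({"instruction": prompt, "response": answer}) + "\n")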

Model Details

  • Model Name: SemiQwenn
  • Student Model: Qwen2.5-0.5B
  • Teacher Model: Devstral
  • Model Size: 0.5 billion parameters
  • Training Method: SFT (Supervised Fine-Tuning) Distillation with LoRA adapters
  • Language(s): English (primary), with multilingual capabilities inherited from the base model
  • License: Same as base Qwen2.5 model
  • Model Type: Causal Language Model

Training Details

Training Data

  • Dataset: Code Alpaca + GSM8K (30k samples)
  • Training Split: Stratified split for balanced learning
  • Data Format: JSONL with instruction-response pairs (see the example records below)
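For illustration, individual lines of such a JSONL file could look like the records below (field names are an assumption; the project's exact schema may differ):

{"instruction": "Write a Python function that returns the factorial of n.", "response": "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)"}
{"instruction": "A pen costs $3 and a notebook costs $5. What do 2 pens and 3 notebooks cost?", "response": "2 * 3 + 3 * 5 = 6 + 15 = 21, so the total cost is $21."}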

Training Configuration

  • Training Method: LoRA (Low-Rank Adaptation)
  • Teacher Model: Devstral (for SFT distillation)
  • Training Framework: Hugging Face Transformers with PEFT (see the configuration sketch below)
  • Hardware: GPU-optimized training
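A rough sketch of such a LoRA setup with Transformers and PEFT is shown below; the rank, scaling factor, and target modules are common defaults, not the hyperparameters actually used for SemiQwenn.

# Illustrative LoRA configuration for the Qwen2.5-0.5B student.
# Hyperparameters here are common defaults, not SemiQwenn's actual settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

lora_config = LoraConfig(
    r=16,                     # rank of the low-rank update matrices
    lora_alpha=32,            # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are updated during SFT

From here, the adapted model can be fine-tuned on the instruction-response pairs with the standard Trainer or trl's SFTTrainer.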

Training Process

  • Fine-tuned using SFT distillation from Devstral (teacher) to Qwen2.5-0.5B (student)
  • LoRA adapters applied to the student model and merged into the base weights for final deployment (see the merge sketch below)
  • Optimized to transfer Devstral's knowledge to the more efficient Qwen architecture
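The merge step referenced above typically looks like the following; the adapter path is a placeholder.

# Illustrative merge of LoRA adapters into the base weights for deployment.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder adapter path

merged = model.merge_and_unload()            # fold the LoRA deltas into the base weights
merged.save_pretrained("SemiQwenn-0.5b")     # standalone checkpoint; PEFT is not needed at inference

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer.save_pretrained("SemiQwenn-0.5b")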

Performance

GSM8K Evaluation Results

  • The model demonstrates competitive performance for its size on mathematical reasoning tasks
  • Detailed evaluation results are available in the project evaluation files
  • Comparisons with the base model and the teacher model are included (an illustrative GSM8K scoring sketch follows below)
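The project's own evaluation scripts are not reproduced here. The sketch below shows one simple way to score a model on GSM8K by comparing the final number of each generation against the reference answer; dataset field names follow the public openai/gsm8k release, and the prompt format and answer extraction are illustrative simplifications.

# Illustrative GSM8K scoring loop (not the project's evaluation script).
# GSM8K references end with "#### <number>"; we compare final numbers only.
import re
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("alfiwillianz/SemiQwenn-0.5b")
tokenizer = AutoTokenizer.from_pretrained("alfiwillianz/SemiQwenn-0.5b")

def last_number(text):
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

data = load_dataset("openai/gsm8k", "main", split="test").select(range(50))  # small subset for speed
correct = 0
for example in data:
    prompt = example["question"] + "\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    reference = example["answer"].split("####")[-1].strip()
    if last_number(completion) == last_number(reference):
        correct += 1

print(f"accuracy on the subset: {correct / len(data):.2%}")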

Resource Usage

  • Roughly 0.5B parameters (about 1 GB of weights in FP16), far lighter than multi-billion-parameter models
  • Suited to deployment in resource-constrained environments
  • Fast inference relative to larger models while retaining much of the teacher's task quality

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model
model = AutoModelForCausalLM.from_pretrained("alfiwillianz/SemiQwenn-0.5b")
tokenizer = AutoTokenizer.from_pretrained("alfiwillianz/SemiQwenn-0.5b")

# Example usage: generate a completion for a short math prompt
prompt = "Solve this math problem: What is 15 * 24?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Model Architecture

  • Architecture: Transformer-based, decoder-only causal language model
  • Attention: Grouped-query attention, inherited from the Qwen2.5 base model
  • Vocabulary Size: Inherited from the Qwen2.5 tokenizer
  • Context Length: Same as the base Qwen2.5-0.5B model (see the config inspection sketch below)
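To check the exact values inherited from the base model, the model configuration can be inspected directly; attribute names below follow the standard Transformers Qwen2 config.

# Inspect architecture details inherited from the Qwen2.5 base model.
from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("alfiwillianz/SemiQwenn-0.5b")
tokenizer = AutoTokenizer.from_pretrained("alfiwillianz/SemiQwenn-0.5b")

print(config.num_hidden_layers)        # number of decoder layers
print(config.num_attention_heads)      # query heads
print(config.num_key_value_heads)      # key/value heads (grouped-query attention)
print(config.max_position_embeddings)  # maximum context length
print(len(tokenizer))                  # vocabulary size as seen by the tokenizer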

Intended Use

Primary Use Cases

  • Mathematical reasoning and problem solving
  • Code generation and understanding
  • Educational applications
  • Research in efficient language models

Out-of-Scope Uses

  • This model should not be used for generating harmful, biased, or inappropriate content
  • Not suitable for high-stakes decision making without human oversight
  • Not designed for real-time critical applications

Limitations and Biases

  • As a 0.5B parameter model, it has limitations compared to larger models
  • May inherit biases from training data and base model
  • Performance may vary on tasks outside the training distribution
  • Limited by the knowledge cutoff of the base model

Ethical Considerations

  • Model outputs should be reviewed for accuracy, especially in educational contexts
  • Users should be aware of potential biases and limitations
  • Appropriate safeguards should be implemented for production use

Citation

If you use SemiQwenn in your research or applications, please cite:

@misc{semiqwenn2025,
  title={SemiQwenn: A Distilled Qwen2.5 0.5B Model},
  author={Alfi Willianz},
  year={2025},
  note={Knowledge distilled model based on Qwen2.5-0.5B}
}

Acknowledgments

  • Built upon Qwen2.5 by Alibaba Cloud
  • Training methodology inspired by knowledge distillation techniques
  • Part of Datathon 2025 project on efficient language models

Model Files

This directory contains:

  • Merged model weights combining LoRA adapters with base model
  • Tokenizer configuration
  • Model configuration files
  • Training artifacts and logs

Contact

For questions about this model or the training process, please refer to the project documentation or contact the development team.

