codelion/Qwen2-0.5B-Instruct-accuracy-recovery-lora

🎯 Accuracy Recovery LoRA Adapter

This LoRA adapter helps recover accuracy when using INT4 quantized versions of Qwen/Qwen2-0.5B-Instruct. It was trained using self-distillation with Magpie-generated data.

📊 Performance Metrics

  • Base Model: Qwen/Qwen2-0.5B-Instruct
  • Quantization: INT4 (NF4 quant type via bitsandbytes)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Training Samples: 46
  • Target Performance Gap: <5% perplexity increase over the FP16 baseline
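
For reference, the rank and alpha above correspond to a PEFT configuration along these lines (a minimal sketch; the target modules and dropout shown here are assumptions, not stated on this card):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,             # LoRA rank, as listed above
    lora_alpha=32,    # LoRA alpha, as listed above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not stated on this card
    lora_dropout=0.05,  # assumed
    task_type="CAUSAL_LM",
)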

🔧 Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load base model with quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct",
    quantization_config=quantization_config,
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen2-0.5B-Instruct-accuracy-recovery-lora")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Use the model
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)  # keep inputs on the model's device
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
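
Because the base model is instruction-tuned, prompts generally work better when passed through the chat template. A minimal variation of the snippet above (decoding only the newly generated tokens is a convenience, not something this card prescribes):

# Build an instruction-style prompt with the model's chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
# Decode only the tokens generated after the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))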

🧪 Training Details

  • Method: Self-distillation using Magpie data generation
  • Framework: PEFT + LoRA
  • Loss Function: Combined KL divergence + MSE loss (sketched below)
  • Temperature: 4.0
  • Alpha (distillation weight): 0.8
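
Magpie-style data generation, for context, prompts the model with only the user-turn prefix of its chat template so that it samples plausible instructions by itself. A sketch, assuming Qwen2's ChatML format and reusing model and tokenizer from the Usage snippet:

# Magpie: the model completes a bare user-turn prefix into an instruction
magpie_prefix = "<|im_start|>user\n"
ids = tokenizer(magpie_prefix, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=64, do_sample=True)
instruction = tokenizer.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

The training code itself is not published on this card, but the stated loss and hyperparameters suggest an objective along these lines (a minimal sketch; the T^2 rescaling and the way alpha splits the two terms are assumptions):

import torch.nn.functional as F

def accuracy_recovery_loss(student_logits, teacher_logits, temperature=4.0, alpha=0.8):
    # Soft-target KL between temperature-scaled distributions,
    # rescaled by T^2 as in standard knowledge distillation
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Direct MSE between the raw logits
    mse = F.mse_loss(student_logits, teacher_logits)
    # alpha weights the KL term against the MSE term (assumed split)
    return alpha * kl + (1.0 - alpha) * mse

Presumably the FP16 model acts as teacher and the INT4 model with the adapter as student, so that only the LoRA weights receive gradients.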

📈 Expected Benefits

  • ✅ Maintains accuracy close to the FP16 baseline
  • ✅ ~75% reduction in weight memory (4-bit weights use ~0.5 bytes/parameter vs. 2 bytes/parameter in FP16)
  • ✅ 2-3x faster inference than FP16
  • ✅ Easy to integrate with existing workflows

🏷️ Related


This adapter is part of the Ellora project: standardized recipes for enhancing LLM capabilities.
