codelion/Qwen2-0.5B-Instruct-accuracy-recovery-lora
# 🎯 Accuracy Recovery LoRA Adapter
This LoRA adapter helps recover accuracy when running an INT4-quantized version of Qwen/Qwen2-0.5B-Instruct. It was trained via self-distillation on Magpie-generated data.
## 📊 Performance Metrics
- Base Model: Qwen/Qwen2-0.5B-Instruct
- Quantization: INT4 (NF4 with double quantization, via bitsandbytes)
- LoRA Rank: 16
- LoRA Alpha: 32
- Training Samples: 46
- Target Performance Gap: <5% perplexity increase over the FP16 baseline
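The card does not specify how the perplexity gap is evaluated; under the usual definition (perplexity = exp of the mean per-token negative log-likelihood), the <5% target can be checked with a sketch like the following (`within_gap` and its helper are illustrative, not part of the training code):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

def within_gap(ppl_quantized, ppl_fp16, max_gap=0.05):
    """Check the <5% relative perplexity increase target."""
    return (ppl_quantized - ppl_fp16) / ppl_fp16 < max_gap
```

For example, a quantized perplexity of 10.4 against an FP16 baseline of 10.0 is a 4% increase and would meet the target.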
## 🔧 Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load base model with INT4 (NF4) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
)

# Load the accuracy-recovery LoRA adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen2-0.5B-Instruct-accuracy-recovery-lora")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Generate a response
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## 🧪 Training Details
- Method: Self-distillation using Magpie data generation
- Framework: PEFT + LoRA
- Loss Function: Combined KL divergence + MSE loss
- Temperature: 4.0
- Alpha (distillation weight): 0.8
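The exact form of the combined loss is not given in the card; a plausible sketch consistent with the hyperparameters above (temperature 4.0, alpha 0.8 weighting the distillation term) is shown below on plain Python lists for clarity. The `softmax` helper and the T² scaling of the KL term follow standard knowledge-distillation practice and are assumptions, not extracted from the training code:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0, alpha=0.8):
    """Combined loss: alpha * KL(teacher || student) + (1 - alpha) * MSE.
    The KL term is scaled by T^2, as is conventional in distillation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s)) * temperature ** 2
    mse = sum((s - t) ** 2 for s, t in zip(student_logits, teacher_logits)) / len(student_logits)
    return alpha * kl + (1 - alpha) * mse
```

When the student matches the teacher exactly, both terms vanish and the loss is zero; any mismatch in logits drives it positive.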
## 📈 Expected Benefits
- ✅ Maintains accuracy close to the FP16 baseline
- ✅ ~75% reduction in memory usage
- ✅ 2-3x faster inference than FP16
- ✅ Easy to integrate with existing workflows
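The ~75% figure follows directly from per-weight storage arithmetic. A back-of-the-envelope sketch (ignoring activations, the KV cache, LoRA weights, and NF4 quantization constants, which make the real saving slightly smaller):

```python
def model_memory_bytes(n_params, bits_per_param):
    """Bytes needed to store the weights alone."""
    return n_params * bits_per_param / 8

fp16_bytes = model_memory_bytes(0.5e9, 16)  # FP16 baseline: ~1 GB
int4_bytes = model_memory_bytes(0.5e9, 4)   # INT4 weights: ~0.25 GB
reduction = 1 - int4_bytes / fp16_bytes     # 0.75, i.e. the ~75% figure
```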
## 🏷️ Related
- Dataset: codelion/Qwen2-0.5B-Instruct-magpie
- Base Model: Qwen/Qwen2-0.5B-Instruct
- Framework: PEFT
This adapter is part of the Ellora project: standardized recipes for enhancing LLM capabilities.