Model Card for VinitT/Qwen2-0.5B-DPO

Model Details

  • Base Model: Qwen2-0.5B
  • Fine-tuning Method: Direct Preference Optimization (DPO)
  • Framework: Unsloth
  • Quantization: 4-bit QLoRA (during training)
  • Model Size: 321M parameters (safetensors; F32/F16/U8 tensors)
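
As context for the DPO method listed above, here is a minimal, illustrative sketch of the per-preference-pair DPO loss (this is not the training code used for this model; the log-probabilities and beta value are made-up toy numbers):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) written in a numerically stable form
    return math.log1p(math.exp(-margin))

# Toy numbers: the policy prefers the chosen answer more than the
# reference model does, so the loss is below log(2).
loss = dpo_loss(-12.0, -15.0, -13.0, -14.5, beta=0.1)
```

The loss shrinks as the policy raises the chosen response's likelihood relative to the rejected one (measured against the reference model), which is what the DPO fine-tuning run optimizes in batch form.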

Uses

Load the model with Unsloth and generate a response:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "VinitT/Qwen2-0.5B-DPO",
    dtype = None,          # auto-detect (float16/bfloat16)
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

messages = [{"role": "user", "content": "Hello, how can I develop a habit of drawing daily?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
# Decode only the new response (not the prompt)
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

print(response.strip())
