DPO training is performed with the Align-Anything framework on the PKU-Alignment/align-anything text-to-text dataset.

DPO training report: https://api.wandb.ai/links/nlp-amct/uifw66p5
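For readers unfamiliar with the objective, the per-pair DPO loss that frameworks like Align-Anything optimize can be sketched as follows. This is a minimal, hypothetical illustration using scalar sequence log-probabilities; actual training operates on batched, token-level log-probs, and the `beta` value here is just a common default, not the one used for this run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (scalar log-probabilities)."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(beta * margin)).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no preference shift the margin is 0, so the loss is log 2 (~0.693);
# it decreases as the policy's preference for the chosen response grows.
print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

The loss pushes the policy to assign relatively higher likelihood to the chosen response than the reference model does, without ever training an explicit reward model.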

Model size: 494M parameters (Safetensors, BF16)

Model tree for ll922/Qwen2.5-0.5B-Instruct-Align-Anything-DPO

Base model: Qwen/Qwen2.5-0.5B (this model is a fine-tune of it)

Dataset used to train ll922/Qwen2.5-0.5B-Instruct-Align-Anything-DPO: PKU-Alignment/align-anything