---
license: apache-2.0
datasets:
- PKU-Alignment/align-anything
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
---
This model was trained from Qwen/Qwen2.5-0.5B-Instruct with DPO (Direct Preference Optimization), using the Align-Anything framework and the text-to-text subset of the PKU-Alignment/align-anything dataset.
DPO training report: https://api.wandb.ai/links/nlp-amct/uifw66p5
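
For readers unfamiliar with DPO, the per-example objective can be sketched in a few lines of plain Python. This is only an illustration of the standard DPO loss, not the Align-Anything implementation; the function names and the `beta=0.1` default are assumptions and are not taken from this training run. Inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under the frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, for illustration only
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin))."""
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin by which the policy prefers the chosen response.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy shifted toward the chosen response: loss drops below log(2).
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, and rises when it shifts toward the rejected one.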