DPO training was performed with the Align-Anything framework on the text-to-text subset of the PKU-Alignment/align-anything dataset.
DPO training report: https://api.wandb.ai/links/nlp-amct/uifw66p5
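
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the Align-Anything code base. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for illustration, and the hyperparameters actually used for this model are not stated here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` is a hypothetical default, not a value from this model card.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference model does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy shifted toward the chosen response: the loss drops below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In practice the log-probabilities come from batched forward passes of both models and the loss is averaged over the batch; the scalar version above only shows the shape of the objective.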