v0.2 Training: SFT only or SFT+DPO?
#15 by weizechen - opened
Hi. I've read that the v0.1 documentation mentions SFT+DPO training, while v0.2 only refers to SFT. The alignment handbook also lacks a DPO recipe. Was DPO used for v0.2? Thanks!
Hi, we only used SFT for v0.2.
Thanks for the reply!
weizechen changed discussion status to closed