Kyleyee/Qwen2.5-1.5B-PPO-hh-retrain-reward-without-eoschange Text Generation • 2B • Updated Apr 25 • 2
Kyleyee/Qwen2-0.5B-DRDPO-imdb-subsft-reverse-preference Text Generation • 0.5B • Updated Mar 21 • 10