reward model - a ByRookie Collection

Updated Oct 7, 2024

HelpSteer2-Preference: Complementing Ratings with Preferences

Paper • 2410.01257 • Published Oct 2, 2024