sfairXC/FsfairX-LLaMA3-RM-v0.1
Text Classification • 8B
We train the reward model by maximum likelihood estimation under the Bradley-Terry model.
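To make the objective concrete, here is a minimal sketch of the negative log-likelihood that maximum likelihood estimation would minimize. It assumes the standard Bradley-Terry formulation, in which the probability that a chosen response beats a rejected one is sigmoid(r_chosen - r_rejected). The function name `bt_nll` and its arguments are hypothetical, chosen for illustration, and are not taken from the model's training code.

```python
import math


def bt_nll(chosen_rewards, rejected_rewards):
    """Mean Bradley-Terry negative log-likelihood over preference pairs.

    Illustrative helper (hypothetical name). Each pair holds the scalar
    reward of the preferred response and of the rejected response.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(margin)) = log(1 + exp(-margin)), written in a form
        # that does not overflow for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # A larger reward margin for the chosen response gives a smaller loss.
    print(bt_nll([2.0, 1.5], [0.0, 1.0]))
```

Minimizing this quantity over the reward model's parameters pushes the reward of each preferred response above that of its rejected counterpart. In practice the rewards would come from the model's scalar output head and the loss would be computed with an autodiff framework rather than plain Python floats.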