sfairXC/FsfairX-LLaMA3-RM-v0.1
Text Classification • 8B
We train the reward model by maximum likelihood estimation under the Bradley-Terry model.
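
As a rough illustration of what this objective looks like, the sketch below computes the Bradley-Terry negative log-likelihood for a batch of preference pairs. Under the Bradley-Terry model, the probability that the chosen response beats the rejected one is sigmoid(r_chosen - r_rejected), so maximizing the likelihood is the same as minimizing the mean of -log sigmoid(r_chosen - r_rejected). The function name `bt_nll` and its arguments are hypothetical and stand in for scalar rewards produced by a reward model; this is not the model's actual training code.

```python
import math


def bt_nll(chosen_rewards, rejected_rewards):
    """Mean negative log-likelihood of preference pairs under Bradley-Terry.

    Hypothetical helper: each argument is a list of scalar rewards, where
    chosen_rewards[i] and rejected_rewards[i] score the preferred and the
    dispreferred response to the same prompt.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("reward lists must have the same length")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    total = 0.0
    for r_c, r_r in zip(chosen_rewards, rejected_rewards):
        margin = r_c - r_r
        # -log(sigmoid(margin)) equals log(1 + exp(-margin)).
        # Written as max(-m, 0) + log1p(exp(-|m|)) to avoid overflow.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # Toy rewards, made up for illustration only.
    print(bt_nll([1.5, 0.2], [0.3, 0.9]))
```

The loss shrinks as the chosen response is scored further above the rejected one, and equals log 2 when the two rewards are identical, which corresponds to a 50/50 preference.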