How do I use this model?
#3, opened by treehugg3
I'm getting logits that look like this after running the model: [-3.492682456970215, 3.688016653060913]
How do I interpret them? Do I just run a softmax and learn that the reward model likes this response? I wonder how I should compare two different answers. Thank you!
Sorry, I made a mistake: I was running some kind of DistilBERT model when I meant to be running this model. This model returns a single logit, like 1.7047538243223, and a higher number means the reward model favors that output more than another output.
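For anyone else landing here, this is a rough sketch of both interpretations in plain Python, not anything official for this model. The two-logit case is an ordinary softmax over a classifier's outputs (which index means what depends on that classifier's label mapping). For the single-logit case, I'm assuming the reward model was trained with a pairwise (Bradley-Terry style) loss, in which case the sigmoid of the difference between two rewards can be read as the probability that the first answer is preferred. The value `reward_b` and both function names are made up for illustration.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def preference_probability(reward_a, reward_b):
    """Sigmoid of the reward difference.

    Assumption: the reward model was trained with a pairwise
    (Bradley-Terry style) loss, so this approximates P(A preferred over B).
    """
    d = reward_a - reward_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


# Two-logit output from the (wrong) classifier model in the first post.
two_class_probs = softmax([-3.492682456970215, 3.688016653060913])
print("class probabilities:", two_class_probs)

# Single-logit rewards: reward_a is from the post, reward_b is hypothetical.
reward_a = 1.7047538243223
reward_b = 0.25  # hypothetical reward for a second answer to the same prompt
p = preference_probability(reward_a, reward_b)
print("P(answer A preferred over answer B):", p)
```

If all you need is a ranking, comparing the raw rewards directly is enough. The sigmoid step only matters if you want a probability. Either way, the scores are only comparable between answers to the same prompt.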