allenai/Llama-3.1-8B-Instruct-RM-RB2 · How do I use this model?

~~I'm getting logits that look like this after running the model:~~
~~[-3.492682456970215,3.688016653060913]~~

~~How do I interpret them? Do I just run a softmax and learn that the reward model likes this response? I wonder how I should compare two different answers. Thank you!~~

Sorry, I made a mistake: I was running some kind of DistilBERT model when I meant to be running this one. This model returns a single logit, such as 1.7047538243223, and a higher number means the reward model favors that response more than another output.
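
In case it helps anyone with the same question about comparing two answers, here is a minimal sketch in plain Python. It assumes you already have one scalar score per response, both for the same prompt. The second score (-0.5) and the names `response_a`, `response_b`, `pick_best`, and `preference_probability` are made up for illustration. The sigmoid step additionally assumes the reward model was trained with a Bradley-Terry style pairwise loss, which I have not verified for this model.

```python
import math


def pick_best(scores: dict[str, float]) -> str:
    """Return the key of the response with the highest reward score."""
    return max(scores, key=scores.get)


def preference_probability(score_a: float, score_b: float) -> float:
    """Sigmoid of the score difference.

    Under a Bradley-Terry assumption (not confirmed for this model), this is
    the modeled probability that response A is preferred over response B.
    """
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


# One score per response to the SAME prompt.
# 1.7047538243223 is the logit from the post; -0.5 is a hypothetical value.
scores = {"response_a": 1.7047538243223, "response_b": -0.5}

best = pick_best(scores)
p = preference_probability(scores["response_a"], scores["response_b"])
print(f"preferred: {best}, P(a preferred over b) = {p:.3f}")
```

Only the difference between scores is used here, so the raw value of a single score is not interpreted on its own.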