Models used in CHARM: Calibrating Reward Models With Chatbot Arena Scores.
shawnxzhu
Recent Activity
upvoted a paper about 19 hours ago: Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
liked a dataset 3 months ago: TIGER-Lab/WebInstruct-verified
updated a dataset 4 months ago: shawnxzhu/DSAA6000Q-Mistral-7B-Instruct-v0.2-lima-dpo