arxiv:2501.14249
Hugh Zhang
hugh-scale
AI & ML interests: None yet
Recent Activity
Authored a paper 2 days ago: Humanity's Last Exam
Authored a paper 5 months ago: Chain-of-Thought Reasoning is a Policy Improvement Operator
Authored a paper 5 months ago: Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Models: None public yet