arxiv:2512.24618
Xiaoyu Tan
WIlliam1900
AI & ML interests
None yet
Recent Activity
Authored a paper: Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Authored a paper: The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
Authored a paper: AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification