Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers (https://arxiv.org/abs/2509.23152)
Zhicheng YANG
yangzhch6
AI & ML interests
reasoning with LLMs
Recent Activity
updated a model 5 days ago
yangzhch6/maxrl-qwen3-4b-base-dapo-bs128-n16-stepp400
published a model 5 days ago
yangzhch6/maxrl-qwen3-4b-base-dapo-bs128-n16-stepp400
upvoted a paper 9 days ago
ViewFusion: Structured Spatial Thinking Chains for Multi-View Reasoning
Organizations
None yet