Xiao Liang

MasterVito

Recent Activity

authored a paper about 15 hours ago

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

upvoted a paper about 8 hours ago

Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration

Paper • 2508.13755 • Published 6 days ago • 1

upvoted a paper about 18 hours ago

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published 6 days ago • 27

upvoted a paper 14 days ago

WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published 14 days ago • 104

upvoted a paper 25 days ago

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Paper • 2507.19427 • Published Jul 25 • 18

upvoted 2 papers 28 days ago

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published 28 days ago • 79

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 290

upvoted 2 papers about 1 month ago

Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters

Paper • 2507.13618 • Published Jul 18 • 15

Pixels, Patterns, but No Poetry: To See The World like Humans

Paper • 2507.16863 • Published Jul 21 • 68

upvoted 3 papers 2 months ago

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

Paper • 2506.08989 • Published Jun 10 • 15

Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

Paper • 2506.14245 • Published Jun 17 • 40

TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression

Paper • 2506.02678 • Published Jun 3 • 5

upvoted a collection 2 months ago

SwS

The official collections for SwS. • 0 items • Updated Jun 14 • 1

upvoted a paper 6 months ago

Process-based Self-Rewarding Language Models

Paper • 2503.03746 • Published Mar 5 • 40