Penghui Qi

QPHutu

AI & ML interests

None yet

Recent Activity

upvoted a paper 43 minutes ago

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published 2 days ago • 18

upvoted a paper 3 days ago

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published 3 days ago • 36

updated a collection 8 days ago

LLM Self-Play

2 items • Updated 8 days ago

updated a collection 29 days ago

LLM Agent

3 items • Updated 29 days ago

updated a collection about 1 month ago

LLM Agent

3 items • Updated 29 days ago

upvoted a paper about 1 month ago

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 27

updated a collection about 1 month ago

LLM Agent

3 items • Updated 29 days ago

upvoted 2 papers about 1 month ago

Skywork Open Reasoner 1 Technical Report

Paper • 2505.22312 • Published May 28 • 53

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

updated a collection about 1 month ago

LLM Pretraining

3 items • Updated May 27

upvoted a paper about 1 month ago

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 23

authored a paper about 1 month ago

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 35

upvoted a paper about 1 month ago

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 35

commented on a paper about 1 month ago

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 35

upvoted an article 3 months ago

Dual-Stream Parallelism (DualPipe): Better Without the Dual Streams

Article • Published Feb 28 • 7

authored a paper 3 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 52

upvoted a paper 3 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 52

authored a paper 4 months ago

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization

Paper • 2503.01328 • Published Mar 3 • 16