Penghui Qi

QPHutu

Recent Activity

updated a collection 8 days ago

upvoted a paper about 5 hours ago

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published 2 days ago • 25

upvoted a paper 3 days ago

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published 4 days ago • 36

upvoted 5 papers about 1 month ago

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 27

Skywork Open Reasoner 1 Technical Report

Paper • 2505.22312 • Published May 28 • 53

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 23

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 35

upvoted an article 3 months ago

DualPipe Would Be Better Without the Dual Stream (original title: 双流并行(DualPipe) 没有双流会更好)

Article • Feb 28 • 7

upvoted a paper 3 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 52

upvoted a paper 4 months ago

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization

Paper • 2503.01328 • Published Mar 3 • 16

upvoted an article 4 months ago

DualPipe could be better without the Dual

Article • Feb 28 • 17

upvoted 5 collections 7 months ago

⚓️ Sailor Language Models

Sailor: Open Language Models tailored for South-East Asia (SEA) released by Sea AI Lab. • 17 items • Updated Dec 3, 2024 • 17

💡 DICE

Self-alignment with DPO Implicit Rewards • 5 items • Updated Jul 28, 2024 • 9

📈 Scaling Laws with Vocabulary

Increase your vocabulary size when you scale up your language model • 5 items • Updated Aug 11, 2024 • 6

🧬 RegMix: Data Mixture as Regression

Automatic data mixture method for large language model pre-training • 10 items • Updated Jul 26, 2024 • 8

🔱 Sailor2 Language Models

Sailing in South-East Asia with Inclusive Multilingual LLMs • 34 items • Updated about 1 month ago • 28

upvoted a paper 8 months ago

Balancing Pipeline Parallelism with Vocabulary Parallelism

Paper • 2411.05288 • Published Nov 8, 2024 • 20

upvoted a paper about 1 year ago

Pipeline Parallelism with Controllable Memory

Paper • 2405.15362 • Published May 24, 2024 • 3

upvoted a paper over 1 year ago

Zero Bubble Pipeline Parallelism

Paper • 2401.10241 • Published Nov 30, 2023 • 25