Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy • Paper • 2507.01352 • Published 2 days ago • 25
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning • Paper • 2506.24119 • Published 4 days ago • 36
Optimizing Anytime Reasoning via Budget Relative Policy Optimization • Paper • 2505.13438 • Published May 19 • 35
Understanding R1-Zero-Like Training: A Critical Perspective • Paper • 2503.20783 • Published Mar 26 • 52
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization • Paper • 2503.01328 • Published Mar 3 • 16
⚓️ Sailor Language Models • Collection • Sailor: Open Language Models tailored for South-East Asia (SEA) released by Sea AI Lab. • 17 items • Updated Dec 3, 2024 • 17
📈 Scaling Laws with Vocabulary • Collection • Increase your vocabulary size when you scale up your language model • 5 items • Updated Aug 11, 2024 • 6
🧬 RegMix: Data Mixture as Regression • Collection • Automatic data mixture method for large language model pre-training • 10 items • Updated Jul 26, 2024 • 8
🔱 Sailor2 Language Models • Collection • Sailing in South-East Asia with Inclusive Multilingual LLMs • 34 items • Updated about 1 month ago • 28
Balancing Pipeline Parallelism with Vocabulary Parallelism • Paper • 2411.05288 • Published Nov 8, 2024 • 20