COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 8 days ago • 41
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • Updated 6 days ago • 657k • 775
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 18 days ago • 43
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 27