COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 8 days ago • 41
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 18 days ago • 43
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 27
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Article • Mar 12 • 385
EasyRAG: Efficient Retrieval-Augmented Generation Framework for Automated Network Operations Paper • 2410.10315 • Published Oct 14, 2024 • 3
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 57
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9 • 27
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation Paper • 2503.04872 • Published Mar 6 • 14
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! Paper • 2502.07374 • Published Feb 11 • 39
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 285
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 275
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 58
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 83
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published Nov 12, 2024 • 31