SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published 19 days ago • 17
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published 7 days ago • 48
Reward Models Collection Nemotron reward models. For use in RLHF pipelines and LLM-as-a-Judge • 8 items • Updated about 22 hours ago • 11
ERNIE 4.5 Collection collection of ERNIE 4.5 models. "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights. • 23 items • Updated 5 days ago • 143
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published 13 days ago • 58
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published 15 days ago • 56
AceReason Collection Math and Code reasoning model trained through reinforcement learning (RL) • 7 items • Updated about 22 hours ago • 13
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated 16 days ago • 66
CodeI/O Collection Collection for CodeI/O @ https://codei-o.github.io/ • 16 items • Updated May 6 • 7
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 14 days ago • 62
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Apr 28 • 500