view article Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 638
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Paper • 2505.24298 • Published May 30 • 27
view article Article OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve By codelion • May 20 • 38
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Jul 14 • 63
INTELLECT-2 Collection INTELLECT-2 is a 32 billion parameter language model with globally distributed reinforcement learning. • 3 items • Updated Jul 14 • 24
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12, 2024 • 68
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 145
Contrastive Prefence Learning: Learning from Human Feedback without RL Paper • 2310.13639 • Published Oct 20, 2023 • 25
Democratizing Reasoning Ability: Tailored Learning from Large Language Model Paper • 2310.13332 • Published Oct 20, 2023 • 16
Eureka: Human-Level Reward Design via Coding Large Language Models Paper • 2310.12931 • Published Oct 19, 2023 • 26