Fast and Simplex: 2-Simplicial Attention in Triton Paper • 2507.02754 • Published 2 days ago • 15
ERNIE 4.5 Collection: Collection of ERNIE 4.5 models. "-Paddle" models use PaddlePaddle weights, while "-PT" models use Transformer-style PyTorch weights. • 23 items • Updated 3 days ago • 139
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published 10 days ago • 57
Article Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • 14 days ago • 55
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published 19 days ago • 40
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy Paper • 2506.13284 • Published 20 days ago • 23
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.09250 • Published 25 days ago • 28
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 22 items • Updated 14 days ago • 66
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published about 1 month ago • 42
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 30 days ago • 26
Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 20
OlympicCoder Collection Reasoning datasets and models for competitive coding • 4 items • Updated May 13 • 19
Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 154
Article Interactive Tools for machine learning, deep learning, and math By Suzana • May 26 • 44
Article The Transformers Library: standardizing model definitions By lysandre and 3 others • May 15 • 115