Running 2.59k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated Nov 6, 2024 • 128
Article Releasing Common Corpus: the largest public domain dataset for training LLMs By Pclanglais • Mar 20, 2024 • 24
Saul-7B: A pioneering Large Language Model for Law Collection We introduce SaulLM-7B, an LLM tailored for the legal domain, trained on 30 billion tokens of legal data. Released under the MIT License. • 4 items • Updated Mar 7, 2024 • 18
togethercomputer/RedPajama-INCITE-Instruct-3B-v1 Text Generation • Updated May 9, 2023 • 544 • 93