TxT360: Trillion Extracted Text 📖 Create a large-scale deduplicated text dataset for LLM training
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2 • Text Classification