AdinaY
posted an update 3 days ago
CCI4.0-M2 📊 A powerful dataset with 3 specialized subsets, released by BAAIBeijing

BAAI/cci40-68199d90bbc798680df16d7c

✨ M2-Base: 3.5TB of web data (EN/ZH) with LLM-augmented content, Apache 2.0 license
✨ M2-CoT: 4.2TB of auto-synthesized CoT reasoning data
✨ M2-Extra: domain-specific knowledge
