CCI4.0-M2: a powerful dataset with 3 specialized subsets, released by BAAI
BAAI/cci40-68199d90bbc798680df16d7c
✨ M2-Base: 3.5TB of web data (EN/ZH) with LLM-augmented content, Apache 2.0 license
✨ M2-CoT: 4.2TB of auto-synthesized CoT reasoning data
✨ M2-Extra: domain-specific knowledge