SLM pretrained from scratch
A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, along with the models trained on them.
- opencsg/Fineweb-Edu-Chinese-V2.1
  958M • 18.4k • 51
- OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training
  Paper • 2501.08197 • 9
- opencsg/chinese-fineweb-edu-v2
  188M • 14.3k • 70
- opencsg/chinese-fineweb-edu
  84.6M • 16.9k • 109
CodeLlama fine-tuned by OpenCSG
synthetic datasets
StarCoder fine-tuned by OpenCSG