orionweller, SidTheChillGuy committed · Commit 6385d5b (verified) · 1 Parent(s): 4faebf2

Dataset Link Fix (#7)


- Dataset Link Fix (7adb93c9681ea13bc5662f72b582b5e94c35c293)


Co-authored-by: Siddhant Mahajan <[email protected]>

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -90,8 +90,8 @@ mmBERT training data is publicly available across different phases:
  | Phase | Dataset | Tokens | Description |
  |:------|:--------|:-------|:------------|
  | Pre-training P1 | [mmbert-pretrain-p1](https://huggingface.co/datasets/jhu-clsp/mmbert-pretrain-p1-fineweb2-langs) | 2.3T | 60 languages, foundational training |
- | Pre-training P2 | [mmbert-pretrain-p2](https://huggingface.co/datasets/jhu-clsp/mmbert-pretrain-p2-fineweb2-langs) | - | Extension data for pre-training phase |
- | Pre-training P3 | [mmbert-pretrain-p3](https://huggingface.co/datasets/jhu-clsp/mmbert-pretrain-p3-fineweb2-langs) | - | Final pre-training data |
+ | Pre-training P2 | [mmbert-pretrain-p2](https://huggingface.co/datasets/jhu-clsp/mmBERT-pretrain-p2-fineweb2-remaining) | - | Extension data for pre-training phase |
+ | Pre-training P3 | [mmbert-pretrain-p3](https://huggingface.co/datasets/jhu-clsp/mmBERT-pretrain-p3-others) | - | Final pre-training data |
  | Mid-training | [mmbert-midtraining](https://huggingface.co/datasets/jhu-clsp/mmbert-midtraining-data) | 600B | 110 languages, context extension to 8K |
  | Decay Phase | [mmbert-decay](https://huggingface.co/datasets/jhu-clsp/mmbert-decay-data) | 100B | 1833 languages, premium quality |
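
To review a link fix like this one, it can help to pull the dataset repo IDs out of the changed table rows and compare them by eye. The following is a minimal sketch, not part of the commit: the helper name `dataset_repo_ids` and the regular expression are assumptions made for illustration, and the code only parses the markdown text. It does not check that the repos exist on the Hub.

```python
import re

# Assumed pattern: captures the "<org>/<name>" part of a Hugging Face dataset URL.
DATASET_URL = re.compile(r"https://huggingface\.co/datasets/([\w.-]+/[\w.-]+)")


def dataset_repo_ids(markdown: str) -> list[str]:
    """Return dataset repo IDs in the order they appear in the markdown text."""
    return DATASET_URL.findall(markdown)


# The two rows added by this commit, copied from the diff above.
rows = """\
| Pre-training P2 | [mmbert-pretrain-p2](https://huggingface.co/datasets/jhu-clsp/mmBERT-pretrain-p2-fineweb2-remaining) | - | Extension data for pre-training phase |
| Pre-training P3 | [mmbert-pretrain-p3](https://huggingface.co/datasets/jhu-clsp/mmBERT-pretrain-p3-others) | - | Final pre-training data |
"""

repo_ids = dataset_repo_ids(rows)
print(repo_ids)
```

Running the same helper over the removed rows gives the old IDs, so the two lists can be compared side by side.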