🏡 TensorFlow Model Garden LMs
This organization showcases language model pretraining with the awesome TensorFlow Model Garden library.
The following LMs are currently supported:
- BERT Pretraining - see pretraining instructions
- Token Dropping for efficient BERT Pretraining - see pretraining instructions
- Training ELECTRA Augmented with Multi-word Selection (TEAMS) - see pretraining instructions
🍷 FineWeb-LMs
The following LMs were pretrained on the 10BT subsets of the famous FineWeb and FineWeb-Edu datasets (a short usage sketch follows the list):
- BERT-based - find the best model checkpoint here
- Token Dropping BERT-based - find the best model checkpoint here
- TEAMS-based - find the best model checkpoint here
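If you just want to try one of the released checkpoints, a minimal sketch along these lines should work, assuming the checkpoints on the Hub are stored in a transformers-compatible format (the model ID below is the BERT-based checkpoint from the evaluation table):

```python
from transformers import pipeline

# Minimal sketch: load the BERT-based FineWeb checkpoint from the Hub and
# run a fill-mask query. Assumes the checkpoint is transformers-compatible.
fill_mask = pipeline(
    "fill-mask",
    model="model-garden-lms/bert-base-finewebs-951k",
)

for prediction in fill_mask("TensorFlow is an open-source [MASK] learning library."):
    print(prediction["token_str"], prediction["score"])
```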
📊 ScandEval Evaluation
To find the best checkpoints and to compare our FineWeb-LMs with other models (BERT, ELECTRA, and RoBERTa), we perform an evaluation using the great ScandEval library.
Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
---|---|---|---|---|---|
model-garden-lms/bert-base-finewebs-951k | 69.41 | 89.25 ± 0.4 / 88.9 ± 0.37 | 58.17 ± 1.26 / 59.86 ± 1.65 | 58.83 ± 3.46 / 78.22 ± 2.11 | 55.66 ± 1.19 / 66.36 ± 1.42 |
model-garden-lms/bert-base-token-dropping-finewebs-901k | 68.01 | 88.98 ± 0.64 / 88.67 ± 0.55 | 57.79 ± 1.31 / 58.91 ± 1.85 | 54.25 ± 6.3 / 75.73 ± 3.54 | 54.4 ± 0.72 / 65.31 ± 1.01 |
model-garden-lms/teams-base-finewebs-1m | 72.64 | 89.27 ± 0.41 / 88.82 ± 0.41 | 59.58 ± 0.64 / 62.63 ± 3.0 | 66.72 ± 0.94 / 83.01 ± 0.45 | 59.95 ± 0.71 / 71.13 ± 0.58 |
google-bert/bert-base-cased | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
google/electra-base-discriminator | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
FacebookAI/roberta-base | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |
The TEAMS model outperforms RoBERTa and ELECTRA, which were trained on much more data and for many more pretraining steps. All detailed results can be found in this dataset repository.
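A single evaluation run can be reproduced with a sketch like the one below; the `Benchmarker` arguments (e.g. `language`) and call signature are assumptions that may differ between ScandEval versions, so please consult the ScandEval documentation:

```python
from scandeval import Benchmarker

# Sketch of a ScandEval run for one of the FineWeb checkpoints on English tasks.
# The "language" argument and the call signature are assumptions and may vary
# between ScandEval versions.
benchmarker = Benchmarker(language="en")
results = benchmarker("model-garden-lms/teams-base-finewebs-1m")
print(results)
```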
❤️ Acknowledgements
This repository is the outcome of the last two years of working with TPUs provided by the awesome TRC (TPU Research Cloud) program and the TensorFlow Model Garden library.
Made from Bavarian Oberland with ❤️ and 🥨.