🏡 TensorFlow Model Garden LMs

This organization showcases language model pretraining with the awesome TensorFlow Model Garden library.

The following LMs are currently supported:

🍷 FineWeb-LMs

The following LMs were pretrained on the 10BT subsets of the famous FineWeb and FineWeb-Edu datasets:

📊 ScandEval Evaluation

To find the best checkpoints and to compare our FineWeb-LMs against other models (BERT, ELECTRA and RoBERTa), we perform an evaluation using the great ScandEval library.

| Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
|---|---|---|---|---|---|
| model-garden-lms/bert-base-finewebs-951k | 69.41 | 89.25 ± 0.4 / 88.9 ± 0.37 | 58.17 ± 1.26 / 59.86 ± 1.65 | 58.83 ± 3.46 / 78.22 ± 2.11 | 55.66 ± 1.19 / 66.36 ± 1.42 |
| model-garden-lms/bert-base-token-dropping-finewebs-901k | 68.01 | 88.98 ± 0.64 / 88.67 ± 0.55 | 57.79 ± 1.31 / 58.91 ± 1.85 | 54.25 ± 6.3 / 75.73 ± 3.54 | 54.4 ± 0.72 / 65.31 ± 1.01 |
| model-garden-lms/teams-base-finewebs-1m | 72.64 | 89.27 ± 0.41 / 88.82 ± 0.41 | 59.58 ± 0.64 / 62.63 ± 3.0 | 66.72 ± 0.94 / 83.01 ± 0.45 | 59.95 ± 0.71 / 71.13 ± 0.58 |
| google-bert/bert-base-cased | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
| google/electra-base-discriminator | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
| FacebookAI/roberta-base | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |

The TEAMS model outperforms RoBERTa and ELECTRA, which were pretrained on much more data and for many more steps. All detailed results can be found in this dataset repository.
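The Avg. Score column can be reproduced as the unweighted mean of the eight per-task metric values in a row (two metrics for each of the four tasks); a minimal sketch, using the TEAMS row from the table above:

```python
def avg_score(metrics: list[float]) -> float:
    """Unweighted mean over all per-task metric values in one table row."""
    return sum(metrics) / len(metrics)

# TEAMS row: CoNLL-En, SST5, ScaLA-En, SQuAD (two metric values per task)
teams = [89.27, 88.82, 59.58, 62.63, 66.72, 83.01, 59.95, 71.13]
print(round(avg_score(teams), 2))  # → 72.64, matching the Avg. Score column
```

The same calculation recovers the other rows as well, e.g. the bert-base-finewebs-951k values average to 69.41.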

❤️ Acknowledgements

This repository is the outcome of the last two years of working with TPUs from the awesome TRC program and the TensorFlow Model Garden library.

Made from Bavarian Oberland with ❤️ and 🥨.