---
license: mit
datasets:
  - bookcorpus/bookcorpus
language:
  - en
library_name: transformers
---

* The GPT-2 model was trained on the BookCorpus dataset for 60K steps.
* No position embedding was used (NoPE).
* [Here](https://wandb.ai/a-arun283-iit-madras/gpt-2-BooKcorpus-WarmUpLr/reports/Pretraining-GPT-2---Vmlldzo5MDY3MDk5) is the wandb report.
* This model is for educational purposes only.
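
Since the card does not describe how the position embeddings were removed, below is a minimal sketch of one way to get NoPE behaviour with `transformers`: instantiate a standard GPT-2 model, then zero out and freeze the learned absolute position-embedding table so only token embeddings carry information. The config values and approach are assumptions for illustration, not details taken from this training run.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# NoPE sketch (assumed approach, not necessarily how this checkpoint was trained):
# build a standard GPT-2 and disable its learned absolute position embeddings.
config = GPT2Config()              # default GPT-2 small sizes; placeholder values
model = GPT2LMHeadModel(config)

with torch.no_grad():
    model.transformer.wpe.weight.zero_()             # zero the position-embedding table
model.transformer.wpe.weight.requires_grad = False   # keep it frozen during training
```

With the position-embedding table zeroed and frozen, the forward pass adds only zeros at each position, so the model effectively relies on token embeddings alone.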