• The GPT-2 model was trained on the BookCorpus dataset for 60K steps.
  • No position embedding was used (NoPE); see the sketch after this list.
  • Here is the wandb report
  • This is for educational purposes only.
  • Model size: 124M parameters, stored as F32 safetensors weights.
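
Stock GPT-2 adds a learned position embedding (`wpe`) to every token embedding, so removing it takes a small modification. The card does not include the training code, so the snippet below is only a minimal sketch of one way to get a NoPE setup with the Hugging Face `transformers` GPT-2 classes: zero the `wpe` table and freeze it so it contributes no positional signal.

```python
# Minimal NoPE sketch (assumes Hugging Face `transformers`; the repo's
# actual training code is not shown on this card, so this is illustrative).
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config()            # defaults match the 124M-parameter GPT-2
model = GPT2LMHeadModel(config)

# GPT-2 normally adds a learned position embedding (wpe) to each token
# embedding. Zeroing and freezing wpe removes all positional information,
# approximating NoPE while keeping the rest of the architecture unchanged.
with torch.no_grad():
    model.transformer.wpe.weight.zero_()
model.transformer.wpe.weight.requires_grad_(False)
```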
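Loading the published checkpoint should work through the standard `transformers` API. This is a sketch assuming the usual hub conventions (including that the repo ships a tokenizer), not code taken from the card itself.

```python
# Loading sketch using the standard `transformers` API (assumed, not taken
# from the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arun-AiBharat/gpt-2-bookcorpus")
model = AutoModelForCausalLM.from_pretrained("arun-AiBharat/gpt-2-bookcorpus")

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```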