- The GPT-2 model was trained on the BookCorpus dataset for 60K steps.
- No position embeddings were used (NoPE); a minimal sketch of this setup appears after this list.
- Here is the wandb report.
- This is for educational purposes only.
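In a NoPE model, token embeddings go straight into the transformer stack with nothing added for position; the causal attention mask is the only source of order information. Below is a minimal PyTorch sketch of that idea. The class name, hyperparameters, and layer choices are illustrative assumptions, not this card's actual training configuration.

```python
import torch
import torch.nn as nn

class NoPEGPT(nn.Module):
    """GPT-2-style decoder-only LM with No Position Embedding (NoPE).

    Assumption: hyperparameters below are illustrative, not the card's
    actual config. Note there is no position-embedding table at all.
    """

    def __init__(self, vocab_size=50257, d_model=768, n_head=12, n_layer=12):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # tokens only, no positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        # Causal mask: position i may only attend to positions <= i.
        causal = nn.Transformer.generate_square_subsequent_mask(T, device=idx.device)
        x = self.tok_emb(idx)                 # NoPE: no positional signal added here
        x = self.blocks(x, mask=causal)
        return self.head(self.ln_f(x))        # next-token logits

model = NoPEGPT()
logits = model(torch.randint(0, 50257, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 50257])
```

The only difference from a standard GPT-2 is the absence of the learned position-embedding table; causal masking alone breaks the permutation symmetry of attention, which is what lets a NoPE model still learn order-sensitive behavior.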