This is a GPT-2 (124M) model trained with llm.c on the FineWeb dataset.

Much more detailed information is available in Andrej Karpathy's llm.c discussion: https://github.com/karpathy/llm.c/discussions/481. That repository also includes a Python reference implementation.

Example use with Ollama:

ollama run hf.co/ShawnGiese/gpt2_124M_fineweb10 "My super smart team of cloud computing experts are welcoming many new customers. The next thing you might hear about this amazing team is"
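
The model can also be tried from Python. Below is a minimal sketch, assuming the repo's Safetensors weights and config load directly through the Hugging Face transformers text-generation pipeline (untested here):

```python
# Minimal sketch: load the model from the Hub with transformers.
# Assumes the repo's weights/config are in the standard GPT-2 format
# that the text-generation pipeline can read directly.
from transformers import pipeline

generator = pipeline("text-generation", model="ShawnGiese/gpt2_124M_fineweb10")
prompt = ("My super smart team of cloud computing experts are welcoming many "
          "new customers. The next thing you might hear about this amazing team is")
print(generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"])
```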

Technical

This model has a context window of 1024 tokens.
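
In practice that means prompts longer than 1024 tokens need to be truncated before generation. A small illustration, assuming the standard GPT-2 tokenizer matches the one used by this model:

```python
# Count tokens and truncate a prompt to the 1024-token context window.
# Assumes the stock GPT-2 tokenizer applies to this model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "some very long prompt " * 400
ids = tokenizer(prompt, truncation=True, max_length=1024)["input_ids"]
print(len(ids))  # at most 1024
```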

It was trained on 10 billion tokens from FineWeb and has 124 million parameters. For comparison, the full GPT-2 model has 1.5 billion parameters, GPT-3 has 175 billion, and GPT-4 is reported to have upwards of 1 trillion.
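
As a rough sanity check on the 124M figure, the parameter count can be reproduced from the standard GPT-2 small configuration (12 layers, 12 heads, 768-dimensional embeddings, 1024-token context, 50257-token vocabulary, with the output head tied to the token embedding). A back-of-the-envelope sketch:

```python
# Approximate parameter count for GPT-2 small (124M), assuming the
# standard GPT-2 configuration with a tied output/embedding matrix.
vocab_size, n_ctx, n_embd, n_layer = 50257, 1024, 768, 12

embeddings = vocab_size * n_embd + n_ctx * n_embd   # token + position embeddings

# One transformer block: attention (QKV + output projection),
# 4x MLP, and two layer norms (weight + bias each).
attn = n_embd * 3 * n_embd + 3 * n_embd + n_embd * n_embd + n_embd
mlp = n_embd * 4 * n_embd + 4 * n_embd + 4 * n_embd * n_embd + n_embd
layernorms = 2 * 2 * n_embd
block = attn + mlp + layernorms

total = embeddings + n_layer * block + 2 * n_embd   # + final layer norm
print(f"{total:,}")  # 124,439,808 ~= 124M
```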

OpenAI has released some of the original GPT-2 code and model/data details; check them out for a more fully featured GPT-2 to test and for information about its training sources: https://github.com/openai/gpt-2/blob/master/model_card.md

Comments

If you try to build this yourself, budget about an hour for data sharding and maybe an extra half hour for installing everything and downloading the results afterwards.

Training this model across eight GPUs used only about 25 GB of VRAM on each, so 40 GB A100s are more than enough. On AWS, I believe the equivalent instance is roughly a p4d.24xlarge, though I used https://lambda.ai/.

If using a cloud-based system, consider running the job inside a terminal multiplexer such as tmux, just in case you get disconnected.

There are reports of people training this on a single Nvidia RTX 4090 in around 24 hours; the discussion linked above has more details.

The model weights are available in Safetensors format (124M parameters, BF16).