This is a GPT-2 (124M) model trained with llm.c on the FineWeb dataset.

Much more detailed information is available in Andrej Karpathy's llm.c discussion: https://github.com/karpathy/llm.c/discussions/481. That repository also includes a Python reference implementation.

Example use with Ollama:

ollama run hf.co/ShawnGiese/gpt2_124M_fineweb10 "My super smart team of cloud computing experts are welcoming many new customers. The next thing you might hear about this amazing team is"
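
The model can also be tried from Python. Below is a minimal sketch, assuming the repo's Safetensors weights and config load directly through the Hugging Face transformers text-generation pipeline (untested here):

```python
# Minimal sketch: load the model from the Hub with transformers.
# Assumes the repo's weights/config are in the standard GPT-2 format
# that the text-generation pipeline can read directly.
from transformers import pipeline

generator = pipeline("text-generation", model="ShawnGiese/gpt2_124M_fineweb10")
prompt = ("My super smart team of cloud computing experts are welcoming many "
          "new customers. The next thing you might hear about this amazing team is")
print(generator(prompt, max_new_tokens=64, do_sample=True)[0]["generated_text"])
```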

Technical

This model has a context window of 1024 tokens.
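
In practice that means prompts longer than 1024 tokens need to be truncated before generation. A small illustration, assuming the standard GPT-2 tokenizer matches the one used by this model:

```python
# Count tokens and truncate a prompt to the 1024-token context window.
# Assumes the stock GPT-2 tokenizer applies to this model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "some very long prompt " * 400
ids = tokenizer(prompt, truncation=True, max_length=1024)["input_ids"]
print(len(ids))  # at most 1024
```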

It was trained on 10 billion tokens from FineWeb and has 124 million parameters. For comparison, the full GPT-2 model has 1.5 billion parameters, GPT-3 has 175 billion, and GPT-4 is reported to have upwards of 1 trillion.
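
As a rough sanity check on the 124M figure, the parameter count can be reproduced from the standard GPT-2 small configuration (12 layers, 12 heads, 768-dimensional embeddings, 1024-token context, 50257-token vocabulary, with the output head tied to the token embedding). A back-of-the-envelope sketch:

```python
# Approximate parameter count for GPT-2 small (124M), assuming the
# standard GPT-2 configuration with a tied output/embedding matrix.
vocab_size, n_ctx, n_embd, n_layer = 50257, 1024, 768, 12

embeddings = vocab_size * n_embd + n_ctx * n_embd   # token + position embeddings

# One transformer block: attention (QKV + output projection),
# 4x MLP, and two layer norms (weight + bias each).
attn = n_embd * 3 * n_embd + 3 * n_embd + n_embd * n_embd + n_embd
mlp = n_embd * 4 * n_embd + 4 * n_embd + 4 * n_embd * n_embd + n_embd
layernorms = 2 * 2 * n_embd
block = attn + mlp + layernorms

total = embeddings + n_layer * block + 2 * n_embd   # + final layer norm
print(f"{total:,}")  # 124,439,808 ~= 124M
```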

OpenAI has released some of the original GPT-2 code and model/data details; check them out for a more fully featured GPT-2 to test and for information about its training sources: https://github.com/openai/gpt-2/blob/master/model_card.md

Comments

If you try to build this yourself, budget about an hour for data sharding and maybe an extra half hour for installing everything and downloading the results afterwards.

Training this model across eight GPUs used only about 25 GB of VRAM on each, so 40 GB A100s are more than enough. On AWS, I believe the equivalent instance is roughly a p4d.24xlarge, though I used https://lambda.ai/.

If using a cloud-based system, consider running the job inside a terminal multiplexer such as tmux, just in case you get disconnected.

There are reports of people training this on a single Nvidia RTX 4090 in around 24 hours; the discussion linked above has more details.

The model weights are available in Safetensors format (124M parameters, BF16).