leo-pekelis-gradient committed
Commit 698e052 · 1 parent: 404ab0b
Update README.md
README.md
CHANGED
@@ -25,6 +25,7 @@ This model extends LLama-3 8B's context length from 8k to > 1040K, developed by
 **Infra:**
 
 We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 1048k tokens on [Crusoe Energy](https://huggingface.co/crusoeai) high performance L40S cluster.
+
 Notably, we layered parallelism on top of Ring Attention with a custom network topology to better leverage large GPU clusters in the face of network bottlenecks from passing many KV blocks between devices. This gave us a 33x speedup in model training (compare 524k and 1048k to 65k and 262k in the table below).
 
 **Data:**
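
The paragraph touched by this diff refers to blockwise Ring Attention. As a rough illustration of that technique (not the EasyContext implementation), here is a minimal single-process sketch in which each ring member owns one query block and key/value blocks rotate around the ring, with attention accumulated via a running softmax. The function name `ring_attention`, the list-of-blocks layout, and the omission of causal masking are simplifying assumptions.

```python
# Illustrative sketch of blockwise ring attention; not the EasyContext code.
import torch

def ring_attention(q_blocks, k_blocks, v_blocks, scale):
    """q_blocks/k_blocks/v_blocks: lists of [block_len, d] tensors, one per ring member."""
    outputs = []
    n = len(q_blocks)
    for i in range(n):                        # the i-th "device" owns q_blocks[i]
        q = q_blocks[i]
        acc = torch.zeros_like(q)             # running weighted sum of values
        row_max = torch.full((q.shape[0], 1), float("-inf"))
        denom = torch.zeros((q.shape[0], 1))
        for step in range(n):                 # KV blocks arrive one ring hop at a time
            j = (i + step) % n
            scores = (q @ k_blocks[j].T) * scale
            block_max = scores.max(dim=-1, keepdim=True).values
            new_max = torch.maximum(row_max, block_max)
            correction = torch.exp(row_max - new_max)   # rescale previous partial sums
            probs = torch.exp(scores - new_max)
            denom = denom * correction + probs.sum(dim=-1, keepdim=True)
            acc = acc * correction + probs @ v_blocks[j]
            row_max = new_max
        outputs.append(acc / denom)
    return torch.cat(outputs, dim=0)
```

Splitting a sequence into blocks and comparing the result against full attention on the concatenated blocks is a quick sanity check for a sketch like this.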
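The "layered parallelism with a custom network topology" claim is only described at a high level in the README. One way such a layout is commonly expressed, sketched below under assumptions (PyTorch process groups, a contiguous ring of `ring_size` ranks, and the hypothetical helper name `build_groups`; this is not the authors' implementation), is to nest ring-attention groups inside data-parallel groups so that KV blocks circulate over the fastest links while gradient all-reduces cross the slower ones.

```python
# Illustrative sketch: nesting ring-attention groups inside data-parallel groups.
import torch.distributed as dist

def build_groups(world_size: int, ring_size: int):
    """Partition ranks into contiguous ring-attention groups of `ring_size` ranks;
    data parallelism then spans one rank per ring."""
    assert world_size % ring_size == 0
    rank = dist.get_rank()

    ring_group, dp_group = None, None
    # Ring groups: ranks [0..ring_size-1], [ring_size..2*ring_size-1], ...
    for start in range(0, world_size, ring_size):
        ranks = list(range(start, start + ring_size))
        g = dist.new_group(ranks)      # every rank must call new_group for every group
        if rank in ranks:
            ring_group = g
    # Data-parallel groups: ranks sharing the same position within their ring.
    for offset in range(ring_size):
        ranks = list(range(offset, world_size, ring_size))
        g = dist.new_group(ranks)
        if rank in ranks:
            dp_group = g
    return ring_group, dp_group

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")  # launched via torchrun, env vars assumed set
    ring_group, dp_group = build_groups(dist.get_world_size(), ring_size=8)
    # KV blocks are passed around `ring_group` (ideally intra-node links),
    # while gradients are all-reduced over `dp_group` (inter-node links).
```

Keeping each ring within a node (or within a high-bandwidth island of the cluster) is the kind of topology choice that can hide the cost of passing many KV blocks between devices, which is the bottleneck the README's 33x speedup claim is addressing.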