Criztov committed on
Commit a92aab6 (1 parent: 56438be)

NVIDIA framework and contribution updates.

  1. README.md +4 -3
README.md CHANGED
@@ -30,7 +30,8 @@ tags:
 
  ## Model Summary
 
- StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+ StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2-train), with opt-out requests excluded. The model uses [Grouped Query Attention](https://arxiv.org/abs/2305.13245), [a context window of 16,384 tokens](https://arxiv.org/abs/2205.14135) with [a sliding window attention of 4,096 tokens](https://arxiv.org/abs/2004.05150v2), and was trained using the [Fill-in-the-Middle objective](https://arxiv.org/abs/2207.14255) on 4+ trillion tokens.
+ The model was trained with [NVIDIA NeMo™ Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/) using the [NVIDIA Eos Supercomputer](https://blogs.nvidia.com/blog/eos/) built with [NVIDIA DGX H100](https://www.nvidia.com/en-us/data-center/dgx-h100/) systems.
 
  - **Project Website:** [bigcode-project.org](https://www.bigcode-project.org)
  - **Paper:** [Link](https://huggingface.co/datasets/bigcode/the-stack-v2/)
@@ -135,11 +136,11 @@ The model has been trained on source code from 600+ programming languages. The p
 
  ## Hardware
 
- - **GPUs:** 1024 A100
+ - **GPUs:** 1024 x H100
 
  ## Software
 
- - **Framework:** [NeMo](https://github.com/NVIDIA/NeMo)
+ - **Framework:** [NeMo Framework](https://www.nvidia.com/en-us/ai-data-science/generative-ai/nemo-framework/)
  - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
  # License