Simingh committed on
Commit 6f7974c · verified · 1 Parent(s): 0270c4c

Update README.md

Files changed (1)
  1. README.md +18 -4
README.md CHANGED
@@ -41,7 +41,21 @@ library_name: transformers
41
  | OpenCoder-1.5B-Instruct | 4K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) |
42
  | OpenCoder-8B-Instruct | 8K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-8B-Instruct) |
43
 
44
- ## 3. Benchmarks
45
 
46
  **Note:** For the detailed evaluation results, please refer to [our paper](https://arxiv.org/pdf/2411.04905).
47
 
@@ -65,7 +79,7 @@ library_name: transformers
65
  | MultiPL-E (AVG) | 57.5 | 71.0 | -->
66
 
67
 
68
- ## 4. Inference
69
 
70
  ### Inference with Huggingface's Transformers
71
 
@@ -90,11 +104,11 @@ print(result)
90
 
91
  <!-- ### Inference with vLLM (recommended) -->
92
 
93
- ## 5. License
94
 
95
  OpenCoder series (including Base and Chat) support commercial applications under a permissive [License](https://huggingface.co/infly/OpenCoder-8B-Base/blob/main/LICENSE).
96
 
97
- ## 6. Citation
98
  ```
99
  @inproceedings{Huang2024OpenCoderTO,
100
  title={OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models},
 
41
  | OpenCoder-1.5B-Instruct | 4K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) |
42
  | OpenCoder-8B-Instruct | 8K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-8B-Instruct) |
43
 
44
+
45
+ ## 3. Datasets
46
+
47
+ ### Pre-training
48
+
49
+ | Dataset | Size | Download |
50
+ |:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
51
+ | fineweb-code-corpus | 148 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-code-corpus) |
52
+ | fineweb-math-corpus | 10 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-math-corpus) |
53
+
54
+
55
+ **This is not the end; we are organizing the remaining data and uploading it progressively.**
56
+
57
+
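Reviewer sketch (not part of the commit): one way the two pre-training corpora added in this hunk might be inspected without downloading them in full. It assumes the Hugging Face `datasets` library is installed and that each repository exposes a `train` split, which the README does not state. The helper names `repo_id` and `peek` are hypothetical and exist only for illustration.

```python
from itertools import islice

# Organization and dataset names are taken from the download links in the table.
HUB_ORG = "OpenCoder-LLM"
PRETRAIN_DATASETS = ("fineweb-code-corpus", "fineweb-math-corpus")


def repo_id(name: str) -> str:
    """Build the Hub repository id for one of the listed corpora."""
    if name not in PRETRAIN_DATASETS:
        raise ValueError(f"unknown dataset: {name!r}")
    return f"{HUB_ORG}/{name}"


def peek(name: str, n: int = 3, split: str = "train"):
    """Return the first n records of a corpus using streaming mode.

    The split name is an assumption; adjust it if the repository differs.
    """
    # Imported lazily so repo_id() works even without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields records lazily instead of downloading every file.
    stream = load_dataset(repo_id(name), split=split, streaming=True)
    return list(islice(stream, n))


# Example call (requires network access):
# records = peek("fineweb-math-corpus", n=1)
```

Streaming is used here because the code corpus is listed at 148 GB, so a full download may be impractical just to look at a few records.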
58
+ ## 4. Benchmarks
59
 
60
  **Note:** For the detailed evaluation results, please refer to [our paper](https://arxiv.org/pdf/2411.04905).
61
 
 
79
  | MultiPL-E (AVG) | 57.5 | 71.0 | -->
80
 
81
 
82
+ ## 5. Inference
83
 
84
  ### Inference with Huggingface's Transformers
85
 
 
104
 
105
  <!-- ### Inference with vLLM (recommended) -->
106
 
107
+ ## 6. License
108
 
109
  OpenCoder series (including Base and Chat) support commercial applications under a permissive [License](https://huggingface.co/infly/OpenCoder-8B-Base/blob/main/LICENSE).
110
 
111
+ ## 7. Citation
112
  ```
113
  @inproceedings{Huang2024OpenCoderTO,
114
  title={OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models},