---
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
library_name: transformers
base_model:
- Qwen/Qwen2.5-32B-Instruct
datasets:
- Magpie-Align/Magpie-Pro-300K-Filtered
model-index:
- name: TheBeagle-v2beta-32B-MGS
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 45.03
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 58.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 39.43
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 20.13
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 24.5
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 54.57
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/TheBeagle-v2beta-32B-MGS
      name: Open LLM Leaderboard
---

# TheBeagle-v2beta-32B-MGS
This model is an experimental version of our latest innovation: `MGS`. It is up to you to figure out what it means, but it is very explicit.
We did not apply our known `UNA` algorithm to the forward pass here, but the two are entirely compatible: they operate on different parts of the neural network and in different ways, though both can be seen as regularization techniques.
![TheBeagle-v2-MGS](https://huggingface.co/fblgit/TheBeagle-v2beta-32B-MGS/resolve/main/TheBeagle-v2-MGS.png)
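
Before the details, a minimal usage sketch with `transformers`. It assumes the chat template shipped in `tokenizer_config.json` (the Qwen2.5 one, from the base model) and enough GPU memory for a 32B checkpoint; the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch (assumptions: Qwen2.5 chat template from the base
# model; enough GPU memory for a 32B model; settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/TheBeagle-v2beta-32B-MGS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "What does 1+1 equal, and why?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```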

## CHANGELOG
**UPDATE**: 26/Oct
* Updated `tokenizer_config.json` (taken from the base_model)
* Regenerated Quants (being uploaded)
* Re-submitted the Leaderboard evaluation; MATH & IFEval show relevant updates
* Aligned the LICENSE with the `Qwen` terms

## MGS
MGS stands for... Many-Geeks-Searching... and that's it. Hint: `1+1 is 2, and 1+1 is not 3`

We still believe one epoch should be enough, so we trained for a single epoch only.

## Dataset
We used the first decent dataset (in both corpus quality and size) on the Hub: `Magpie-Align/Magpie-Pro-300K-Filtered`.
Kudos to the Magpie team for contributing solid datasets that are, in my view, very well suited for ablation.

The model achieves the following results on the evaluation set:
- Loss: 0.5378 (1 epoch), outperforming the baseline model.
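
If you want to inspect the corpus yourself, here is a minimal sketch with the `datasets` library (the `train` split name is an assumption):

```python
# Sketch: peek at the SFT corpus used here (the "train" split name is assumed).
from datasets import load_dataset

ds = load_dataset("Magpie-Align/Magpie-Pro-300K-Filtered", split="train")
print(ds)     # features and row count
print(ds[0])  # one instruction/response record
```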

## Quants

[All versions available](https://huggingface.co/fblgit/TheBeagle-v2beta-MGS-GGUF/tree/main)
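
For the GGUF files, a minimal sketch with `llama-cpp-python` (the quant filename below is hypothetical; pick an actual file from the repository linked above):

```python
# Sketch: run a GGUF quant locally with llama-cpp-python.
# "TheBeagle-v2beta-32B-MGS.Q4_K_M.gguf" is a hypothetical filename; substitute
# a real file downloaded from the GGUF repository above.
from llama_cpp import Llama

llm = Llama(
    model_path="TheBeagle-v2beta-32B-MGS.Q4_K_M.gguf",
    n_ctx=8192,       # context window; adjust to your RAM/VRAM budget
    n_gpu_layers=-1,  # offload all layers to GPU if available
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain MGS in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```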

## Licensing terms

**On top of the Qwen LICENSE, we add an extra term: derivatives must include "Beagle" or "MGS" in the model name. This will help us track the study better. Thank you.**

## Training
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 25
- num_epochs: 1
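
Note the arithmetic: 2 (per-device batch) × 8 (GPUs) × 4 (accumulation steps) = 64 effective train batch size. For reference, a minimal sketch of the optimizer and schedule as plain PyTorch/`transformers` calls; the tiny stand-in module and the ~830-step total (read off the results table: 798 steps at ~0.96 epoch) are assumptions:

```python
# Sketch: the listed optimizer/scheduler settings in plain PyTorch/transformers.
# The Linear module is a stand-in for the 32B model so the sketch runs;
# ~830 training steps is inferred from the results table below.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the real model
optimizer = torch.optim.Adam(
    model.parameters(), lr=8e-5, betas=(0.9, 0.999), eps=1e-8
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=25, num_training_steps=830
)
# Effective batch size: 2 per device * 8 GPUs * 4 accumulation steps = 64.
```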

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.8642        | 0.0012 | 1    | 0.7195          |
| 2.077         | 0.0507 | 42   | 0.6161          |
| 1.0325        | 0.1014 | 84   | 0.6093          |
| 0.8945        | 0.1520 | 126  | 0.5962          |
| 0.8532        | 0.2027 | 168  | 0.5869          |
| 0.8185        | 0.2534 | 210  | 0.5805          |
| 0.81          | 0.3041 | 252  | 0.5719          |
| 0.7901        | 0.3548 | 294  | 0.5663          |
| 0.7766        | 0.4054 | 336  | 0.5618          |
| 0.7687        | 0.4561 | 378  | 0.5590          |
| 0.7443        | 0.5068 | 420  | 0.5564          |
| 0.7494        | 0.5575 | 462  | 0.5525          |
| 0.7787        | 0.6081 | 504  | 0.5485          |
| 0.7381        | 0.6588 | 546  | 0.5466          |
| 0.7359        | 0.7095 | 588  | 0.5444          |
| 0.7447        | 0.7602 | 630  | 0.5435          |
| 0.7378        | 0.8109 | 672  | 0.5415          |
| 0.7302        | 0.8615 | 714  | 0.5398          |
| 0.7476        | 0.9122 | 756  | 0.5391          |
| 0.715         | 0.9629 | 798  | 0.5378          |

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) (without chat template)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__TheBeagle-v2beta-32B-MGS).

| Metric              | Value |
|---------------------|------:|
| Avg.                | 40.29 |
| IFEval (0-Shot)     | 45.03 |
| BBH (3-Shot)        | 58.07 |
| MATH Lvl 5 (4-Shot) | 39.43 |
| GPQA (0-shot)       | 20.13 |
| MuSR (0-shot)       | 24.50 |
| MMLU-PRO (5-shot)   | 54.57 |

## Thanks
- Qwen Team for their outstanding model
- Magpie Team for contributing plenty of datasets
- Cybertron Cloud Compute

# Citations
```
@misc{thebeagle-v2,
  title = {TheBeagle v2: MGS},
  author = {Xavier Murias},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/fblgit/TheBeagle-v2beta-32B-MGS}},
}

@misc{qwen2.5,
  title = {Qwen2.5: A Party of Foundation Models},
  url = {https://qwenlm.github.io/blog/qwen2.5/},
  author = {Qwen Team},
  month = {September},
  year = {2024}
}

@article{qwen2,
  title = {Qwen2 Technical Report},
  author = {An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
  journal = {arXiv preprint arXiv:2407.10671},
  year = {2024}
}
```