---
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
license: apache-2.0
library_name: transformers
tags:
- mergekit
- merge
- lazymergekit
base_model:
- Qwen/Qwen2.5-32B-Instruct
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
model-index:
- name: BigQwen2.5-52B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 79.29
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-52B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 59.81
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-52B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 17.82
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-52B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.94
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-52B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 10.45
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-52B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 50.22
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/BigQwen2.5-52B-Instruct
      name: Open LLM Leaderboard
---
# BigQwen2.5-52B-Instruct

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/98GiKtmH1AtHHbIbOUH4Y.jpeg)

BigQwen2.5-52B-Instruct is a [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) self-merge made with [MergeKit](https://github.com/arcee-ai/mergekit/tree/main).

It applies the [mlabonne/Meta-Llama-3-120B-Instruct](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct/) recipe.

I made it due to popular demand, but I haven't tested it, so use it at your own risk. ¯\\\_(ツ)_/¯

## 🔍 Applications

It might be good for creative writing tasks. I recommend a context length of 32k, but you can go up to 131,072 tokens in theory.

## 🏆 Evaluation

| Metric             | BigQwen2.5-Echo-47B-Instruct | **BigQwen2.5-52B-Instruct** | Qwen2.5-32B-Instruct |
|--------------------|-----:|-----:|-----:|
| Avg.               | 30.31 | 37.42 | 36.17 |
| IFEval (0-Shot)    | 73.57 | 79.29 | 83.46 |
| BBH (3-Shot)       | 44.52 | 59.81 | 56.49 |
| MATH Lvl 5 (4-Shot)|  3.47 | 17.82 |  0    |
| GPQA (0-shot)      |  8.61 |  6.94 | 11.74 |
| MuSR (0-shot)      | 10.19 | 10.45 | 13.5  |
| MMLU-PRO (5-shot)  | 41.49 | 50.22 | 51.85 |

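The leaderboard average is just the unweighted mean of the six benchmark scores. A quick sanity check using the numbers from the table reproduces the Avg. row:

```python
# Per-model scores in table order: IFEval, BBH, MATH Lvl 5, GPQA, MuSR, MMLU-PRO.
scores = {
    "BigQwen2.5-Echo-47B-Instruct": [73.57, 44.52, 3.47, 8.61, 10.19, 41.49],
    "BigQwen2.5-52B-Instruct": [79.29, 59.81, 17.82, 6.94, 10.45, 50.22],
    "Qwen2.5-32B-Instruct": [83.46, 56.49, 0.0, 11.74, 13.5, 51.85],
}

# Unweighted mean, rounded to two decimals like the leaderboard.
averages = {name: round(sum(v) / len(v), 2) for name, v in scores.items()}
print(averages)  # {'BigQwen2.5-Echo-47B-Instruct': 30.31, 'BigQwen2.5-52B-Instruct': 37.42, 'Qwen2.5-32B-Instruct': 36.17}
```
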
149
+ ## 🧩 Configuration
150
+
151
+ The following YAML configuration was used to produce this model:
152
+
153
+ ```yaml
154
+ slices:
155
+ - sources:
156
+ - layer_range: [0, 16]
157
+ model: Qwen/Qwen2.5-32B-Instruct
158
+ - sources:
159
+ - layer_range: [8, 24]
160
+ model: Qwen/Qwen2.5-32B-Instruct
161
+ - sources:
162
+ - layer_range: [16, 32]
163
+ model: Qwen/Qwen2.5-32B-Instruct
164
+ - sources:
165
+ - layer_range: [24, 40]
166
+ model: Qwen/Qwen2.5-32B-Instruct
167
+ - sources:
168
+ - layer_range: [32, 48]
169
+ model: Qwen/Qwen2.5-32B-Instruct
170
+ - sources:
171
+ - layer_range: [40, 56]
172
+ model: Qwen/Qwen2.5-32B-Instruct
173
+ - sources:
174
+ - layer_range: [56, 64]
175
+ model: Qwen/Qwen2.5-32B-Instruct
176
+ merge_method: passthrough
177
+ dtype: bfloat16
178
+ ```
179
+
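The passthrough merge simply stacks the listed slices, so overlapping ranges duplicate layers. Counting the ranges above (end index exclusive, just arithmetic on the config) shows how the base model's 64 layers become 104 in the merge, which is where the ~52B parameter count comes from:

```python
# Layer ranges from the passthrough config above (end index exclusive).
slices = [(0, 16), (8, 24), (16, 32), (24, 40), (32, 48), (40, 56), (56, 64)]

# Each slice contributes (end - start) layers; overlaps are duplicated on purpose.
total_layers = sum(end - start for start, end in slices)
print(total_layers)  # 104 layers in the merge, vs. 64 in Qwen2.5-32B-Instruct
```
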
## 💻 Usage

```python
# pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/BigQwen2.5-52B-Instruct"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
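
At this size, memory is the main constraint. A back-of-the-envelope estimate of the VRAM needed just to hold the weights (ignoring activations and the KV cache; the ~52B parameter count is taken from the model name) at common precisions:

```python
# Rough weights-only memory footprint for a ~52B-parameter model.
params = 52e9  # approximate, from the model name

bytes_per_param = {"bf16/fp16": 2, "int8": 1, "int4": 0.5}
footprint_gb = {dtype: params * b / 1e9 for dtype, b in bytes_per_param.items()}
for dtype, gb in footprint_gb.items():
    print(f"{dtype}: ~{gb:.0f} GB")  # bf16/fp16: ~104 GB, int8: ~52 GB, int4: ~26 GB
```

In practice you need extra headroom for activations and the KV cache, so the `torch.float16` pipeline above will typically be sharded across multiple GPUs by `device_map="auto"`.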