lbourdois committed
Commit 0bed66e · verified · 1 Parent(s): 830a474

Improve language tag

Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve how the model is referenced. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.
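
For context, the language tag lives in the YAML front matter of README.md, and the Hub uses it to index and filter models by language. After this change the field lists the 13 documented languages as three-letter ISO 639-3 codes; an excerpt of the resulting front matter:

```yaml
language:  # languages declared for Hub filtering/search
- zho      # Chinese
- eng      # English
- fra      # French
# ...the remaining ten codes (spa, por, deu, ita, rus, jpn, kor, vie, tha, ara) appear in the diff below
```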

Files changed (1)
  1. README.md +209 -198
README.md CHANGED
@@ -1,199 +1,210 @@
  ---
  license: apache-2.0
  language:
- - en
- - zh
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
  base_model:
  - Qwen/Qwen2.5-14B
  - Qwen/Qwen2.5-14B-Instruct
  - Qwen/Qwen2.5-14B-Instruct-1M
  - tanliboy/lambda-qwen2.5-14b-dpo-test
  - arcee-ai/SuperNova-Medius
  - arcee-ai/Virtuoso-Small-v2
  - Azure99/Blossom-V6-14B
  - Qwen/Qwen2.5-Coder-14B
  - Qwen/Qwen2.5-Coder-14B-Instruct
  - deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
  pipeline_tag: text-generation
  tags:
  - merge
  ---
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e174e202fa032de4143324/zx2LWe9rip2AVr76BH4Er.png)
  # Qwen2.5-14B-YOYO-V4

  *[Qwen2.5-14B-YOYO-V5 Officially Released!](https://huggingface.co/YOYO-AI/Qwen2.5-14B-YOYO-V5)*

  **Key Highlights:**

  *1. Richer Knowledge & Improved Instruction Compliance*

  *2. Integrated Code Model and R1 Distillation for Improved Coding/Reasoning*

  *3. 1M-Token Long Context Window*


  ## First stage:

  ```yaml
  merge_method: sce
  models:
    # Pivot model
    - model: Qwen/Qwen2.5-14B-Instruct-1M
    # Target models
    - model: Qwen/Qwen2.5-14B
  base_model: Qwen/Qwen2.5-14B-Instruct-1M
  parameters:
    select_topk: 1
  dtype: bfloat16
  tokenizer_source: base
  normalize: true
  int8_mask: true
  name: Qwen2.5-14B-1M
  ```
  ```yaml
  models:
    - model: tanliboy/lambda-qwen2.5-14b-dpo-test
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
  merge_method: della
  base_model: Qwen2.5-14B-1M
  parameters:
    density: 1
    weight: 1
    lambda: 0.9
  normalize: true
  int8_mask: true
  dtype: bfloat16
  tokenizer_source: base
  name: Qwen2.5-14B-1M-della
  ```
  ## Second stage:

  ```yaml
  models:
    - model: Qwen/Qwen2.5-14B-Instruct
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
    - model: Qwen/Qwen2.5-14B-Instruct-1M
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
  merge_method: della
  base_model: arcee-ai/Virtuoso-Small-v2
  parameters:
    density: 1
    weight: 1
    lambda: 0.9
  normalize: true
  int8_mask: true
  dtype: bfloat16
  tokenizer_source: base
  name: Qwen2.5-14B-YOYO-della1
  ```
  ```yaml
  models:
    - model: Qwen/Qwen2.5-14B-Instruct
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
    - model: Qwen/Qwen2.5-14B-Instruct-1M
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
  merge_method: della
  base_model: arcee-ai/SuperNova-Medius
  parameters:
    density: 1
    weight: 1
    lambda: 0.9
  normalize: true
  int8_mask: true
  dtype: bfloat16
  tokenizer_source: base
  name: Qwen2.5-14B-YOYO-della2
  ```
  ```yaml
  models:
    - model: Qwen/Qwen2.5-14B-Instruct
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
    - model: Qwen/Qwen2.5-14B-Instruct-1M
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
  merge_method: della
  base_model: Azure99/Blossom-V6-14B
  parameters:
    density: 1
    weight: 1
    lambda: 0.9
  normalize: true
  int8_mask: true
  dtype: bfloat16
  tokenizer_source: base
  name: Qwen2.5-14B-YOYO-della3
  ```
  ## Third stage:

  ### Step 1:
  ```yaml
  models:
    - model: Qwen/Qwen2.5-Coder-14B-Instruct
      parameters:
        density: 1
        weight: 1
        lambda: 0.9
  merge_method: della
  base_model: Qwen/Qwen2.5-Coder-14B
  parameters:
    density: 1
    weight: 1
    lambda: 0.9
  normalize: true
  int8_mask: true
  dtype: bfloat16
  tokenizer_source: base
  name: Qwen2.5-Coder-14B-della
  ```
  ### Step 2:
  ```yaml
  merge_method: model_stock
  base_model: Qwen/Qwen2.5-14B-Instruct
  models:
    - model: Qwen2.5-Coder-14B-della
    - model: arcee-ai/Virtuoso-Small-v2
    - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
  dtype: bfloat16
  tokenizer_source: base
  int8_mask: true
  normalize: true
  name: Qwen2.5-14B-mst
  ```
  ## Final stage:

  ```yaml
  merge_method: model_stock
  base_model: Qwen2.5-14B-1M-della
  models:
    - model: Qwen2.5-14B-della1
    - model: Qwen2.5-14B-della2
    - model: Qwen2.5-14B-della3
    - model: Qwen2.5-14B-mst
  dtype: bfloat16
  tokenizer_source: base
  int8_mask: true
  normalize: true
  name: YOYO-AI/Qwen2.5-14B-YOYO-V4
  ```
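
A note on reading the staged configs in the card: the fields (merge_method, base_model, density, weight, lambda, int8_mask, tokenizer_source, name) appear to follow mergekit's YAML config format, although the card does not name the tool. Later configs reference earlier outputs by their `name:` value, for example `Qwen2.5-14B-1M` from the first block becomes the `base_model` of the following della merge, so the stages have to be produced in order. A brief outline of that ordering, using only names taken from the configs above (a reading aid, not a mergekit config):

```yaml
# Reading aid: the stage outputs above, in the order they must be built,
# since later configs consume earlier ones by their `name:` value.
stages:
  first:
    - Qwen2.5-14B-1M              # sce merge of Qwen2.5-14B onto Qwen2.5-14B-Instruct-1M
    - Qwen2.5-14B-1M-della        # della merge with Qwen2.5-14B-1M as base; base of the final merge
  second:
    - Qwen2.5-14B-YOYO-della1     # della, base arcee-ai/Virtuoso-Small-v2
    - Qwen2.5-14B-YOYO-della2     # della, base arcee-ai/SuperNova-Medius
    - Qwen2.5-14B-YOYO-della3     # della, base Azure99/Blossom-V6-14B
  third:
    - Qwen2.5-Coder-14B-della     # step 1: della merge of the two Coder models
    - Qwen2.5-14B-mst             # step 2: model_stock over the Coder della, Virtuoso, and the R1 distills
  final:
    - YOYO-AI/Qwen2.5-14B-YOYO-V4 # model_stock over the della and mst intermediates
```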