---

# Triangle104/EVA-Qwen2.5-7B-v0.1-Q5_K_S-GGUF
This model was converted to GGUF format from [`EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1`](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-7B-v0.1) for more details on the model.

---
Model details:

A RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-7B on a mixture of synthetic and natural data.

It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity, and "flavor" of the resulting model.

Version 0.1 notes:
The dataset was deduped and cleaned up from version 0.0, and the learning rate was adjusted. The resulting model seems to be stabler, and the 0.0 problems with handling short inputs and min_p sampling seem to be mostly gone.

The model will be retrained once more, because this run crashed around epoch 1.2 (out of 3) (thanks, DeepSpeed, really appreciate it), and it is still somewhat undertrained as a result.

Prompt format is ChatML.
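For reference, a ChatML conversation is delimited with `<|im_start|>`/`<|im_end|>` tokens like this (the system and user text below is just an illustration, not a recommended prompt):

```
<|im_start|>system
You are a creative writing assistant.<|im_end|>
<|im_start|>user
Write the opening line of a mystery novel.<|im_end|>
<|im_start|>assistant
```

Generation continues from the final `<|im_start|>assistant` header.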

Recommended sampler values:
- Temperature: 0.87
- Top-P: 0.81
- Repetition Penalty: 1.03

The model appears to prefer lower temperatures (at least 0.9 and lower). Min-P seems to work now, as well.
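As a sketch, these values map onto llama.cpp's sampling flags as follows (the flag names are llama.cpp's; `model.gguf` and the prompt are placeholders):

```shell
# Run the model with the recommended sampler settings.
# model.gguf is a placeholder for the downloaded quant file.
llama-cli -m model.gguf \
  --temp 0.87 \
  --top-p 0.81 \
  --repeat-penalty 1.03 \
  -p "Your ChatML-formatted prompt here"
```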
 
Recommended SillyTavern presets (via CalamitousFelicitousness):
- Context
- Instruct and System Prompt

Training data:

- Celeste 70B 0.1 data mixture minus the Opus Instruct subset. See that model's card for details.
- Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.
- A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe.
- A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe.
- A cleaned subset (~3k rows) of shortstories_synthlabels by Auri.
- Synthstruct and SynthRP datasets by Epiculous.

Training time and hardware:
2 days on 4x3090Ti (locally)

Model was trained by Kearm and Auri.

Special thanks:
- to Gryphe, Lemmy, Kalomaze, Nopm and Epiculous for the data
- to Alpindale for helping with the FFT config for Qwen2.5
- and to InfermaticAI's community for their continued support for our endeavors

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
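A minimal usage sketch follows. The `--hf-file` name is an assumption derived from the repo name; check the repo's file listing for the exact `.gguf` filename before running:

```shell
# Install llama.cpp (macOS/Linux)
brew install llama.cpp

# Run the model directly from the Hugging Face Hub with the llama.cpp CLI.
# The .gguf filename below is assumed; verify it against the repo's files.
llama-cli --hf-repo Triangle104/EVA-Qwen2.5-7B-v0.1-Q5_K_S-GGUF \
  --hf-file eva-qwen2.5-7b-v0.1-q5_k_s.gguf \
  -p "Write a short scene set in a rainy harbor town."

# Or serve an OpenAI-compatible HTTP endpoint instead:
llama-server --hf-repo Triangle104/EVA-Qwen2.5-7B-v0.1-Q5_K_S-GGUF \
  --hf-file eva-qwen2.5-7b-v0.1-q5_k_s.gguf -c 2048
```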