lbourdois committed on
Commit c014569 · verified · 1 Parent(s): 7408e78

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +60 -46
README.md CHANGED
@@ -1,47 +1,61 @@
- ---
- license: apache-2.0
- datasets:
- - arcee-ai/EvolKit-20k
- base_model:
- - Qwen/Qwen2.5-1.5B
- ---
- # EVA-D Qwen2.5-1.5B v0.0
-
- <p>
- An experimental online logit distillation of EVA-Qwen2.5-14B-v0.1 into Qwen2.5-1.5B. Should work as an RP/storywriting specialist, but don't expect superb performance from it, due to its small size. All in all, it was a fun experiment to do.<br>
- </p>
-
- <p>Note: using quantized KV cache with Qwen2.5 <b>is not recommended</b> and can lead to degraded output quality. On the other hand, Qwen's KV cache is already light enough, so using f16 for it shouldn't be problematic.</p>
-
- <p>
- <p>Prompt format is ChatML.</p><br>
- <h3>Recommended sampler values:</h3>
- <ul>
- <li>Temperature: 1</li>
- <li>Min-P: 0.02</li>
- </ul>
-
- <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
-
- - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
- - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
- </p>
-
- <p>
- <br>
- <h3>
- Distillation data:
- </h3>
- <ul>
- <li>Arcee.AI's <a href=https://huggingface.co/datasets/arcee-ai/EvolKit-20k>EvolKit-20k</a> dataset, which is specifically made for knowledge distillation purposes.</li>
- </ul>
- <h3>
- Training time and hardware:
- </h3>
- <ul><li>1.8 hours on 8xA100 SXM, provided by Garg</li></ul><br>
- </p>
- <p>Model was trained by Kearm and Auri.</p>
- <h4>Special thanks:</h4><ul>
- <li><b>to Garg for generously providing 8xA100 SXM node for this experiment!</b></li>
- <li>to Arcee.AI for creating DistillKit and EvolKit-20k dataset, which were used to create this model.</li>
  <li>and to Allura-org for support and feedback on EVA models.</li></ul>
 
+ ---
+ license: apache-2.0
+ datasets:
+ - arcee-ai/EvolKit-20k
+ base_model:
+ - Qwen/Qwen2.5-1.5B
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+ # EVA-D Qwen2.5-1.5B v0.0
+
+ <p>
+ An experimental online logit distillation of EVA-Qwen2.5-14B-v0.1 into Qwen2.5-1.5B. Should work as an RP/storywriting specialist, but don't expect superb performance from it, due to its small size. All in all, it was a fun experiment to do.<br>
+ </p>
+
+ <p>Note: using quantized KV cache with Qwen2.5 <b>is not recommended</b> and can lead to degraded output quality. On the other hand, Qwen's KV cache is already light enough, so using f16 for it shouldn't be problematic.</p>
+
+ <p>
+ <p>Prompt format is ChatML.</p><br>
+ <h3>Recommended sampler values:</h3>
+ <ul>
+ <li>Temperature: 1</li>
+ <li>Min-P: 0.02</li>
+ </ul>
+
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
+
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
+ </p>
+
+ <p>
+ <br>
+ <h3>
+ Distillation data:
+ </h3>
+ <ul>
+ <li>Arcee.AI's <a href=https://huggingface.co/datasets/arcee-ai/EvolKit-20k>EvolKit-20k</a> dataset, which is specifically made for knowledge distillation purposes.</li>
+ </ul>
+ <h3>
+ Training time and hardware:
+ </h3>
+ <ul><li>1.8 hours on 8xA100 SXM, provided by Garg</li></ul><br>
+ </p>
+ <p>Model was trained by Kearm and Auri.</p>
+ <h4>Special thanks:</h4><ul>
+ <li><b>to Garg for generously providing 8xA100 SXM node for this experiment!</b></li>
+ <li>to Arcee.AI for creating DistillKit and EvolKit-20k dataset, which were used to create this model.</li>
  <li>and to Allura-org for support and feedback on EVA models.</li></ul>
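
For anyone trying the model from Python, the card's settings (ChatML prompting, Temperature 1, Min-P 0.02) map directly onto a `transformers` generation call. A minimal sketch follows, assuming a reasonably recent `transformers` release (`min_p` sampling is a newer generation option) and a repo id inferred from the model name, which may differ from the actual repository path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EVA-UNIT-01/EVA-D-Qwen2.5-1.5B-v0.0"  # assumed repo id; adjust if the actual path differs

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Qwen2.5 tokenizers ship a ChatML chat template, so apply_chat_template
# emits the <|im_start|>...<|im_end|> format the card calls for.
messages = [
    {"role": "system", "content": "You are a storywriting assistant."},
    {"role": "user", "content": "Write the opening paragraph of a short mystery story."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampler values recommended by the card: Temperature 1, Min-P 0.02.
output = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    min_p=0.02,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The SillyTavern presets linked in the card cover the same ChatML formatting on the front-end side.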