This is an experimental 2x8B MoE with random gates, using the following 2 models:

- TheSkullery/llama-3-cat-8b-instruct-v1
- NousResearch/Hermes-2-Theta-Llama-3-8B

***Important***

Make sure to add `</s>` as a stop sequence, since it uses llama-3-cat-8B-instruct-V1 as the base model.
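Here is a minimal sketch of how that stop sequence might be passed when running one of the GGUF quants with llama-cpp-python; the model filename below is just a placeholder.

```python
from llama_cpp import Llama

# Load one of the GGUF quants (the filename here is a placeholder).
llm = Llama(model_path="llama-3-2x8b-moe-Q4_K_M.gguf", n_ctx=8192)

# Pass "</s>" as a stop sequence so generation halts when the
# cat-8B-instruct base emits it.
out = llm(
    "Explain in one sentence what a mixture-of-experts model is.",
    max_tokens=128,
    stop=["</s>"],
)
print(out["choices"][0]["text"])
```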
Update:

Due to requests, I decided to add the rest of the quants. Enjoy!

Mergekit recipe of the model, if you're too lazy to check the files:
```yaml
base_model: TheSkullery/llama-3-cat-8b-instruct-v1
gate_mode: random
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: TheSkullery/llama-3-cat-8b-instruct-v1
    positive_prompts:
      - " "
  - source_model: NousResearch/Hermes-2-Theta-Llama-3-8B
    positive_prompts:
      - " "
```
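If you want to rebuild the merge yourself, a config like this is normally fed to mergekit's MoE script, e.g. something along the lines of `mergekit-moe config.yml ./output-model-dir` (both paths here are placeholders).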