Text Generation
GGUF
English
creative
creative writing
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
llama 3.1
llama-3
llama3
llama-3.1
science fiction
romance
all genres
story
writing
vivid prosing
vivid writing
fiction
roleplaying
bfloat16
swearing
role play
sillytavern
backyard
horror
context 128k
mergekit
Merge
6X8B
Mixture of Experts
mixture of experts
Not-For-All-Audiences
imatrix
conversational
Update README.md
README.md CHANGED
```diff
@@ -76,6 +76,8 @@ Higher temps will result in deeper, richer "thoughts"... and frankly more interesting
 
 With the MOE setup, this model's thinking/output is even stronger.
 
+The "Horror Imatrix" was built using Grand Horror 16B (at my repo). This adds a "tint" of horror to the model.
+
 The "thinking/reasoning" tech (for the model at this repo) is from the original Llama 3.1 "DeepHermes" model from NousResearch:
 
 [ https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview ]
@@ -85,6 +87,38 @@ Please visit their repo for all information on features, test results and so on.
 
 ---
 
+<b>"HORROR IMATRIX" and Quants</b>
+
+A strong, in-house built imatrix dataset by David_AU which results in better overall function,
+instruction following, output quality and stronger connections to ideas, concepts and the world in general.
+
+This chart shows the quants in order of "BPW" (bits per weight), mapped below with relative "strength" to one another, with "IQ1_S" the least and "Q8_0" the most ("F16" is full precision):
+
+<small>
+<pre>
+IQ1_S | IQ1_M
+IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
+IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
+Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
+Q5_K_S | Q5_K_M
+Q6_K
+Q8_0
+F16
+</pre>
+</small>
+
+Recommend quants IQ3s / IQ4XS / IQ4NL / Q4s for best creative results.
+
+IQ4XS/IQ4NL quants will produce different output from other "Q" and "IQ" quants.
+
+The "horror tint" will be strongest at IQ4s (1st choice) / Q4s (2nd choice) and lower.
+
+Recommend q5s/q6/q8 for general usage.
+
+Note that IQ1s performance is acceptable, whereas IQ2s and up are strong.
+
+More information on quants is in the document below: "Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers".
+
 <B>IMPORTANT OPERATING INSTRUCTIONS:</B>
 
 This is an instruct model with reasoning crafted onto the 6 CORE models in a MOE config.
@@ -106,12 +140,6 @@ Note that the reasoning/thinking section is often a lot less "tame" than the final output.
 
 Suggest a minimum context of 4k, but 8k is better due to reasoning/output blocks.
 
-MAX QUANTS:
-
-There will be two max quants, IQ4XS and Q8 ("MAX" in the file name).
-
-The thinking/output will be enhanced by the output tensor being enlarged to bf16.
-
 KNOWN ISSUES:
 
 - You may need to hit regen sometimes to get the thinking/reasoning to activate / get a good "thinking block".
@@ -119,7 +147,6 @@ KNOWN ISSUES:
 - Sometimes the thinking block will end, and you need to manually prompt the model to "generate" the output.
 - This model can sometimes generate really long output and/or never want to "end" the output - close to a rant, but deeper. It is surprising what it can come up with.
 
-
 <B>USE CASES:</B>
 
 This model is for all use cases, but designed for creative use cases specifically.
```