DavidAU committed · verified · Commit 1bec10d · Parent(s): b066d47

Update README.md

Files changed (1): README.md (+34 -7)

README.md CHANGED
@@ -76,6 +76,8 @@ Higher temps will result in deeper, richer "thoughts"... and frankly more intere

  With the MOE setup, this model's thinking/output is even stronger.

  The "thinking/reasoning" tech (for the model at this repo) is from the original Llama 3.1 "DeepHermes" model from NousResearch:

  [ https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview ]
@@ -85,6 +87,38 @@ Please visit their repo for all information on features, test results and so on.

  ---

  <B>IMPORTANT OPERATING INSTRUCTIONS:</B>

  This is an instruct model with reasoning crafted onto the 6 CORE models in a MOE config.
@@ -106,12 +140,6 @@ Note that the reasoning/thinking section is often a lot less "tame" than the fin

  Suggest a minimum context of 4k, but 8k is better due to reasoning/output blocks.

- MAX QUANTS:
-
- There will be two max quants, IQ4XS and Q8 ("MAX" in the file name).
-
- The thinking/output will be enhanced by the output tensor being enlarged to bf16.
-
  KNOWN ISSUES:

  - You may need to hit regen sometimes to get the thinking/reasoning to activate / get a good "thinking block".
@@ -119,7 +147,6 @@ KNOWN ISSUES:

  - Sometimes the thinking block will end, and you need to manually prompt the model to "generate" the output.
  - This model can sometimes generate really long output and/or never want to "end" the output - close to a rant, but deeper. It is surprising what it can come up with.

-
  <B>USE CASES:</B>

  This model is for all use cases, but is designed specifically for creative use cases.
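One practical consequence of the known issues above: when a run stops at the end of the thinking block, it helps to split the reasoning from the final answer before deciding whether to re-prompt. A minimal sketch, assuming the reasoning is wrapped in `<think>...</think>` tags (an assumption based on DeepHermes-style templates, not stated in this card; `split_thinking` is a hypothetical helper name):

```python
import re

# Assumption: the model emits its reasoning inside <think>...</think> tags
# (DeepHermes-style); adjust the tag names if your chat template differs.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer): the reasoning block and the remaining output."""
    m = THINK_RE.search(text)
    if not m:
        # No closed thinking block: treat the whole output as the answer.
        return "", text.strip()
    thinking = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return thinking, answer
```

If `split_thinking` returns an empty answer while `thinking` is non-empty, that is the "thinking block ended but no output came" case, and a follow-up "generate" prompt is warranted.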
 
  With the MOE setup, this model's thinking/output is even stronger.

+ The "Horror Imatrix" was built using Grand Horror 16B (at my repo). This adds a "tint" of horror to the model.
+
  The "thinking/reasoning" tech (for the model at this repo) is from the original Llama 3.1 "DeepHermes" model from NousResearch:

  [ https://huggingface.co/NousResearch/DeepHermes-3-Llama-3-8B-Preview ]
 
  ---

+ <b>"HORROR IMATRIX" and Quants</b>
+
+ A strong, in-house imatrix dataset built by David_AU that results in better overall function, instruction following, output quality, and stronger connections to ideas, concepts, and the world in general.
+
+ This chart shows each quant in order of "BPW" (and thus relative "strength"), from "IQ1_S" with the least to "Q8_0" with the most ("F16" is full precision):
+
+ <small>
+ <pre>
+ IQ1_S | IQ1_M
+ IQ2_XXS | IQ2_XS | Q2_K_S | IQ2_S | Q2_K | IQ2_M
+ IQ3_XXS | Q3_K_S | IQ3_XS | IQ3_S | IQ3_M | Q3_K_M | Q3_K_L
+ Q4_K_S | IQ4_XS | IQ4_NL | Q4_K_M
+ Q5_K_S | Q5_K_M
+ Q6_K
+ Q8_0
+ F16
+ </pre>
+ </small>
+
+ Recommend quants IQ3s / IQ4XS / IQ4NL / Q4s for best creative results.
+
+ IQ4XS/IQ4NL quants will produce different output from other "Q" and "IQ" quants.
+
+ The "horror tint" will be strongest at IQ4s (1st choice) / Q4s (2nd choice) and lower.
+
+ Recommend Q5s / Q6 / Q8 for general usage.
+
+ Note that IQ1s' performance is acceptable, whereas IQ2s and up are strong.
+
+ More information on quants is in the document below: "Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers".
+
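As a concrete reading of the chart above, the tiers can be encoded so any two quants are comparable by BPW rank; a minimal sketch (the tier table is copied directly from the chart, `tier` is a hypothetical helper name):

```python
# BPW tiers from the chart above, least bits-per-weight first.
QUANT_TIERS = [
    ["IQ1_S", "IQ1_M"],
    ["IQ2_XXS", "IQ2_XS", "Q2_K_S", "IQ2_S", "Q2_K", "IQ2_M"],
    ["IQ3_XXS", "Q3_K_S", "IQ3_XS", "IQ3_S", "IQ3_M", "Q3_K_M", "Q3_K_L"],
    ["Q4_K_S", "IQ4_XS", "IQ4_NL", "Q4_K_M"],
    ["Q5_K_S", "Q5_K_M"],
    ["Q6_K"],
    ["Q8_0"],
    ["F16"],
]

def tier(quant: str) -> int:
    """0-based BPW tier of a quant name; higher means more bits per weight."""
    for rank, row in enumerate(QUANT_TIERS):
        if quant in row:
            return rank
    raise ValueError(f"unknown quant: {quant}")

# Q8_0 carries more bits per weight than IQ4_XS:
assert tier("Q8_0") > tier("IQ4_XS")
```

Quants within the same row sit in the same tier, so for example `tier("Q4_K_M") == tier("IQ4_XS")` even though their outputs can differ, as noted above.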
  <B>IMPORTANT OPERATING INSTRUCTIONS:</B>

  This is an instruct model with reasoning crafted onto the 6 CORE models in a MOE config.
 
  Suggest a minimum context of 4k, but 8k is better due to reasoning/output blocks.

  KNOWN ISSUES:

  - You may need to hit regen sometimes to get the thinking/reasoning to activate / get a good "thinking block".
  - Sometimes the thinking block will end, and you need to manually prompt the model to "generate" the output.
  - This model can sometimes generate really long output and/or never want to "end" the output - close to a rant, but deeper. It is surprising what it can come up with.

  <B>USE CASES:</B>

  This model is for all use cases, but is designed specifically for creative use cases.