Update README.md
README.md CHANGED
```diff
@@ -28,7 +28,7 @@ Run them directly with [llama.cpp](https://github.com/ggerganov/llama.cpp), or a
 
 I finally decided to go through llama-quant.cpp and update some of the tensor types, especially for MoE models, since they've largely been left as-is since the original Mixtral.
 
-These changes apply a bit more logic to the types, bumping a few values here and there across the board, and seem to have an overall positive impact on the results.
+These changes apply a bit more logic to the types, bumping a few values here and there across the board, and seem to have an overall positive impact on the results. They're similar to what Unsloth accomplished, but done in a more generic (and hopefully upstreamable) way.
 
 IQ2_XXS may not be final; the size increase is quite substantial, so I may want to claw it back a bit to keep it in a better spot. I'm still working on it, but wanted to explain these new uploads. A PR to llama.cpp will be opened when I'm done investigating.
 
```