Quant-Cartel / experiment_1_8b-iMat-GGUF

InferenceIllusionist committed on May 21, 2024

Commit 3107ac6 · verified · 1 Parent(s): cb1a95f

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -28,7 +28,7 @@ PROUDLY PRESENTS
 <b>Quantization Note: Use repetition penalty (--repeat-penalty on llama.cpp) of 1.05 - 1.15 for best results </b>
-Quantized from fp16.
+Quantized from fp16 with love.
 * Weighted quantizations were creating using fp16 GGUF and [groups_merged-enhancedV2-TurboMini.txt](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-9432658) in 189 chunks and n_ctx=512
 * This method of calculating the importance matrix showed improvements in some areas for Mistral 7b and Llama3 8b models, see above post for details
 * The enhancedv2-turbomini file appends snippets from turboderp's calibration data to the standard groups_merged.txt file