Meta-Llama-3.1-8B-Claude-iMat-GGUF

7/28 Update:

  • Reconverted using llama.cpp b3479, which adds Llama 3.1 RoPE scaling factors to llama conversion and inference, improving results for context windows above 8192 tokens
  • Importance matrix recalculated with the updated fp16 GGUF
  • If using Kobold.cpp, make sure you are on v1.71.1 or later to take advantage of RoPE scaling
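For context on what the "rope scaling factors" are, here is a minimal sketch of the published Llama 3.1 RoPE frequency-scaling scheme. It assumes the values from Meta's Llama 3.1 rope_scaling config (factor 8, low_freq_factor 1, high_freq_factor 4, original context 8192) plus rope_theta 500000 and head_dim 128 for the 8B model; check the model's config.json before relying on them. It illustrates the idea and is not llama.cpp's own code. Short-wavelength (high-frequency) dimensions are left alone, long-wavelength ones are slowed down by the full factor, and the band in between is interpolated smoothly.

```python
import math

# Assumed Llama 3.1 8B values (see lead-in); not read from this repo.
ROPE_THETA = 500000.0
HEAD_DIM = 128
FACTOR = 8.0
LOW_FREQ_FACTOR = 1.0
HIGH_FREQ_FACTOR = 4.0
ORIGINAL_CONTEXT = 8192


def base_frequencies(theta=ROPE_THETA, head_dim=HEAD_DIM):
    """Standard RoPE: one rotation frequency per pair of head dimensions."""
    return [1.0 / theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


def llama31_rope_factors(freqs, factor=FACTOR, low=LOW_FREQ_FACTOR,
                         high=HIGH_FREQ_FACTOR, orig_ctx=ORIGINAL_CONTEXT):
    """Return the divisor applied to each base frequency."""
    low_freq_wavelen = orig_ctx / low    # wavelengths above this: full scaling
    high_freq_wavelen = orig_ctx / high  # wavelengths below this: unchanged
    factors = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            factors.append(1.0)
        elif wavelen > low_freq_wavelen:
            factors.append(factor)
        else:
            # Blend between "divide by factor" (smooth=0) and "unchanged" (smooth=1).
            smooth = (orig_ctx / wavelen - low) / (high - low)
            factors.append(1.0 / ((1 - smooth) / factor + smooth))
    return factors


if __name__ == "__main__":
    factors = llama31_rope_factors(base_frequencies())
    scaled = [f / d for f, d in zip(base_frequencies(), factors)]
    print(len(factors), min(factors), max(factors))
```

Dividing each base frequency by its factor yields the frequencies used at inference time, which is why older loaders that ignore these factors behave differently past 8192 tokens.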

Quantized from Meta-Llama-3.1-8B-Claude fp16

  • Weighted quantizations were created using the fp16 GGUF and groups_merged.txt in 88 chunks with n_ctx=512
  • The static fp16 GGUF is also included in the repo
  • For a brief rundown of iMatrix quant performance, please see this PR
  • All quants are verified working prior to uploading to the repo for your safety and convenience
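As a rough illustration of what "weighted" means here, the toy sketch below assumes an importance matrix in the spirit of llama.cpp's imatrix: per input column, the mean of squared activations collected over calibration text (88 chunks × 512 tokens = 45,056 tokens for this repo). Those importances then weight the reconstruction error when choosing a block's quantization scale. The helper names, the 4-bit symmetric scheme and the scale search are hypothetical simplifications; llama.cpp's actual quant types use different block layouts and search procedures.

```python
# Calibration size implied by the card: 88 chunks of n_ctx=512 tokens.
N_CHUNKS = 88
N_CTX = 512
CALIBRATION_TOKENS = N_CHUNKS * N_CTX  # 45,056 tokens


def importance_from_activations(chunks):
    """Per-input-column mean of squared activations over all tokens.

    `chunks` is a list of chunks; each chunk is a list of activation
    vectors (one per token), all of the same width.
    """
    width = len(chunks[0][0])
    sums = [0.0] * width
    count = 0
    for chunk in chunks:
        for x in chunk:
            for j, v in enumerate(x):
                sums[j] += v * v
            count += 1
    return [s / count for s in sums]


def quantize(xs, scale, qmax=7):
    """Symmetric round-to-nearest into signed ints in [-qmax - 1, qmax]."""
    return [max(-qmax - 1, min(qmax, round(x / scale))) for x in xs]


def weighted_error(xs, weights, scale, qmax=7):
    """Importance-weighted squared reconstruction error for one block."""
    qs = quantize(xs, scale, qmax)
    return sum(w * (x - scale * q) ** 2 for x, w, q in zip(xs, weights, qs))


def best_scale(xs, weights, qmax=7, steps=20):
    """Try scales around the naive max-abs scale; keep the lowest weighted error."""
    amax = max(abs(x) for x in xs)
    if amax == 0:
        return 1.0
    naive = amax / qmax
    candidates = [naive] + [naive * (0.8 + 0.4 * k / steps) for k in range(steps + 1)]
    return min(candidates, key=lambda s: weighted_error(xs, weights, s, qmax))
```

Because the naive scale is always among the candidates, the chosen scale never does worse than it on the weighted metric; columns that see large activations during calibration get the most say in that choice.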

KL-Divergence Reference Chart
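The card does not describe how the chart was produced. The sketch below only shows the standard definition such charts usually rest on: for each token position, KL(P || Q) between the fp16 model's next-token distribution P and the quantized model's distribution Q, averaged over positions. Lower is closer to fp16. The function names are illustrative, not from any tool.

```python
import math


def log_softmax(logits):
    """Numerically stable log-probabilities from raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || Q_quant) for one token position, in nats."""
    lp = log_softmax(ref_logits)
    lq = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def mean_kld(ref_rows, quant_rows):
    """Average per-token KL divergence over matching rows of logits."""
    if len(ref_rows) != len(quant_rows) or not ref_rows:
        raise ValueError("need the same non-zero number of rows")
    vals = [kl_divergence(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(vals) / len(vals)
```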

Original model card can be found here

GGUF
Model size: 8.03B params
Architecture: llama

Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
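A quick back-of-envelope for file size at each bit width: parameters × bits per weight ÷ 8 gives bytes. This is only a rough floor under stated assumptions: it uses the nominal bit width, while real GGUF quant types store extra per-block scales and keep some tensors at higher precision, and it ignores metadata and runtime memory such as the KV cache.

```python
PARAMS = 8.03e9  # rounded parameter count from this card


def approx_size_gb(bits_per_weight, params=PARAMS):
    """Rough weight size in decimal gigabytes: params * bpw / 8 bytes."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (2, 3, 4, 5, 6, 8, 16):
        print(f"{bits:>2}-bit: ~{approx_size_gb(bits):.2f} GB")
```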
