
Custom GGUF quants of arcee-ai/Llama-3.1-SuperNova-Lite, where the Output Tensors are quantized to Q8_0 while the Embeddings are kept at F32. Enjoy! 🧠🔥🚀
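As a rough sketch (not necessarily the exact command used for this repo), this output-tensor/embedding scheme can be reproduced with llama.cpp's llama-quantize via its tensor-type overrides; the file names and imatrix path below are placeholders:

```sh
# Hypothetical example: quantize the model with the output tensor pinned
# to Q8_0 and the token embeddings kept at F32, guided by an imatrix.
./llama-quantize \
  --imatrix imatrix.dat \
  --output-tensor-type q8_0 \
  --token-embedding-type f32 \
  Llama-3.1-SuperNova-Lite-F32.gguf \
  Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-EF32-Q8_0.gguf \
  Q8_0
```

The repo's 4-bit and 6-bit variants would follow the same pattern with a different base quant type as the final argument.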

UPDATE: This repo now contains updated O.E.IQuants, re-quantized with a new F32 imatrix using llama.cpp version 4067 (54ef9cfc). That version made all KQ mat_mul computations run in F32 instead of BF16 when using FA (Flash Attention). Combined with the earlier, equally impactful change that made all KQ mat_muls compute in F32 (float32) precision on CUDA-enabled devices, this has compounded to enhance the O.E.IQuants and made this update well worth pushing. Cheers!
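For context, a minimal sketch of the surrounding workflow, assuming llama.cpp's llama-imatrix and llama-cli tools (model and calibration file names are placeholders): the imatrix is computed from an F32 GGUF, and Flash Attention is enabled at inference with -fa.

```sh
# Hypothetical example: compute an importance matrix from an F32 GGUF
# over a calibration corpus (output file given explicitly via -o).
./llama-imatrix -m Llama-3.1-SuperNova-Lite-F32.gguf -f calibration.txt -o imatrix.dat

# Run the resulting quant with Flash Attention enabled; as of build 4067,
# KQ mat_muls under FA are computed in F32.
./llama-cli -m Llama-3.1-SuperNova-Lite-8.0B-OQ8_0-EF32-Q8_0.gguf -fa -p "Hello"
```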

GGUF · 8.03B params · llama architecture · 4-bit, 6-bit, and 8-bit quants available
