Releasing an HQQ 4-bit quantized version of Llama-3.1-70b! Check it out at mobiuslabsgmbh/Llama-3.1-70b-instruct_4bitgs64_hqq. It achieves 99% of the base model's performance across various benchmarks; details in the model card.
Excited to announce the release of our high-quality Llama-3.1 8B 4-bit HQQ calibrated quantized model! It achieves an impressive 99.3% of FP16 performance and also delivers fast inference with transformers: mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib
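A minimal loading sketch for the calibrated 8B checkpoint above, assuming the `hqq` package is installed and a CUDA GPU is available (the `HQQModelForCausalLM.from_quantized` usage follows the pattern shown on the model cards; treat the exact call as an assumption and check the card for the current API):

```python
# Sketch: load an HQQ-quantized checkpoint and generate text.
# Assumes `pip install hqq` and a GPU; the heavy download/load is kept
# behind the __main__ guard so this file can be imported without either.
model_id = "mobiuslabsgmbh/Llama-3.1-8b-instruct_4bitgs64_hqq_calib"

if __name__ == "__main__":
    from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = HQQModelForCausalLM.from_quantized(model_id)  # downloads the quantized weights

    prompt = "Explain 4-bit quantization in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```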
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-3bit-metaoffload-HQQ (Text Generation, updated Feb 5)
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bitgs8-metaoffload-HQQ (Text Generation, updated Feb 5)
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-metaoffload-HQQ (Text Generation, updated Feb 5)
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-2bit_g16_s128-HQQ (Text Generation, updated Feb 5)
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ (Text Generation, updated Feb 5)
mobiuslabsgmbh/CLIP-ViT-H-14-laion2B-2bit_g16_s128-HQQ (Image Classification, updated Dec 6, 2023)