Uploading EXL2 quants
Amazing, @bullerwins. Just curious:
- How did you quantize them / which scripts?
- Which UI / inference engine are you using for the exl2 quants and local inference?
@bullerwins Feel free to open a PR to the README!
> Amazing, @bullerwins. Just curious:
> - How did you quantize them / which scripts?
> - Which UI / inference engine are you using for the exl2 quants and local inference?
I'm using Turboderp's exllamav2 https://github.com/turboderp/exllamav2
Script:
`python3 convert.py -i gradientai_Llama-3-8B-Instruct-262k/ -o /temp/ -cf gradientai_Llama-3-8B-Instruct-262k_exl2_5.0bpw/ -b 5.0`
I'm testing it with Oobabooga's text-generation-webui for inference: https://github.com/oobabooga/text-generation-webui
@bullerwins Working on a revised version (with better chat alignment). Would you be up for creating quants? I'd link them in the README again.
Also: we released the better-aligned 70B.
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-262k
@bullerwins We just upgraded the weights; you should see a drastic improvement over the previous iteration.
Working on exl2 quants for the upgraded 8B weights, as well as for the better-aligned 70B.
Exl2 quants for today's (4th of May 2024) updated weights:
I called them v2 for clarity
8.0bpw https://huggingface.co/bullerwins/gradientai_Llama-3-8B-Instruct-262k_v2_exl2_8.0bpw
6.0bpw https://huggingface.co/bullerwins/gradientai_Llama-3-8B-Instruct-262k_v2_exl2_6.0bpw
5.0bpw https://huggingface.co/bullerwins/gradientai_Llama-3-8B-Instruct-262k_v2_exl2_5.0bpw
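As a rough guide for picking a bitrate, the on-disk (and approximate VRAM) weight size follows directly from bits per weight. A minimal sketch, assuming an 8B-parameter model as in this thread; the `exl2_size_gb` helper is mine, and the estimate ignores per-layer measurement overhead and any tensors kept at higher precision:

```python
# Rough weight-size estimate for an EXL2 quant:
# parameters * bits-per-weight / 8 bits-per-byte, reported in GB.
def exl2_size_gb(n_params: float, bpw: float) -> float:
    return n_params * bpw / 8 / 1e9

# Estimates for the three bitrates listed above, assuming ~8e9 parameters.
for bpw in (8.0, 6.0, 5.0):
    print(f"{bpw}bpw ~ {exl2_size_gb(8e9, bpw):.1f} GB")
```

Actual repo sizes will be somewhat larger, but this is handy for checking what fits in your GPU before downloading.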
Opening a PR for the README.