https://colab.research.google.com/drive/1-xZmBRXT5Fm3Ghn4Mwa2KRypORXb855X?usp=sharing
The AQLM quantization method was recently merged into the transformers main branch.
The 2-bit model can be found here: BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch
You can read more about the method here: https://huggingface.co/docs/transformers/main/en/quantization#aqlm
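A minimal loading sketch, assuming you have installed transformers from main along with the aqlm package, and have a GPU with enough memory for the 2-bit checkpoint (exact memory needs and the `device_map` choice are assumptions, not tested here):

```python
# Sketch: load the 2-bit AQLM-quantized Mixtral checkpoint with transformers.
# Assumes: pip install aqlm[gpu] and transformers installed from the main branch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumption: fp16 for the non-quantized parts
    device_map="auto",          # dispatch layers across available devices
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The quantization config ships inside the checkpoint, so no extra quantization arguments should be needed at load time.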
Great work @BlackSamorez and team!