What is the difference between the quants?

#3
by tarruda - opened

Sorry if this is a dumb question, but I see that you have published multiple quants and they all have the same size.

Can you clarify what the difference is?

From their docs:
Any quant smaller than f16, including 2-bit — has minimal accuracy loss, since only some parts (e.g., attention layers) are lower bit while most remain full-precision. That’s why sizes are close to the f16 model; for example, the 2-bit (11.5 GB) version performs nearly the same as the full 16-bit (14 GB) one. Once llama.cpp supports better quantization for these models, we'll upload them ASAP.
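A rough back-of-the-envelope sketch of why the sizes end up so close, assuming a simple model where a fraction of the weights is stored at a low bit width and the rest stays at 16 bits. The function names are hypothetical, and the estimate ignores metadata and the fact that real low-bit formats usually cost somewhat more than their nominal bits per weight, so treat the result as illustrative only:

```python
def mixed_precision_size_gb(full_size_gb, quantized_fraction, low_bits, full_bits=16):
    """Estimate file size when only a fraction of the weights is stored at low_bits."""
    kept = 1.0 - quantized_fraction                        # share left at full precision
    shrunk = quantized_fraction * low_bits / full_bits     # share stored at low precision
    return full_size_gb * (kept + shrunk)


def implied_quantized_fraction(full_size_gb, quant_size_gb, low_bits, full_bits=16):
    """Invert the estimate: what fraction must be low-bit to reach quant_size_gb?"""
    saved_ratio = 1.0 - quant_size_gb / full_size_gb
    return saved_ratio / (1.0 - low_bits / full_bits)


# Sizes quoted in the docs: 14 GB at f16, 11.5 GB for the 2-bit upload.
fraction = implied_quantized_fraction(14.0, 11.5, low_bits=2)
print(f"implied low-bit fraction: {fraction:.0%}")
print(f"round trip: {mixed_precision_size_gb(14.0, fraction, 2):.1f} GB")
print(f"if every weight were 2-bit: {mixed_precision_size_gb(14.0, 1.0, 2):.2f} GB")
```

Under these assumptions, the quoted sizes imply that only about a fifth of the weights are actually stored at 2 bits. If every weight were 2-bit, the same model would come out at under 2 GB.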

Unsloth AI org

Correct. Once llama.cpp quantizes these models properly, the sizes will differ much more.

More like: what is the point of lower quants for this model when they are all the same size?

@Lamamanx Advertisement for Unsloth, what else? They've got to be the first one no matter what.

More like: what is the point of lower quants for this model when they are all the same size?

Just in case someone wants to use them, and more choice is always better. For example, the f16 one is 66 GB, but someone with a 64 GB device can't fit it, so they'd rather use a smaller one.

@Lamamanx Advertisement for Unsloth, what else? They've got to be the first one no matter what.

How exactly is it an advertisement for Unsloth? Other people have uploaded quants just like these, with similar sizes. Unfortunately this is a temporary limitation of llama.cpp, which they're going to work on, and once they fix it we can reupload the quants. It's not like we're announcing everywhere that there are many different sizes to run.

In fact, in all our social media posts we only mention one size to run, and that is the f16 one. So I really don't understand what you mean by advertisement.
