Ed Addario (eaddario)

HF community survey: What is an acceptable Perplexity (PPL) degradation?

An area of personal research is finding ways to shrink LLMs without incurring a noticeable loss of capability. All the models in my repo have been generated by quantizing different tensors at different levels, based on how much each tensor influences the inference process (see each model's card for details). This approach produces, on average, a ~10% size reduction with a < 1% PPL penalty.
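As a rough illustration of the idea (not the exact recipe used for the models in the repo), an influence-guided scheme might rank tensors by an importance score and hand higher-precision quant types to the most influential ones. In the sketch below, the scores, the quant-type ladder, and the bucketing rule are all hypothetical:

```python
# Hypothetical sketch: rank tensors by an importance score (e.g. derived from
# an importance matrix) and give the most influential tensors higher-precision
# quant types. Scores, ladder, and bucketing are illustrative assumptions.

QUANT_LADDER = ["Q3_K", "Q4_K", "Q5_K", "Q6_K"]  # lowest -> highest precision

def assign_quant_types(importance: dict[str, float]) -> dict[str, str]:
    """Map each tensor name to a quant type by its importance rank."""
    ranked = sorted(importance, key=importance.get)  # least influential first
    plan = {}
    for rank, name in enumerate(ranked):
        bucket = rank * len(QUANT_LADDER) // len(ranked)  # even split of ranks
        plan[name] = QUANT_LADDER[bucket]
    return plan

# Hypothetical scores: attention output judged more influential than FFN gate
plan = assign_quant_types({
    "blk.0.attn_output.weight": 0.92,
    "blk.0.ffn_gate.weight": 0.31,
})
print(plan)  # {'blk.0.ffn_gate.weight': 'Q3_K', 'blk.0.attn_output.weight': 'Q5_K'}
```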

I'm now focusing on pruning (whole-layer removal) as a way to achieve a better size reduction, but this comes at the cost of much higher PPL degradation.
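For anyone unfamiliar with the technique, whole-layer pruning amounts to deleting entire decoder blocks and renumbering the rest. A minimal sketch, assuming a Llama-style transformers model; the checkpoint name and the set of dropped layers are hypothetical:

```python
# Sketch of whole-layer pruning on a Llama-style transformers model. The model
# name and the layer indices are hypothetical; in practice the layers would be
# chosen by how little they influence inference.
import torch
from transformers import AutoModelForCausalLM

MODEL = "some-org/some-base-model"   # hypothetical checkpoint
LAYERS_TO_DROP = {20, 21, 22}        # hypothetical low-influence layers

model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

# Keep the remaining decoder layers and renumber them
kept = [l for i, l in enumerate(model.model.layers) if i not in LAYERS_TO_DROP]
for i, layer in enumerate(kept):
    layer.self_attn.layer_idx = i    # keep KV-cache indexing consistent
model.model.layers = torch.nn.ModuleList(kept)
model.config.num_hidden_layers = len(kept)

model.save_pretrained("pruned-model")  # smaller checkpoint, higher PPL expected
```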

So, the question for the HF community is: what is the lowest (i.e. worst) PPL correlation coefficient (𝜌PPL) you'd consider acceptable for a quantized model? 99%? 95%? 90%?

To clarify, by 𝜌PPL I mean the Cor(ln(PPL(Q)), ln(PPL(base))) statistic reported by llama-perplexity.
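In other words, it's the Pearson correlation between the log-perplexities of the quantized and base models over the same evaluation text. A toy reproduction with numpy, assuming aligned per-chunk PPL values (the numbers below are made up):

```python
# Sketch: reproduce Cor(ln(PPL(Q)), ln(PPL(base))) from aligned per-chunk
# perplexities. llama-perplexity computes this internally; the PPL values
# below are made-up stand-ins for the two models' outputs.
import numpy as np

ppl_base = np.array([6.91, 7.34, 6.58, 8.02, 7.11])  # base model, per chunk
ppl_q    = np.array([7.02, 7.51, 6.70, 8.31, 7.25])  # quantized model, per chunk

rho_ppl = np.corrcoef(np.log(ppl_q), np.log(ppl_base))[0, 1]  # Pearson rho
print(f"rho_PPL = {rho_ppl:.2%}")
```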