Got it working with llama.cpp

#1
by QuantPanda - opened

Hi guys,

Thought I'd share my findings on how to run it with llama.cpp, and share the GGUF quants of this LoRA adapter:

https://huggingface.co/QuantPanda/granite-uncertainty-3.2-8b-lora-GGUF

It's a bit hacky to get it working in conversation mode, but it does work (rough sketch below); maybe one day it can be supported in llama.cpp properly.
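
For reference, here's roughly the approach. This is a minimal sketch, not a polished recipe: the file names and quant levels are illustrative, the "certainty" role is taken from my reading of the model card (double-check it there), and flag availability depends on your llama.cpp build.

```bash
# Load the base Granite 3.2 8B instruct GGUF together with the LoRA adapter
# GGUF from the repo linked above; file names/quants here are illustrative.
./llama-cli \
  -m granite-3.2-8b-instruct-Q8_0.gguf \
  --lora granite-uncertainty-3.2-8b-lora-Q8_0.gguf \
  -cnv

# The hacky part: conversation mode has no way to inject a custom role, and
# the adapter emits its score when a "certainty" turn follows the assistant's
# answer. To actually read the score, feed the raw Granite chat template
# yourself and let the model complete the final turn (-no-cnv forces plain
# completion on builds that default to conversation mode):
./llama-cli \
  -m granite-3.2-8b-instruct-Q8_0.gguf \
  --lora granite-uncertainty-3.2-8b-lora-Q8_0.gguf \
  -no-cnv \
  -p "<|start_of_role|>user<|end_of_role|>What is the boiling point of water at sea level?<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>About 100 °C (212 °F).<|end_of_text|>
<|start_of_role|>certainty<|end_of_role|>"
```

If it's working, the completion of that last turn should be the model's certainty score for the answer above it.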

I think this is important work that you're doing because it gives people a better estimate of how certain a model is of its own answers.

It does leave me wondering how the certainty is estimated. For example, if it's based on the number of matching references/conclusions in the dataset, how does it account for, let's say, a single reference that just happens to be true? That one might get a lower score simply because there's only one reference. Of course, I'm just speculating about how it works here.

Anyhow, I'm just rambling, but I think it's a great experiment :)
Keep up the good work!

IBM Granite org

Thanks! If you're interested in how the uncertainty score is calibrated, the paper [Shen et al.] mentioned in the model card should help clarify the overall philosophy.

Thank you :) I will definitely check that out.
