Got it working with llama.cpp
Hi guys,
Thought I'd share my findings on how to run it with llama.cpp, along with GGUF quants of the LoRA adapter:
https://huggingface.co/QuantPanda/granite-uncertainty-3.2-8b-lora-GGUF
It's a bit hacky to get it working in conversation mode, but it does work; maybe one day it can be implemented in llama.cpp properly.
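For reference, the invocation looks roughly like this (just a minimal sketch, not the exact setup I used; the base-model filename is a placeholder, and flags can differ between llama.cpp builds):

```sh
# Minimal sketch: load a base Granite GGUF, apply the LoRA adapter GGUF
# on top of it, and start llama.cpp's interactive conversation mode.
# Both filenames are placeholders; point them at your own files.
./llama-cli \
  -m granite-3.2-8b-instruct-Q4_K_M.gguf \
  --lora granite-uncertainty-3.2-8b-lora-Q8_0.gguf \
  -cnv
```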
I think this is important work that you're doing, because it gives people a better estimate of how confident a model is in its own answers.
It does leave me wondering how the certainty is estimated, though. For example, if it's based on the number of matching references/conclusions in the dataset, then how does it account for, let's say, a single reference that just happens to be correct? That one might get a lower score simply for being the only reference. Of course, I'm just speculating about how it works here.
Anyhow, I'm just rambling, but I think it's a great experiment :)
Keep up the good work!
Thanks! If you're interested in how the uncertainty score is calibrated, the paper mentioned in the model card [Shen et al.] should help clarify the overall philosophy.
Thank you :) I will definitely check that out.