thank you for GGUF!

#1
by jacek2024 - opened

It’s really nice to have GGUF available from IBM.

IBM Granite org

You're welcome! We've heard the feedback about confusion around GGUFs, so we'll now be co-locating official GGUFs here in the ibm-granite org under this collection.

Hey, how do you enable thinking using Ollama, LM Studio, etc.?

IBM Granite org

Hi @RougueSpud ! There are several ways to enable thinking in Ollama and LM Studio, but only one of them works today:

  1. Using these GGUFs, which don't contain the official Ollama chat template, you would need to replicate the logic of the official chat template on the client side to enable thinking (adding the requisite system prompt section here)
  2. If you are using the official models from Ollama, they come with a chat template that supports enabling thinking via a special element in the messages field when making an API call, with the following format: {"role": "control", "content": "thinking"}. This, unfortunately, is not accessible through the CLI
  3. Ollama just introduced a new thinking capability in 0.9.0. This will require some special templating in the chat template to get it to work correctly for Granite. I'm actively working on this for the official Granite models, but it isn't done yet.

At the moment, there isn't a systematic way to use thinking through LM Studio without doing client-side system prompt construction (option [1] above).
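Option [2] above can be sketched in a few lines of Python. This is a minimal, hedged example, not an official client: it assumes a local Ollama server on the default port (http://localhost:11434) and that you have pulled an official Granite model from the Ollama library; the `with_thinking` helper name is my own.

```python
# Sketch of option [2]: prepend the special "control" message that the
# official Ollama chat template recognizes to switch on thinking.
# Assumes a local Ollama server at the default http://localhost:11434.
import json
import urllib.request

def with_thinking(messages: list) -> list:
    # The control element goes in the messages field of the API call.
    return [{"role": "control", "content": "thinking"}] + messages

def chat(model: str, messages: list) -> dict:
    # Plain POST to Ollama's /api/chat endpoint; stream disabled for simplicity.
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps({"model": model, "messages": messages, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With a running Ollama server and an official Granite model pulled:
# chat("granite3.3", with_thinking([{"role": "user", "content": "Hello"}]))
```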

IBM Granite org

I've now got updated template versions for Ollama that allow the built-in "think" capability to work. They're pushed to my personal staging account (gabegoodhart/granite3.2, gabegoodhart/granite3.3) while we work to get them on the official library. You can try it out as follows:

ollama pull gabegoodhart/granite3.3
ollama run gabegoodhart/granite3.3 --think "What's the best way to visit all of my clients in my sales region?"

Thank you

IBM Granite org

One other important note on running the GGUF models locally with Ollama: you can easily run them directly from Hugging Face with a command like the following:

ollama run hf.co/ibm-granite/granite-3.3-8b-instruct-GGUF:Q4_K_M

When running this way, if you want to enable thinking, you will need to do client-side template expansion (e.g., using apply_chat_template from transformers) and then use raw generation in Ollama.
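A minimal Python sketch of that flow, assuming a local Ollama server on the default port. The `build_raw_request` helper name is mine, and passing `thinking=True` to apply_chat_template is an assumption based on how the Granite chat template exposes thinking; treat both as illustrative rather than official.

```python
# Sketch: client-side chat-template expansion, then raw generation in Ollama.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_raw_request(model: str, prompt: str) -> dict:
    # "raw": True tells Ollama to use the prompt verbatim, skipping its own
    # chat template, so the client-side expansion is sent through unchanged.
    return {"model": model, "prompt": prompt, "raw": True, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_raw_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With a running Ollama server, expand the template client-side and send it raw:
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-8b-instruct")
# prompt = tok.apply_chat_template(
#     [{"role": "user", "content": "Plan my client visits."}],
#     tokenize=False, add_generation_prompt=True, thinking=True,
# )
# print(generate("hf.co/ibm-granite/granite-3.3-8b-instruct-GGUF:Q4_K_M", prompt))
```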
