thank you for GGUF!

#1
by jacek2024 - opened

It’s really nice to have GGUF available from IBM.

IBM Granite org

You're welcome! We've heard the feedback about confusion around GGUFs, so we'll now be co-locating official GGUFs here in the ibm-granite org under this collection.

Hey, how do you enable thinking using Ollama, LM Studio, etc.?

IBM Granite org

Hi @RougueSpud ! There are several ways to enable thinking in Ollama and LM Studio, but only one of them works today:

  1. Using these GGUFs, which don't contain the official Ollama chat template, you would need to replicate the logic of the official chat template on the client side to enable thinking (adding the requisite system prompt section here)
  2. If you are using the official models from Ollama, they come with a chat template that supports enabling thinking via a special element in the messages field when making an API call, with the following format: {"role": "control", "content": "thinking"}. This, unfortunately, is not accessible through the CLI
  3. Ollama just introduced a new thinking capability in 0.9.0. This will require some special templating in the chat template to get it to work correctly for Granite. I'm actively working on this for the official Granite models, but it isn't done yet.

At the moment, there isn't a systematic way to use thinking through LM Studio without doing client-side system prompt construction (option [1] above).
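Option [2] above can be sketched in a few lines of Python. This is a minimal, hedged example, not an official client: it assumes a local Ollama server on the default port (http://localhost:11434) and that you have pulled an official Granite model from the Ollama library; the `with_thinking` helper name is my own.

```python
# Sketch of option [2]: prepend the special "control" message that the
# official Ollama chat template recognizes to switch on thinking.
# Assumes a local Ollama server at the default http://localhost:11434.
import json
import urllib.request

def with_thinking(messages: list) -> list:
    # The control element goes in the messages field of the API call.
    return [{"role": "control", "content": "thinking"}] + messages

def chat(model: str, messages: list) -> dict:
    # Plain POST to Ollama's /api/chat endpoint; stream disabled for simplicity.
    req = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps({"model": model, "messages": messages, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With a running Ollama server and an official Granite model pulled:
# chat("granite3.3", with_thinking([{"role": "user", "content": "Hello"}]))
```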

IBM Granite org

I've now got updated template versions for Ollama that allow the built-in "think" capability to work. They're pushed to my personal staging account (gabegoodhart/granite3.2, gabegoodhart/granite3.3) while we work to get them on the official library. You can try it out as follows:

ollama pull gabegoodhart/granite3.3
ollama run gabegoodhart/granite3.3 --think "What's the best way to visit all of my clients in my sales region?"

Thank you

IBM Granite org

One other important note on running the GGUF models locally with Ollama: you can easily run them directly from Hugging Face with a command like the following:

ollama run hf.co/ibm-granite/granite-3.3-8b-instruct-GGUF:Q4_K_M

When running this way, if you want to enable thinking, you will need to do client-side template expansion (e.g., using apply_chat_template from transformers) and then use raw generation in Ollama.
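A minimal Python sketch of that flow, assuming a local Ollama server on the default port. The `build_raw_request` helper name is mine, and passing `thinking=True` to apply_chat_template is an assumption based on how the Granite chat template exposes thinking; treat both as illustrative rather than official.

```python
# Sketch: client-side chat-template expansion, then raw generation in Ollama.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_raw_request(model: str, prompt: str) -> dict:
    # "raw": True tells Ollama to use the prompt verbatim, skipping its own
    # chat template, so the client-side expansion is sent through unchanged.
    return {"model": model, "prompt": prompt, "raw": True, "stream": False}

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_raw_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With a running Ollama server, expand the template client-side and send it raw:
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-8b-instruct")
# prompt = tok.apply_chat_template(
#     [{"role": "user", "content": "Plan my client visits."}],
#     tokenize=False, add_generation_prompt=True, thinking=True,
# )
# print(generate("hf.co/ibm-granite/granite-3.3-8b-instruct-GGUF:Q4_K_M", prompt))
```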
