No parameter file?

#10
opened by elluAI

https://ollama.com/library/granite3.3
I was looking for the default config, but there is no parameter file or related information stating the parameters.
What is the recommended configuration to run this model with?

IBM Granite org

Hi @elluAI , thanks for checking out the model! For the Ollama model, we intentionally leave the parameters unset so the user has full choice and the experience matches the default experience for other models in Ollama.

The main parameter that we recommend setting is num_ctx. By default, Ollama launches with a 4k context length (num_ctx = 4096). Since 3.1, the Granite series has supported 128k (num_ctx = 131072). The low default avoids consuming too much VRAM, since many users don't need long-context support for basic usage and the inference engine has to pre-allocate VRAM for the maximum context size. You can set num_ctx in several ways:

  1. Set it on the Ollama CLI (>>> /set parameter num_ctx 131072)
  2. Set it with an individual inference request via the API (e.g. in this sample request; see the curl sketch after this list)
  3. Create a "derived" model locally that sets the context length in the Modelfile (ollama create granite3.3:8b-128k -f <(echo -e "FROM granite3.3:8b\nPARAMETER num_ctx 131072"))
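
For option 2, here is a minimal sketch of a raw API call, assuming Ollama is running on its default endpoint at localhost:11434 and the model has already been pulled; the num_ctx in options applies only to that single request:

```
# Option 2 sketch: pass num_ctx per request via the Ollama REST API
# (assumes the default endpoint at localhost:11434 and a placeholder prompt)
curl http://localhost:11434/api/generate -d '{
  "model": "granite3.3",
  "prompt": "Summarize this document: ...",
  "stream": false,
  "options": {
    "num_ctx": 131072
  }
}'
```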

You can find full details on running Granite on Ollama in the docs: https://www.ibm.com/granite/docs/run/granite-with-ollama/mac/
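
Not part of the original reply, but as a quick sanity check after option 3 you can ask Ollama to print the parameters baked into the derived model:

```
# Verify that the derived model (the tag created in option 3) carries the parameter;
# the output should list num_ctx as 131072
ollama show granite3.3:8b-128k --parameters
```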

Thanks for responding.

> Hi @elluAI , thanks for checking out the model! For the Ollama model, we intentionally leave the parameters unset so the user has full choice and the experience matches the default experience for other models in Ollama.

Interesting decision.

> The main parameter that we recommend setting is num_ctx. By default, Ollama launches with a 4k context length (num_ctx = 4096). Since 3.1, the Granite series has supported 128k (num_ctx = 131072). The low default avoids consuming too much VRAM, since many users don't need long-context support for basic usage and the inference engine has to pre-allocate VRAM for the maximum context size. You can set num_ctx in several ways:
>
>   1. Set it on the Ollama CLI (>>> /set parameter num_ctx 131072)
>   2. Set it with an individual inference request via the API (e.g. in this sample request)
>   3. Create a "derived" model locally that sets the context length in the Modelfile (ollama create granite3.3:8b-128k -f <(echo -e "FROM granite3.3:8b\nPARAMETER num_ctx 131072"))
>
> You can find full details on running Granite on Ollama in the docs: https://www.ibm.com/granite/docs/run/granite-with-ollama/mac/

num_ctx was the only parameter I came across in my research.

elluAI changed discussion status to closed
