Token Limit

#2
by Disassemblern - opened

I want to control the length of the AI responses. When I use the max_tokens or num_predict parameters that Ollama provides, the response gets cut off as soon as it hits the token limit, which causes incomplete responses. I also don't want to impose a limit through the system prompt. Is there any way to set a token limit that influences the model itself, so that it generates responses within the given limit while staying complete and coherent with the user input?
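
For reference, this is roughly what I'm doing now (a minimal sketch assuming a local Ollama instance on the default port; the model name and prompt are just placeholders):

```python
# Sketch of the current approach: num_predict hard-caps generation,
# so the output is simply truncated once the limit is reached.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",           # placeholder model name
        "prompt": "Explain how transformers work.",
        "stream": False,
        "options": {
            "num_predict": 100,      # hard cap: generation stops after 100 tokens
        },
    },
)

# The text often ends abruptly, cut off mid-sentence.
print(response.json()["response"])
```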
