Token Limit
#2 · opened by Disassemblern
I want to control the length of the AI responses. When I use the max_tokens or num_predict parameters that Ollama provides, the response gets cut off as soon as it hits the token limit, which leads to incomplete responses. I also don't want to impose the limit through the system prompt. Is there any way to set a token limit that directly influences the model's generation, so that responses stay within the limit while still being complete and coherent with the user input? For reference, the snippet below shows roughly how I'm setting the limit now.
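
A minimal sketch of the current approach, assuming a local Ollama server on the default port (http://localhost:11434); the model name and prompt are placeholders:

```python
import requests

# Call Ollama's /api/generate endpoint with a num_predict cap.
# num_predict is a hard cutoff: the model stops emitting tokens at the limit,
# even mid-sentence, rather than planning a shorter answer.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # placeholder model name
        "prompt": "Explain what a token limit is.",
        "stream": False,
        "options": {
            "num_predict": 100  # generation is truncated here
        },
    },
)

print(response.json()["response"])  # often ends abruptly at the cap
```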