Token Limit

#2
by Disassemblern - opened

I want to control the length of the AI responses. When I use the max_tokens or num_predict parameters that Ollama provides, the response gets cut off as soon as it hits the token limit, which causes incomplete responses. I also don't want to impose a limit through the system prompt. Is there any way to set a token limit that influences the model itself, so that it generates responses within the given limit while staying complete and coherent with the user input?
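
For reference, this is roughly what I'm doing now (a minimal sketch assuming a local Ollama instance on the default port; the model name and prompt are just placeholders):

```python
# Sketch of the current approach: num_predict hard-caps generation,
# so the output is simply truncated once the limit is reached.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",           # placeholder model name
        "prompt": "Explain how transformers work.",
        "stream": False,
        "options": {
            "num_predict": 100,      # hard cap: generation stops after 100 tokens
        },
    },
)

# The text often ends abruptly, cut off mid-sentence.
print(response.json()["response"])
```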
