Optimal settings for running Small-24b using Ollama?
This model seems to be quite sensitive to its settings. I've seen many posts and comments on Reddit about this model exhibiting weird bugs due to inference settings. Could you share the optimal settings for running it?
For example, what should the repeat penalty, top-p, and top-k values be? And is the optimal temperature indeed 0.15, as stated in the model card?
At a minimum, I've observed that repetition penalty seems to harm this model. Yesterday I tried running it and thought something was wrong with the GGUFs, because it couldn't repeat back certain things that other models handle easily (for example, reprinting a sudoku board it was given; it would add tons of spaces, extra dashes, etc. when rendering the board). I stripped out all of my sampler settings and added them back one at a time until I realized that my repetition penalty of 1.2 over a range of 2048 was the culprit. Once that was completely disabled, the model behaved much more normally.
Depends on the task. For task-oriented work (function calling, information retrieval, and the like), a very low temperature seems best (roughly 0.1-0.3), with no other samplers, and I'd avoid repetition penalty in that case.
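For what it's worth, here's roughly how that maps onto Ollama's request options via the official Python client. This is a minimal sketch, assuming the model was pulled under the `mistral-small:24b` tag; adjust to whatever tag you have locally:

```python
# Sketch: low-temperature, sampler-free settings for task-style work.
# Assumes `pip install ollama` and a locally pulled model.
import ollama

response = ollama.chat(
    model="mistral-small:24b",  # assumption: use whatever tag you pulled
    messages=[{"role": "user", "content": "List the capitals of the Nordic countries as JSON."}],
    options={
        "temperature": 0.15,    # very low temp for retrieval/function-calling work
        "repeat_penalty": 1.0,  # 1.0 = repetition penalty fully disabled
        # top_p / top_k left at their defaults; no other samplers
    },
)
print(response["message"]["content"])
```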
For more creative pursuits, you can push the temperature surprisingly high compared to the previous model before it starts to become incoherent (0.85 temperature + min-p 0.1 is a safe starting point to experiment from). I'd add some repetition penalty or a DRY sampler (if you have a backend that supports it), because it can get very repetitive very quickly, like all Mistral LLMs.
A rep pen of 1.2 over 2048 is a very high setting, especially if the expected output is repetitive by definition (a JSON file, a sudoku board ;) ). I'd use a lower penalty over a larger body of text instead (1.05 over 4096, depending on context, is a more conservative starting point you can adjust up or down until you hit a sweet spot).
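Here's the creative starting point in the same form. Note that, as far as I know, Ollama doesn't expose a DRY sampler, so the mild repetition penalty over a larger window stands in for it (again a sketch; the model tag is an assumption):

```python
# Sketch: creative-writing settings (0.85 temp + min-p 0.1, mild rep pen over 4096).
import ollama

response = ollama.chat(
    model="mistral-small:24b",  # assumption: use whatever tag you pulled
    messages=[{"role": "user", "content": "Write a short story about a lighthouse keeper."}],
    options={
        "temperature": 0.85,     # surprisingly high is fine for creative work
        "min_p": 0.1,            # min-p trims the incoherent tail
        "repeat_penalty": 1.05,  # mild penalty...
        "repeat_last_n": 4096,   # ...spread over a larger window
    },
)
print(response["message"]["content"])
```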
cc @ollama
temp should be 0.15
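If you want that baked in so `ollama run` uses it by default, a Modelfile along these lines should work (the tag names here are just examples, not official ones):

```
# Example Modelfile: bakes the recommended defaults into a local tag.
FROM mistral-small:24b
PARAMETER temperature 0.15
PARAMETER repeat_penalty 1.0
```

Then create and run it with `ollama create mistral-small-0.15 -f Modelfile`.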
I tried everything, including, of course, setting the temperature to 0.15. But this model still clearly loses to Qwen2.5-14B-Instruct-1M; I tested it on lots of questions. Maybe it's good at specific tasks, like coding, no idea. It seems nobody talks about this.
It's been talked about in a closed thread you can still check.
https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501/discussions/5
It's excellent at:
- Precise instruction following, like outputting a JSON-formatted document, summarizing, titling, any kind of "do this task in this fashion" (a sketch after these lists shows this)
- Information extraction from documents (even if 32K tokens might be a bit small for that) / RAG
- Function calling for integration in a tool-chain
- Translation (especially EN <-> FR)
It does okay at:
- Writing a single short creative document (i.e., "write me a letter from X explaining Y to Z")
- Basic coding tasks (it's definitely not a coding assistant, though)
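Here's the sketch mentioned above for the precise-instruction-following case, using Ollama's JSON mode via the Python client (the model tag is an assumption; adjust to yours):

```python
# Sketch: precise instruction following -> constrained JSON extraction.
import ollama

doc = "Meeting moved to Friday 3pm, room B42. Bring the Q3 numbers."
response = ollama.chat(
    model="mistral-small:24b",  # assumption: use whatever tag you pulled
    messages=[{
        "role": "user",
        "content": f"Return a JSON object with keys 'day', 'time' and 'room' extracted from:\n{doc}",
    }],
    format="json",  # asks Ollama to constrain the output to valid JSON
    options={"temperature": 0.15, "repeat_penalty": 1.0},
)
print(response["message"]["content"])
```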
It really shouldn't be used as a knowledge bank, same as most generalist models really (but that's another debate), and I agree it's not great for this particular use case. In my understanding, and someone correct me if I'm wrong, such a low temperature requirement is kind of an alarm bell in that regard. If a 0.05 temperature difference or a comma in a system prompt gives wildly different results on a majority of questions, there may be a dataset or training issue going on somewhere.
Reality is, Mistral's instruction tuning is what it is. What will really be interesting is to see what people will do with the base model in the coming weeks and months.
There is a guy (AI voice) on YouTube who really makes me angry; the channel name is "AICodeKing". He has 10 questions he uses to test AI models. What I find super suspicious is that he almost always gets the correct answer to his first question, even with small models like this one. The question is: Tell me the name of a country whose name ends with 'lia'. Give me the capital city of that country as well. (Correct answer: Australia)
I literally NEVER EVER got this answer with any small model I tested. You can watch his video about Mistral Small 3 24B, where he shows this result.
You can try this question yourself; I'm telling you, it's impossible to get the "Australia" answer.
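If anyone wants to check this claim for themselves, a quick loop like this makes the test reproducible instead of anecdotal (a sketch; the model tag is an assumption):

```python
# Sketch: ask the 'lia' question several times and count "Australia" answers.
import ollama

QUESTION = ("Tell me the name of a country whose name ends with 'lia'. "
            "Give me the capital city of that country as well.")

hits, trials = 0, 10
for _ in range(trials):
    r = ollama.chat(
        model="mistral-small:24b",  # assumption: use whatever tag you pulled
        messages=[{"role": "user", "content": QUESTION}],
        options={"temperature": 0.15},
    )
    if "australia" in r["message"]["content"].lower():
        hits += 1
print(f"'Australia' appeared in {hits}/{trials} answers")
```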
Ah, I see; the "why this tiny model isn't a pop-culture expert" guy has already been here.
Mistral team, please don't overfit your future models on pop culture. This guy has been spamming the same complaint under every small local model for a year or so.