Optimal settings for running Small-24b using Ollama?
This model seems to be quite sensitive to its settings. I've seen many posts and comments on Reddit about this model exhibiting weird bugs due to inference settings. Could you share the optimal settings for running it?
For example, what should the repeat penalty, top-p, and top-k values be? And is the optimal temperature indeed 0.15, as stated in the model card?
At a minimum, I've observed that repetition penalty seems to harm this model. Yesterday I tried running it and thought something was wrong with the GGUFs, because it couldn't repeat back certain things that other models handle easily (for example, reprinting a sudoku board it was given; it would add tons of spaces, extra dashes, etc. when rendering the board). I stripped out all of my sampler settings and added them back one at a time until I realized that my repetition penalty of 1.2 over a range of 2048 was the culprit. Once that was completely disabled, the model behaved much more normally.
Depends on the task. For task-oriented work (function calling, information retrieval, and the like), a very low temperature seems best (roughly 0.1-0.3), with no other samplers, and I'd avoid repetition penalty in that case.
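For what it's worth, here's roughly how that maps onto Ollama's request options via the official Python client. This is a minimal sketch, assuming the model was pulled under the `mistral-small:24b` tag; adjust to whatever tag you have locally:

```python
# Sketch: low-temperature, sampler-free settings for task-style work.
# Assumes `pip install ollama` and a locally pulled model.
import ollama

response = ollama.chat(
    model="mistral-small:24b",  # assumption: use whatever tag you pulled
    messages=[{"role": "user", "content": "List the capitals of the Nordic countries as JSON."}],
    options={
        "temperature": 0.15,    # very low temp for retrieval/function-calling work
        "repeat_penalty": 1.0,  # 1.0 = repetition penalty fully disabled
        # top_p / top_k left at their defaults; no other samplers
    },
)
print(response["message"]["content"])
```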
For more creative pursuits, you can push the temperature surprisingly high compared to the previous model before it starts to become incoherent (0.85 temperature + min-p 0.1 is a safe starting point to experiment from). I'd add some repetition penalty or a DRY sampler (if you have a backend that supports it), because it can get very repetitive very quickly, like all Mistral LLMs.
A rep pen of 1.2 over 2048 is a very high setting, especially if the expected output is repetitive by definition (a JSON file, a sudoku board ;) ). I'd use a lower penalty over a larger body of text instead (1.05 over 4096, depending on context, is a more conservative starting point you can adjust up or down until you hit a sweet spot).
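Here's the creative starting point in the same form. Note that, as far as I know, Ollama doesn't expose a DRY sampler, so the mild repetition penalty over a larger window stands in for it (again a sketch; the model tag is an assumption):

```python
# Sketch: creative-writing settings (0.85 temp + min-p 0.1, mild rep pen over 4096).
import ollama

response = ollama.chat(
    model="mistral-small:24b",  # assumption: use whatever tag you pulled
    messages=[{"role": "user", "content": "Write a short story about a lighthouse keeper."}],
    options={
        "temperature": 0.85,     # surprisingly high is fine for creative work
        "min_p": 0.1,            # min-p trims the incoherent tail
        "repeat_penalty": 1.05,  # mild penalty...
        "repeat_last_n": 4096,   # ...spread over a larger window
    },
)
print(response["message"]["content"])
```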
cc @ollama
temp should be 0.15
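If you want that baked in so `ollama run` uses it by default, a Modelfile along these lines should work (the tag names here are just examples, not official ones):

```
# Example Modelfile: bakes the recommended defaults into a local tag.
FROM mistral-small:24b
PARAMETER temperature 0.15
PARAMETER repeat_penalty 1.0
```

Then create and run it with `ollama create mistral-small-0.15 -f Modelfile`.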
I tried everything, including, of course, setting the temperature to 0.15. But this model still clearly loses to Qwen2.5-14B-Instruct-1M; I tested it on lots of questions. Maybe it's good at specific tasks, like coding, no idea. It seems nobody talks about this.
It's been talked about in a closed thread you can still check.
https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501/discussions/5
It's excellent at:
- Precise instruction following, like outputting a JSON-formatted document, summarizing, titling, any kind of "do this task in this fashion" (a sketch after these lists shows this)
- Information extraction from documents (even if 32K tokens might be a bit small for that) / RAG
- Function calling for integration in a tool-chain
- Translation (especially EN <-> FR)
It does okay at:
- Writing a single short creative document (i.e., "write me a letter from X explaining Y to Z")
- Basic coding tasks (it's definitely not a coding assistant, though)
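Here's the sketch mentioned above for the precise-instruction-following case, using Ollama's JSON mode via the Python client (the model tag is an assumption; adjust to yours):

```python
# Sketch: precise instruction following -> constrained JSON extraction.
import ollama

doc = "Meeting moved to Friday 3pm, room B42. Bring the Q3 numbers."
response = ollama.chat(
    model="mistral-small:24b",  # assumption: use whatever tag you pulled
    messages=[{
        "role": "user",
        "content": f"Return a JSON object with keys 'day', 'time' and 'room' extracted from:\n{doc}",
    }],
    format="json",  # asks Ollama to constrain the output to valid JSON
    options={"temperature": 0.15, "repeat_penalty": 1.0},
)
print(response["message"]["content"])
```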
It really shouldn't be used as a knowledge bank, same as most generalist models really (but that's another debate), and I agree it's not great for this particular use case. In my understanding, and someone correct me if I'm wrong, such a low temperature requirement is kind of an alarm bell in that regard. If a 0.05 temperature difference or a comma in a system prompt gives wildly different results on a majority of questions, there may be a dataset or training issue going on somewhere.
Reality is, Mistral's instruction tuning is what it is. What will really be interesting is to see what people will do with the base model in the coming weeks and months.
There is a guy (AI voice) on YouTube who really makes me angry; the channel name is "AICodeKing". He has 10 questions he uses to test AI models. What I find super suspicious is that he almost always gets the correct answer to his first question, even with small models like this one. The question is: Tell me the name of a country whose name ends with 'lia'. Give me the capital city of that country as well. (Correct answer: Australia)
I literally NEVER EVER got this answer with any small model I tested. You can watch his video about Mistral Small 3 24B, where he shows this result.
You can try this question yourself; I'm telling you, it's impossible to get the "Australia" answer.
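If anyone wants to check this claim for themselves, a quick loop like this makes the test reproducible instead of anecdotal (a sketch; the model tag is an assumption):

```python
# Sketch: ask the 'lia' question several times and count "Australia" answers.
import ollama

QUESTION = ("Tell me the name of a country whose name ends with 'lia'. "
            "Give me the capital city of that country as well.")

hits, trials = 0, 10
for _ in range(trials):
    r = ollama.chat(
        model="mistral-small:24b",  # assumption: use whatever tag you pulled
        messages=[{"role": "user", "content": QUESTION}],
        options={"temperature": 0.15},
    )
    if "australia" in r["message"]["content"].lower():
        hits += 1
print(f"'Australia' appeared in {hits}/{trials} answers")
```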
Ah, I see; the "why this tiny model isn't a pop-culture expert" guy has already been here.
Mistral team, please don't overfit your future models on pop culture. This guy has been spamming the same complaint under every small local model for a year or so.