mistralai/Mistral-Small-3.1-24B-Instruct-2503 · Updated `hidden

May 8

Hi!

Sergio from the HF team here.
We've detected that when serving the model via vLLM using --config_format hf, the outputs differ from those obtained when serving the model using --config_format mistral.
Investigating, we've found that when launching with --config_format hf, the model is a Mistral3ForConditionalGeneration instance, while with --config_format mistral it's a PixtralForConditionalGeneration.
In the second case (the original implementation), there is a silu call here, while for the hf version the activation is defined here, so it is taken from this file.

The gelu must be changed to silu.
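
For anyone wondering why this matters numerically: gelu and silu have similar shapes but are not interchangeable, so using one where the weights were trained with the other shifts every activation in the affected layers. Here is a minimal sketch in plain Python using the standard textbook definitions (exact erf-based gelu). It is only an illustration, not the code from either vLLM model class:

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x: float) -> float:
    # SiLU (also called swish): x * sigmoid(x).
    return x / (1.0 + math.exp(-x))

# Compare the two activations on a few sample inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    g, s = gelu(x), silu(x)
    print(f"x={x:+.1f}  gelu={g:+.4f}  silu={s:+.4f}  diff={g - s:+.4f}")
```

The two functions agree at zero and drift apart elsewhere, which is consistent with the small but systematic output differences described above.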

The issue can be seen here. This is a debug view of the model instantiated with --config_format hf (gelu on the right-hand side of the image):

After modifying this file, we get (silu on the right-hand side of the image):

This is currently causing problems when evaluating the model. For instance, using mistral-evals and ChartQA, we obtain 0.8612 with --config_format mistral vs 0.818 with --config_format hf.
After changing this config file, we obtain 0.86 with --config_format hf.
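
To check which activation a given config declares before launching, something like the sketch below could help. It walks a nested config dict and reports every `hidden_act` entry it finds. The helper name `find_hidden_act` and the sample fragment (including the `text_config` / `vision_config` section names) are assumptions for illustration, not the actual contents of the file changed in this PR:

```python
import json

def find_hidden_act(config: dict, path: str = "") -> dict:
    """Return {dotted.path: value} for every `hidden_act` key in a nested config."""
    found = {}
    for key, value in config.items():
        here = f"{path}.{key}" if path else key
        if key == "hidden_act":
            found[here] = value
        elif isinstance(value, dict):
            # Recurse into nested sections of the config.
            found.update(find_hidden_act(value, here))
    return found

# Hypothetical config fragment for illustration only; not the real file.
sample = json.loads(
    '{"text_config": {"hidden_act": "silu"}, "vision_config": {"hidden_act": "gelu"}}'
)
for where, act in find_hidden_act(sample).items():
    print(where, "->", act)
```

To inspect a real config, replace `sample` with the result of `json.load` on the local config file.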

Updated `hidden_act` to `silu` (2bb8ca36)

patrickvonplaten

Mistral AI_ org May 9

Thanks!

patrickvonplaten changed pull request status to merged May 9

Shmood789

May 19

Hello