Updated `hidden_act` to `silu`

#74
by sergiopaniego HF Staff - opened

Hi!

Sergio from the HF team here.
We've detected that when serving the model via vLLM using --config_format hf, the outputs differ from those obtained when serving the model using --config_format mistral.
Investigating, we've found that when launching using --config_format hf, the model is a Mistral3ForConditionalGeneration instance, while when using --config_format mistral it's a PixtralForConditionalGeneration.
In the second case (the original implementation), there is a silu call here, while for the hf version the activation is defined here, so it is taken from this file.

The gelu must be changed to silu.
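To illustrate why the choice of activation matters, here is a minimal sketch comparing the two functions numerically. It uses the standard definitions (silu as x * sigmoid(x), gelu in its exact erf form) and plain Python rather than the actual vLLM or transformers code paths, so the function names are illustrative only.

```python
import math


def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x), written as x / (1 + exp(-x)).
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    # Exact (erf-based) GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Print both activations side by side for a few sample inputs.
for x in (-2.0, -1.0, 0.5, 1.0, 2.0):
    print(f"x={x:+.1f}  silu={silu(x):+.4f}  gelu={gelu(x):+.4f}  "
          f"diff={gelu(x) - silu(x):+.4f}")
```

The two curves agree at zero but diverge elsewhere, so weights trained with one activation will produce shifted outputs when run with the other.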

The issue can be seen here. This is a debug view of the model instantiated using --config_format hf (gelu on the right-hand side of the image):
gelu_issue.png

After modifying this file, we get (silu on the right-hand side of the image):
silu_solution.png
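For reference, the change amounts to a one-value edit of the config. The sketch below applies it to an in-memory dictionary; the `vision_config` section name, the `set_hidden_act` helper, and the example dictionary are assumptions made for illustration, since the file layout is not shown in this post.

```python
import copy
import json


def set_hidden_act(config: dict, section: str = "vision_config",
                   value: str = "silu") -> dict:
    """Return a copy of `config` with `hidden_act` set inside `section`.

    The default section name is an assumption about the config layout.
    """
    if section not in config:
        raise KeyError(f"section {section!r} not found in config")
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    patched[section]["hidden_act"] = value
    return patched


# Hypothetical, heavily trimmed config used only for this example.
example_config = {
    "architectures": ["Mistral3ForConditionalGeneration"],
    "vision_config": {"hidden_act": "gelu"},
}

patched = set_hidden_act(example_config)
print(json.dumps(patched, indent=2))
```

Working on a copy keeps the original dictionary available for a before/after comparison.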

This is currently causing problems when evaluating the model. For instance, using mistral-evals and ChartQA, we obtain 0.8612 with --config_format mistral vs. 0.818 with --config_format hf.
After changing this config file, we obtain 0.86 with --config_format hf.

Thanks!

patrickvonplaten changed pull request status to merged
