Updated `hidden_act` to `silu`
Hi!
Sergio from the HF team here.
We've detected that when serving the model via vLLM with `--config_format hf`, the outputs differ from those obtained when serving it with `--config_format mistral`.
Investigating, we've found that when launching with `--config_format hf`, the model is a `Mistral3ForConditionalGeneration` instance, while with `--config_format mistral` it's a `PixtralForConditionalGeneration`.
In the second case (the original), there is a direct `silu` call here, while in the `hf` version the activation is defined here, so it is taken from this file.
The `gelu` value in this file must therefore be changed to `silu`.
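To illustrate why the mismatch matters, here is a small sketch that evaluates both activations at a few points in plain Python. It assumes the exact, erf-based GELU (the tanh approximation would give slightly different numbers). The helpers `gelu` and `silu` are defined here for illustration only and are not the vLLM or transformers implementations.

```python
import math


def gelu(x: float) -> float:
    # Exact (erf-based) GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Print both activations side by side to show how far apart they are
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  silu={silu(x):+.4f}  "
              f"diff={gelu(x) - silu(x):+.4f}")
```

The two functions agree at 0 but diverge elsewhere (at x = 1, GELU gives about 0.841 and SiLU about 0.731), so a layer instantiated with the wrong one produces systematically different activations.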
The issue can be seen here, in a debug view of the model instantiated with `--config_format hf` (`gelu` on the right-hand side of the image):
After modifying this file, we get the following (`silu` on the right-hand side of the image):
This is currently causing problems when evaluating the model. For instance, using `mistral-evals` on ChartQA, we obtain 0.8612 with `--config_format mistral` vs. 0.818 with `--config_format hf`.
After changing this config file, we obtain 0.86 with `--config_format hf`.
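For reproducing the comparison locally against a downloaded copy of the checkpoint, the config edit could be scripted roughly as follows. This is a hypothetical helper (`set_vision_hidden_act` is not part of transformers or vLLM), and it assumes that the `hidden_act` key being changed sits under `vision_config` in `config.json`; other sections of the file are left untouched.

```python
import json
from pathlib import Path


def set_vision_hidden_act(config_path: str, act: str = "silu") -> dict:
    """Hypothetical helper: rewrite vision_config.hidden_act in a local config.json."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Assumption: the activation in question lives under "vision_config"
    vision_cfg = config.setdefault("vision_config", {})
    previous = vision_cfg.get("hidden_act")
    vision_cfg["hidden_act"] = act
    # Write the updated config back, keeping it human-readable
    path.write_text(json.dumps(config, indent=2) + "\n")
    return {"previous": previous, "current": act}
```

Calling `set_vision_hidden_act("config.json")` on a config whose vision section says `gelu` returns `{"previous": "gelu", "current": "silu"}`, which makes it easy to confirm the edit actually took effect before relaunching.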
Thanks!