Difference in performance - AutoModel vs. Sentence Transformers

#8
by yearivig - opened

Hi,
recently I checked the MTEB benchmark (focused on the classification benchmarks), and I got different results when I loaded the model with AutoModel (and did last-token pooling) than when I loaded it through the SentenceTransformer package (with the default config). Can someone help me figure this one out?
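Roughly, this is the setup I compared, as a simplified sketch (the actual MTEB task code is omitted and the example text is just for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer

texts = ["A sample sentence to embed."]

# Setup 1: AutoModel + last-token pooling
tok = AutoTokenizer.from_pretrained("GritLM/GritLM-7B")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModel.from_pretrained("GritLM/GritLM-7B", torch_dtype="auto")
batch = tok(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
# hidden state of the last non-padding token in each sequence
last_idx = batch["attention_mask"].sum(dim=1) - 1
emb_automodel = hidden[torch.arange(hidden.size(0)), last_idx]

# Setup 2: Sentence Transformers with whatever config it picks up by default
st_model = SentenceTransformer("GritLM/GritLM-7B")
emb_st = st_model.encode(texts)
```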

GritLM org

The model usage is documented here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#inference
It is not compatible with Sentence Transformers and does not use last-token pooling, so both of those setups will lead to suboptimal performance.

So are you saying that loading the model with the gritlm package as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")
should give me the best results on MTEB?

GritLM org

So are you saying that loading the model with the gritlm package as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")
should give me the best results on MTEB?

Yes! You should be able to reproduce the reported GritLM-7B results; you can e.g. use this script: https://github.com/ContextualAI/gritlm/blob/main/README.md#embedding
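Condensed, the embedding usage from that README looks roughly like this (the example texts and the instruction string below are mine, not the ones from the README, so double-check against the linked script):

```python
from gritlm import GritLM
from scipy.spatial.distance import cosine

# Loads GritLM-7B; pooling and the embed prompt format are handled by the package
model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

# GritLM wraps the task instruction in its chat/embed format;
# documents are embedded with an empty instruction
def gritlm_instruction(instruction):
    return "<|user|>\n" + instruction + "\n<|embed|>\n" if instruction else "<|embed|>\n"

queries = ["What is the capital of France?"]
documents = ["Paris is the capital and largest city of France."]

q_rep = model.encode(queries, instruction=gritlm_instruction("Given a question, retrieve a passage that answers it"))
d_rep = model.encode(documents, instruction=gritlm_instruction(""))

print("cosine similarity:", 1 - cosine(q_rep[0], d_rep[0]))
```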

Thank you!
Actually, I’m looking for the right configuration to use this model loaded with AutoModel, and which pooling method I should use. I want to use the option of passing past_key_values for my context, which is available in the AutoModel API. Are you familiar with such a configuration?
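To make the question concrete, here is the kind of setup I have in mind, just a sketch: the instruction prefix follows the gritlm README, and the mean pooling over the new tokens is only my guess, which is exactly the part I'm asking about:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("GritLM/GritLM-7B")
model = AutoModel.from_pretrained("GritLM/GritLM-7B", torch_dtype="auto")

# Run the shared context once and keep its KV cache
context = "<|user|>\nGiven a question, retrieve a passage that answers it\n<|embed|>\n"
ctx_inputs = tok(context, return_tensors="pt")
with torch.no_grad():
    ctx_out = model(**ctx_inputs, use_cache=True)
past = ctx_out.past_key_values

# Embed a new text on top of the cached context
text = "What is the capital of France?"
new_inputs = tok(text, return_tensors="pt", add_special_tokens=False)
full_attention = torch.cat(
    [ctx_inputs["attention_mask"], new_inputs["attention_mask"]], dim=1
)
with torch.no_grad():
    out = model(
        input_ids=new_inputs["input_ids"],
        attention_mask=full_attention,
        past_key_values=past,
    )
# Pool only over the new tokens -- which pooling to use here is my question
emb = out.last_hidden_state.mean(dim=1)
```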
