Difference in performance - AutoModel vs. Sentence Transformers

#8
by yearivig - opened

Hi,
recently I checked the MTEB benchmark (focused on the classification benchmarks), and I got different results when I loaded the model with AutoModel (and did last-token pooling) than when I loaded it through the SentenceTransformer package (with the default config). Can someone help me figure this one out?
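Roughly, this is the setup I compared, as a simplified sketch (the actual MTEB task code is omitted and the example text is just for illustration):

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer

texts = ["A sample sentence to embed."]

# Setup 1: AutoModel + last-token pooling
tok = AutoTokenizer.from_pretrained("GritLM/GritLM-7B")
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModel.from_pretrained("GritLM/GritLM-7B", torch_dtype="auto")
batch = tok(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
# hidden state of the last non-padding token in each sequence
last_idx = batch["attention_mask"].sum(dim=1) - 1
emb_automodel = hidden[torch.arange(hidden.size(0)), last_idx]

# Setup 2: Sentence Transformers with whatever config it picks up by default
st_model = SentenceTransformer("GritLM/GritLM-7B")
emb_st = st_model.encode(texts)
```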

GritLM org

The model usage is documented here: https://github.com/ContextualAI/gritlm?tab=readme-ov-file#inference
It is not compatible with Sentence Transformers and does not use last-token pooling, so both of those setups will lead to suboptimal performance.

So are you saying that loading the model with the gritlm package as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")
should give me the best results on MTEB?

GritLM org

So are you saying that loading the model with the gritlm package as model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")
should give me the best results on MTEB?

Yes! You should be able to reproduce the reported GritLM-7B results; you can e.g. use this script: https://github.com/ContextualAI/gritlm/blob/main/README.md#embedding
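Condensed, the embedding usage from that README looks roughly like this (the example texts and the instruction string below are mine, not the ones from the README, so double-check against the linked script):

```python
from gritlm import GritLM
from scipy.spatial.distance import cosine

# Loads GritLM-7B; pooling and the embed prompt format are handled by the package
model = GritLM("GritLM/GritLM-7B", torch_dtype="auto")

# GritLM wraps the task instruction in its chat/embed format;
# documents are embedded with an empty instruction
def gritlm_instruction(instruction):
    return "<|user|>\n" + instruction + "\n<|embed|>\n" if instruction else "<|embed|>\n"

queries = ["What is the capital of France?"]
documents = ["Paris is the capital and largest city of France."]

q_rep = model.encode(queries, instruction=gritlm_instruction("Given a question, retrieve a passage that answers it"))
d_rep = model.encode(documents, instruction=gritlm_instruction(""))

print("cosine similarity:", 1 - cosine(q_rep[0], d_rep[0]))
```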

Thank you!
Actually, I’m looking for the right configuration to use this model loaded with AutoModel, and which pooling method I should use. I want to use the option of passing past_key_values for my context, which is available in the AutoModel API. Are you familiar with such a configuration?
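To make the question concrete, here is the kind of setup I have in mind, just a sketch: the instruction prefix follows the gritlm README, and the mean pooling over the new tokens is only my guess, which is exactly the part I'm asking about:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("GritLM/GritLM-7B")
model = AutoModel.from_pretrained("GritLM/GritLM-7B", torch_dtype="auto")

# Run the shared context once and keep its KV cache
context = "<|user|>\nGiven a question, retrieve a passage that answers it\n<|embed|>\n"
ctx_inputs = tok(context, return_tensors="pt")
with torch.no_grad():
    ctx_out = model(**ctx_inputs, use_cache=True)
past = ctx_out.past_key_values

# Embed a new text on top of the cached context
text = "What is the capital of France?"
new_inputs = tok(text, return_tensors="pt", add_special_tokens=False)
full_attention = torch.cat(
    [ctx_inputs["attention_mask"], new_inputs["attention_mask"]], dim=1
)
with torch.no_grad():
    out = model(
        input_ids=new_inputs["input_ids"],
        attention_mask=full_attention,
        past_key_values=past,
    )
# Pool only over the new tokens -- which pooling to use here is my question
emb = out.last_hidden_state.mean(dim=1)
```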
