ashercn97 posted an update 2 days ago
Does anyone know what the state of the art (SOTA) in text embeddings is, specifically for sentence similarity and clustering?

The MTEB leaderboard seems really complex to me. I feel lost looking at it: what metric should I judge by?

I would sort by "Mean (task)" and pick one of the top models. Better yet, if you can, compare the top three on your own data. That advice holds unless you need a longer context window, or you work in medicine or a similar field where domain-specific models exist.
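Comparing a few top models on your own data can be as simple as scoring labeled sentence pairs. Below is a minimal, dependency-free sketch under stated assumptions: you have (sentence_a, sentence_b, human_score) triples, and each model is wrapped as a function from text to a vector. The helpers `evaluate` and `compare` are hypothetical, not part of MTEB or any library. Models are ranked by the Spearman correlation between cosine similarity and the human scores, which is roughly how MTEB scores its sentence-similarity (STS) tasks.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical wrapper type: any function that maps a sentence to a vector.
Embedder = Callable[[str], Sequence[float]]
# (sentence_a, sentence_b, human similarity score)
Pair = Tuple[str, str, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def evaluate(embed: Embedder, pairs: List[Pair]) -> float:
    """Score one model: correlate its cosine similarities with human scores."""
    predicted = [cosine(embed(a), embed(b)) for a, b, _ in pairs]
    gold = [score for _, _, score in pairs]
    return spearman(predicted, gold)


def compare(models: Dict[str, Embedder], pairs: List[Pair]) -> List[Tuple[str, float]]:
    """Return (model name, Spearman score) sorted best first."""
    results = [(name, evaluate(embed, pairs)) for name, embed in models.items()]
    return sorted(results, key=lambda row: row[1], reverse=True)
```

To plug in real models, any callable that returns a vector works; for example, the `encode` method of a sentence-transformers `SentenceTransformer` accepts a single string and returns a vector. Embedding each sentence one at a time is slow, so for larger test sets you would batch the encoding first and cache the vectors.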

Oh wait, this makes sense.

I have created some benchmarks from user data; maybe I'll make my own leaderboard, haha.

Thanks for the help!
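If you do build your own leaderboard from those benchmarks, one simple option is an unweighted average of each model's per-task scores, roughly in the spirit of the "Mean (task)" column mentioned earlier. This is a hypothetical sketch: the `leaderboard` helper and its nested-dict input (model name to task name to score) are assumptions, and it refuses to rank models with missing tasks rather than averaging them over different task sets.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical input: model name -> {task name -> score from your own benchmark}.
Scores = Dict[str, Dict[str, float]]


def leaderboard(scores: Scores) -> List[Tuple[str, float]]:
    """Rank models by the unweighted mean of their per-task scores, best first."""
    all_tasks = frozenset().union(*(per_task.keys() for per_task in scores.values()))
    for name, per_task in scores.items():
        missing = all_tasks - per_task.keys()
        if missing:
            # Averages over different task sets are not comparable, so fail loudly.
            raise ValueError(f"{name} is missing scores for: {sorted(missing)}")
    rows = [(name, mean(per_task.values())) for name, per_task in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def print_leaderboard(scores: Scores) -> None:
    """Print a simple ranked table: position, model name, mean score."""
    for position, (name, avg) in enumerate(leaderboard(scores), start=1):
        print(f"{position:>2}. {name:<30} {avg:6.3f}")
```

Averaging raw scores treats every task as equally important; if your tasks use metrics on different scales (say, a correlation for similarity and a clustering score), you may want to normalize each task's column before averaging.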

Yes, I've seen it! Thank you. My issue is the limit of 100 requests a day.

I think it is NV-Embed-v2, with a score of 72.31 on MTEB.

Oh, this is good to know!