
Language distribution in MTEB

#83

by maiia-bocharova - opened Mar 9, 2024

Discussion

maiia-bocharova

Mar 9, 2024

Hello, I am writing a paper for my PhD (on text embeddings for the Ukrainian language) and I want to include information about the language distribution in MTEB (perhaps tokens per language). How can I get such statistics? I did not find anything in the official MTEB paper apart from the number of languages.
Can you please help?

KennethEnevoldsen

Massive Text Embedding Benchmark org Mar 10, 2024

Hi @Maiia, I believe Ukrainian is only included in the bitext mining tasks. You can easily search the GitHub repo and see it here:

https://github.com/search?q=repo%3Aembeddings-benchmark%2Fmteb+%22uk%22&type=code
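If you collect the language codes listed for each task (for example, by reading them off the search results above), a rough per-language tally can be sketched as follows. This is only an illustration: the task names and language lists below are hypothetical placeholders, not real MTEB data, and the helper functions are not part of the `mteb` library. It counts tasks per language, not tokens per language.

```python
from collections import Counter


def count_tasks_per_language(task_languages):
    """Count how many tasks list each language code.

    task_languages: dict mapping a task name to a list of language codes.
    """
    counts = Counter()
    for languages in task_languages.values():
        # Use a set so a code repeated within one task is counted once.
        counts.update(set(languages))
    return counts


def tasks_for_language(task_languages, code):
    """Return the sorted names of tasks that list the given language code."""
    return sorted(
        name for name, languages in task_languages.items() if code in languages
    )


# Hypothetical placeholder data, NOT actual MTEB tasks or language lists.
example = {
    "task_a": ["en", "uk"],
    "task_b": ["en"],
    "task_c": ["en", "de", "uk"],
}

counts = count_tasks_per_language(example)
print(counts.most_common())
print(tasks_for_language(example, "uk"))
```

Token counts per language would additionally require downloading each dataset and choosing a tokenizer, which this sketch does not attempt.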
