Training data

#1
by kardosdrur - opened

Hi @Haon-Chen !
I'm Márton Kardos, maintainer of MTEB.
As part of our recent efforts to extend MTEB into a fully multilingual benchmark, we have been collecting information on which tasks in our benchmarks can be considered in- or out-of-domain for different models (essentially tracking which models were fine-tuned on which benchmark tasks). This is useful to our users: they can not only see how well models perform on the tasks, but also how reliably those scores indicate the models' generalization performance.
I am writing to you, as we still lack training data annotations on your models, some of which are already present on our leaderboard.
If you have the time, I would like to ask you to submit a pull request to our repository with these additions to our model metadata annotations.
You can find more information about how to do this here: https://github.com/embeddings-benchmark/mteb/blob/main/docs/adding_a_model.md
Your model already has a metadata object, we would just like to ask you to fill out the missing fields.
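To make the request concrete, here is a minimal sketch of what such a training-data annotation might look like. This assumes MTEB's metadata uses a mapping from task names to the splits a model was trained on; the task names and splits below are purely illustrative placeholders, not your model's actual training data.

```python
# Hypothetical sketch of a training-data annotation for a model's
# metadata: a mapping from MTEB task names to the splits the model
# was fine-tuned on. The entries below are illustrative only.
training_datasets = {
    "MSMARCO": ["train"],  # e.g. fine-tuned on the MS MARCO train split
    "NQ": ["train"],
}

# Tasks listed here would be treated as in-domain for the model;
# remaining leaderboard tasks would count toward zero-shot performance.
in_domain_tasks = set(training_datasets)
print(sorted(in_domain_tasks))
```

The exact field names and format are documented in the guide linked above; the point is simply to record which benchmark tasks overlapped with your training data.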
Thanks in advance for your help.

Márton

As far as I can tell from your paper, you evaluated your model in both a zero-shot and a fine-tuned setting. Could you tell me whether the model posted in this repo is from before or after the MTEB fine-tuning stage?
If it is the fine-tuned version, could you also upload the model that was trained only on the synthetic data? I would expect the leaderboard scores to better reflect the zero-shot model's generalization performance.
