Source of training queries of BGE-EN-ICL
#13, opened by ftvalentini
I have some questions about the origin of the training queries used for BGE-EN-ICL, specifically for the datasets that have no training queries in BEIR:
- quora: 10k test and 5k dev queries in BEIR, but bge-full-data has 60202 queries.
- scidocsrr: 1k test queries in BEIR, but bge-full-data has 12654 queries.
- arguana: 1406 test queries in BEIR, but bge-full-data has 3101 queries.
Where do these training queries come from?
Also, for the nli dataset: what is the source dataset?
Thank you so much for making such a valuable dataset available!