Emanuele Artegiani committed on
Commit
49d7409
·
1 Parent(s): 802185a

Base config

Files changed (4)
  1. DESCRIPTION.md +1 -0
  2. README.md +4 -4
  3. app.py +1801 -0
  4. requirements.txt +4 -0
DESCRIPTION.md ADDED
@@ -0,0 +1 @@
+ Low-Resource Summarization (LRS) Leaderboard.
README.md CHANGED
@@ -1,10 +1,10 @@
  ---
  title: LRS Leaderboard
- emoji: 🐠
+ emoji: 📊
  colorFrom: yellow
- colorTo: gray
- sdk: gradio
- sdk_version: 4.1.2
+ colorTo: pink
+ sdk: streamlit
+ sdk_version: 1.28.1
  app_file: app.py
  pinned: false
  license: apache-2.0
app.py ADDED
@@ -0,0 +1,1801 @@
+ from functools import partial
+ import json
+
+ # from datasets import load_dataset
+ import gradio as gr
+ # from huggingface_hub import get_hf_file_metadata, HfApi, hf_hub_download, hf_hub_url
+ # from huggingface_hub.repocard import metadata_load
+ import pandas as pd
+ import random
+
+ # TASKS = [
+ # "BitextMining",
+ # "Classification",
+ # "Clustering",
+ # "PairClassification",
+ # "Reranking",
+ # "Retrieval",
+ # "STS",
+ # "Summarization",
+ # ]
+
+ NUM_DATASETS = 0
+ NUM_SCORES = 0
+ NUM_MODELS = 0
+
+ block = gr.Blocks()
+ with block:
+     gr.Markdown(f"""
+     Massive Text Embedding Benchmark (MTEB) Leaderboard. To submit, refer to the <a href="https://github.com/embeddings-benchmark/mteb#leaderboard" target="_blank" style="text-decoration: underline">MTEB GitHub repository</a> 🤗 Refer to the [MTEB paper](https://arxiv.org/abs/2210.07316) for details on metrics, tasks and models.
+
+     - **Total Datasets**: {NUM_DATASETS}
+     - **Total Languages**: 113
+     - **Total Scores**: {NUM_SCORES}
+     - **Total Models**: {NUM_MODELS}
+     """)
+
+ block.queue(max_size=10)
+ port = random.randint(15000, 16000)
+ block.launch(server_port=7680)
+
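Note on the launch lines: the added code draws a random `port` in the 15000-16000 range but then calls `block.launch(server_port=7680)`, so the drawn value is never used. Which of the two was intended is not stated in the commit. A minimal sketch, assuming the goal was to bind to a free port in that range; `pick_free_port` is a hypothetical helper, not part of the commit:

```python
import random
import socket


def pick_free_port(low=15000, high=16000, attempts=50):
    """Return a port in [low, high] that can be bound on localhost right now.

    Hypothetical helper, not part of the commit. The port could still be taken
    by another process between this check and the real launch.
    """
    for _ in range(attempts):
        candidate = random.randint(low, high)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # port busy or not permitted, try another one
            return candidate
    raise RuntimeError(f"no free port found in {low}-{high} after {attempts} attempts")


port = pick_free_port()
print(port)
# With Gradio installed, the value would then be passed on:
# block.launch(server_port=port)
```

If the fixed port 7680 is what the Space expects, the simpler fix is to drop the unused `port` assignment instead.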
41
+ # TASK_LIST_BITEXT_MINING = ['BUCC (de-en)', 'BUCC (fr-en)', 'BUCC (ru-en)', 'BUCC (zh-en)', 'Tatoeba (afr-eng)', 'Tatoeba (amh-eng)', 'Tatoeba (ang-eng)', 'Tatoeba (ara-eng)', 'Tatoeba (arq-eng)', 'Tatoeba (arz-eng)', 'Tatoeba (ast-eng)', 'Tatoeba (awa-eng)', 'Tatoeba (aze-eng)', 'Tatoeba (bel-eng)', 'Tatoeba (ben-eng)', 'Tatoeba (ber-eng)', 'Tatoeba (bos-eng)', 'Tatoeba (bre-eng)', 'Tatoeba (bul-eng)', 'Tatoeba (cat-eng)', 'Tatoeba (cbk-eng)', 'Tatoeba (ceb-eng)', 'Tatoeba (ces-eng)', 'Tatoeba (cha-eng)', 'Tatoeba (cmn-eng)', 'Tatoeba (cor-eng)', 'Tatoeba (csb-eng)', 'Tatoeba (cym-eng)', 'Tatoeba (dan-eng)', 'Tatoeba (deu-eng)', 'Tatoeba (dsb-eng)', 'Tatoeba (dtp-eng)', 'Tatoeba (ell-eng)', 'Tatoeba (epo-eng)', 'Tatoeba (est-eng)', 'Tatoeba (eus-eng)', 'Tatoeba (fao-eng)', 'Tatoeba (fin-eng)', 'Tatoeba (fra-eng)', 'Tatoeba (fry-eng)', 'Tatoeba (gla-eng)', 'Tatoeba (gle-eng)', 'Tatoeba (glg-eng)', 'Tatoeba (gsw-eng)', 'Tatoeba (heb-eng)', 'Tatoeba (hin-eng)', 'Tatoeba (hrv-eng)', 'Tatoeba (hsb-eng)', 'Tatoeba (hun-eng)', 'Tatoeba (hye-eng)', 'Tatoeba (ido-eng)', 'Tatoeba (ile-eng)', 'Tatoeba (ina-eng)', 'Tatoeba (ind-eng)', 'Tatoeba (isl-eng)', 'Tatoeba (ita-eng)', 'Tatoeba (jav-eng)', 'Tatoeba (jpn-eng)', 'Tatoeba (kab-eng)', 'Tatoeba (kat-eng)', 'Tatoeba (kaz-eng)', 'Tatoeba (khm-eng)', 'Tatoeba (kor-eng)', 'Tatoeba (kur-eng)', 'Tatoeba (kzj-eng)', 'Tatoeba (lat-eng)', 'Tatoeba (lfn-eng)', 'Tatoeba (lit-eng)', 'Tatoeba (lvs-eng)', 'Tatoeba (mal-eng)', 'Tatoeba (mar-eng)', 'Tatoeba (max-eng)', 'Tatoeba (mhr-eng)', 'Tatoeba (mkd-eng)', 'Tatoeba (mon-eng)', 'Tatoeba (nds-eng)', 'Tatoeba (nld-eng)', 'Tatoeba (nno-eng)', 'Tatoeba (nob-eng)', 'Tatoeba (nov-eng)', 'Tatoeba (oci-eng)', 'Tatoeba (orv-eng)', 'Tatoeba (pam-eng)', 'Tatoeba (pes-eng)', 'Tatoeba (pms-eng)', 'Tatoeba (pol-eng)', 'Tatoeba (por-eng)', 'Tatoeba (ron-eng)', 'Tatoeba (rus-eng)', 'Tatoeba (slk-eng)', 'Tatoeba (slv-eng)', 'Tatoeba (spa-eng)', 'Tatoeba (sqi-eng)', 'Tatoeba (srp-eng)', 'Tatoeba (swe-eng)', 'Tatoeba (swg-eng)', 'Tatoeba (swh-eng)', 'Tatoeba (tam-eng)', 'Tatoeba (tat-eng)', 'Tatoeba (tel-eng)', 'Tatoeba (tgl-eng)', 'Tatoeba (tha-eng)', 'Tatoeba (tuk-eng)', 'Tatoeba (tur-eng)', 'Tatoeba (tzl-eng)', 'Tatoeba (uig-eng)', 'Tatoeba (ukr-eng)', 'Tatoeba (urd-eng)', 'Tatoeba (uzb-eng)', 'Tatoeba (vie-eng)', 'Tatoeba (war-eng)', 'Tatoeba (wuu-eng)', 'Tatoeba (xho-eng)', 'Tatoeba (yid-eng)', 'Tatoeba (yue-eng)', 'Tatoeba (zsm-eng)']
42
+ # TASK_LIST_BITEXT_MINING_OTHER = ["BornholmBitextMining"]
43
+
44
+ # TASK_LIST_CLASSIFICATION = [
45
+ # "AmazonCounterfactualClassification (en)",
46
+ # "AmazonPolarityClassification",
47
+ # "AmazonReviewsClassification (en)",
48
+ # "Banking77Classification",
49
+ # "EmotionClassification",
50
+ # "ImdbClassification",
51
+ # "MassiveIntentClassification (en)",
52
+ # "MassiveScenarioClassification (en)",
53
+ # "MTOPDomainClassification (en)",
54
+ # "MTOPIntentClassification (en)",
55
+ # "ToxicConversationsClassification",
56
+ # "TweetSentimentExtractionClassification",
57
+ # ]
58
+
59
+ # TASK_LIST_CLASSIFICATION_NORM = [x.replace(" (en)", "") for x in TASK_LIST_CLASSIFICATION]
60
+
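The commented-out `TASK_LIST_CLASSIFICATION_NORM` line derives language-neutral task names by removing the ` (en)` tag from each entry. A minimal sketch of that step on a three-item subset of the list; the helper name `strip_english_tag` is an assumption for illustration:

```python
# Three entries taken from the commented-out TASK_LIST_CLASSIFICATION.
TASK_LIST_CLASSIFICATION = [
    "AmazonCounterfactualClassification (en)",
    "AmazonPolarityClassification",
    "MassiveIntentClassification (en)",
]


def strip_english_tag(task_name):
    """Remove every occurrence of the ' (en)' language tag from a task name."""
    return task_name.replace(" (en)", "")


# Same list comprehension as in the diff, written with the helper.
TASK_LIST_CLASSIFICATION_NORM = [strip_english_tag(x) for x in TASK_LIST_CLASSIFICATION]
print(TASK_LIST_CLASSIFICATION_NORM)
```

Names without the tag, such as `AmazonPolarityClassification`, pass through unchanged, so the normalized list keeps the original order and length.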
61
+ # TASK_LIST_CLASSIFICATION_DA = [
62
+ # "AngryTweetsClassification",
63
+ # "DanishPoliticalCommentsClassification",
64
+ # "DKHateClassification",
65
+ # "LccSentimentClassification",
66
+ # "MassiveIntentClassification (da)",
67
+ # "MassiveScenarioClassification (da)",
68
+ # "NordicLangClassification",
69
+ # "ScalaDaClassification",
70
+ # ]
71
+
72
+ # TASK_LIST_CLASSIFICATION_NB = [
73
+ # "NoRecClassification",
74
+ # "NordicLangClassification",
75
+ # "NorwegianParliament",
76
+ # "MassiveIntentClassification (nb)",
77
+ # "MassiveScenarioClassification (nb)",
78
+ # "ScalaNbClassification",
79
+ # ]
80
+
81
+ # TASK_LIST_CLASSIFICATION_PL = [
82
+ # "AllegroReviews",
83
+ # "CBD",
84
+ # "MassiveIntentClassification (pl)",
85
+ # "MassiveScenarioClassification (pl)",
86
+ # "PAC",
87
+ # "PolEmo2.0-IN",
88
+ # "PolEmo2.0-OUT",
89
+ # ]
90
+
91
+ # TASK_LIST_CLASSIFICATION_SV = [
92
+ # "DalajClassification",
93
+ # "MassiveIntentClassification (sv)",
94
+ # "MassiveScenarioClassification (sv)",
95
+ # "NordicLangClassification",
96
+ # "ScalaSvClassification",
97
+ # "SweRecClassification",
98
+ # ]
99
+
100
+ # TASK_LIST_CLASSIFICATION_ZH = [
101
+ # "AmazonReviewsClassification (zh)",
102
+ # "IFlyTek",
103
+ # "JDReview",
104
+ # "MassiveIntentClassification (zh-CN)",
105
+ # "MassiveScenarioClassification (zh-CN)",
106
+ # "MultilingualSentiment",
107
+ # "OnlineShopping",
108
+ # "TNews",
109
+ # "Waimai",
110
+ # ]
111
+
112
+ # TASK_LIST_CLASSIFICATION_OTHER = ['AmazonCounterfactualClassification (de)', 'AmazonCounterfactualClassification (ja)', 'AmazonReviewsClassification (de)', 'AmazonReviewsClassification (es)', 'AmazonReviewsClassification (fr)', 'AmazonReviewsClassification (ja)', 'AmazonReviewsClassification (zh)', 'MTOPDomainClassification (de)', 'MTOPDomainClassification (es)', 'MTOPDomainClassification (fr)', 'MTOPDomainClassification (hi)', 'MTOPDomainClassification (th)', 'MTOPIntentClassification (de)', 'MTOPIntentClassification (es)', 'MTOPIntentClassification (fr)', 'MTOPIntentClassification (hi)', 'MTOPIntentClassification (th)', 'MassiveIntentClassification (af)', 'MassiveIntentClassification (am)', 'MassiveIntentClassification (ar)', 'MassiveIntentClassification (az)', 'MassiveIntentClassification (bn)', 'MassiveIntentClassification (cy)', 'MassiveIntentClassification (de)', 'MassiveIntentClassification (el)', 'MassiveIntentClassification (es)', 'MassiveIntentClassification (fa)', 'MassiveIntentClassification (fi)', 'MassiveIntentClassification (fr)', 'MassiveIntentClassification (he)', 'MassiveIntentClassification (hi)', 'MassiveIntentClassification (hu)', 'MassiveIntentClassification (hy)', 'MassiveIntentClassification (id)', 'MassiveIntentClassification (is)', 'MassiveIntentClassification (it)', 'MassiveIntentClassification (ja)', 'MassiveIntentClassification (jv)', 'MassiveIntentClassification (ka)', 'MassiveIntentClassification (km)', 'MassiveIntentClassification (kn)', 'MassiveIntentClassification (ko)', 'MassiveIntentClassification (lv)', 'MassiveIntentClassification (ml)', 'MassiveIntentClassification (mn)', 'MassiveIntentClassification (ms)', 'MassiveIntentClassification (my)', 'MassiveIntentClassification (nl)', 'MassiveIntentClassification (pt)', 'MassiveIntentClassification (ro)', 'MassiveIntentClassification (ru)', 'MassiveIntentClassification (sl)', 'MassiveIntentClassification (sq)', 'MassiveIntentClassification (sw)', 'MassiveIntentClassification (ta)', 'MassiveIntentClassification (te)', 'MassiveIntentClassification (th)', 'MassiveIntentClassification (tl)', 'MassiveIntentClassification (tr)', 'MassiveIntentClassification (ur)', 'MassiveIntentClassification (vi)', 'MassiveIntentClassification (zh-TW)', 'MassiveScenarioClassification (af)', 'MassiveScenarioClassification (am)', 'MassiveScenarioClassification (ar)', 'MassiveScenarioClassification (az)', 'MassiveScenarioClassification (bn)', 'MassiveScenarioClassification (cy)', 'MassiveScenarioClassification (de)', 'MassiveScenarioClassification (el)', 'MassiveScenarioClassification (es)', 'MassiveScenarioClassification (fa)', 'MassiveScenarioClassification (fi)', 'MassiveScenarioClassification (fr)', 'MassiveScenarioClassification (he)', 'MassiveScenarioClassification (hi)', 'MassiveScenarioClassification (hu)', 'MassiveScenarioClassification (hy)', 'MassiveScenarioClassification (id)', 'MassiveScenarioClassification (is)', 'MassiveScenarioClassification (it)', 'MassiveScenarioClassification (ja)', 'MassiveScenarioClassification (jv)', 'MassiveScenarioClassification (ka)', 'MassiveScenarioClassification (km)', 'MassiveScenarioClassification (kn)', 'MassiveScenarioClassification (ko)', 'MassiveScenarioClassification (lv)', 'MassiveScenarioClassification (ml)', 'MassiveScenarioClassification (mn)', 'MassiveScenarioClassification (ms)', 'MassiveScenarioClassification (my)', 'MassiveScenarioClassification (nl)', 'MassiveScenarioClassification (pt)', 'MassiveScenarioClassification (ro)', 'MassiveScenarioClassification (ru)', 'MassiveScenarioClassification (sl)', 'MassiveScenarioClassification (sq)', 'MassiveScenarioClassification (sw)', 'MassiveScenarioClassification (ta)', 'MassiveScenarioClassification (te)', 'MassiveScenarioClassification (th)', 'MassiveScenarioClassification (tl)', 'MassiveScenarioClassification (tr)', 'MassiveScenarioClassification (ur)', 'MassiveScenarioClassification (vi)', 'MassiveScenarioClassification (zh-TW)']
113
+
114
+ # TASK_LIST_CLUSTERING = [
115
+ # "ArxivClusteringP2P",
116
+ # "ArxivClusteringS2S",
117
+ # "BiorxivClusteringP2P",
118
+ # "BiorxivClusteringS2S",
119
+ # "MedrxivClusteringP2P",
120
+ # "MedrxivClusteringS2S",
121
+ # "RedditClustering",
122
+ # "RedditClusteringP2P",
123
+ # "StackExchangeClustering",
124
+ # "StackExchangeClusteringP2P",
125
+ # "TwentyNewsgroupsClustering",
126
+ # ]
127
+
128
+
129
+ # TASK_LIST_CLUSTERING_DE = [
130
+ # "BlurbsClusteringP2P",
131
+ # "BlurbsClusteringS2S",
132
+ # "TenKGnadClusteringP2P",
133
+ # "TenKGnadClusteringS2S",
134
+ # ]
135
+
136
+ # TASK_LIST_CLUSTERING_PL = [
137
+ # "8TagsClustering",
138
+ # ]
139
+
140
+ # TASK_LIST_CLUSTERING_ZH = [
141
+ # "CLSClusteringP2P",
142
+ # "CLSClusteringS2S",
143
+ # "ThuNewsClusteringP2P",
144
+ # "ThuNewsClusteringS2S",
145
+ # ]
146
+
147
+ # TASK_LIST_PAIR_CLASSIFICATION = [
148
+ # "SprintDuplicateQuestions",
149
+ # "TwitterSemEval2015",
150
+ # "TwitterURLCorpus",
151
+ # ]
152
+
153
+ # TASK_LIST_PAIR_CLASSIFICATION_PL = [
154
+ # "CDSC-E",
155
+ # "PPC",
156
+ # "PSC",
157
+ # "SICK-E-PL",
158
+ # ]
159
+
160
+ # TASK_LIST_PAIR_CLASSIFICATION_ZH = [
161
+ # "Cmnli",
162
+ # "Ocnli",
163
+ # ]
164
+
165
+ # TASK_LIST_RERANKING = [
166
+ # "AskUbuntuDupQuestions",
167
+ # "MindSmallReranking",
168
+ # "SciDocsRR",
169
+ # "StackOverflowDupQuestions",
170
+ # ]
171
+
172
+ # TASK_LIST_RERANKING_ZH = [
173
+ # "CMedQAv1",
174
+ # "CMedQAv2",
175
+ # "MMarcoReranking",
176
+ # "T2Reranking",
177
+ # ]
178
+
179
+ # TASK_LIST_RETRIEVAL = [
180
+ # "ArguAna",
181
+ # "ClimateFEVER",
182
+ # "CQADupstackRetrieval",
183
+ # "DBPedia",
184
+ # "FEVER",
185
+ # "FiQA2018",
186
+ # "HotpotQA",
187
+ # "MSMARCO",
188
+ # "NFCorpus",
189
+ # "NQ",
190
+ # "QuoraRetrieval",
191
+ # "SCIDOCS",
192
+ # "SciFact",
193
+ # "Touche2020",
194
+ # "TRECCOVID",
195
+ # ]
196
+
197
+ # TASK_LIST_RETRIEVAL_PL = [
198
+ # "ArguAna-PL",
199
+ # "DBPedia-PL",
200
+ # "FiQA-PL",
201
+ # "HotpotQA-PL",
202
+ # "MSMARCO-PL",
203
+ # "NFCorpus-PL",
204
+ # "NQ-PL",
205
+ # "Quora-PL",
206
+ # "SCIDOCS-PL",
207
+ # "SciFact-PL",
208
+ # "TRECCOVID-PL",
209
+ # ]
210
+
211
+ # TASK_LIST_RETRIEVAL_ZH = [
212
+ # "CmedqaRetrieval",
213
+ # "CovidRetrieval",
214
+ # "DuRetrieval",
215
+ # "EcomRetrieval",
216
+ # "MedicalRetrieval",
217
+ # "MMarcoRetrieval",
218
+ # "T2Retrieval",
219
+ # "VideoRetrieval",
220
+ # ]
221
+
222
+ # TASK_LIST_RETRIEVAL_NORM = TASK_LIST_RETRIEVAL + [
223
+ # "CQADupstackAndroidRetrieval",
224
+ # "CQADupstackEnglishRetrieval",
225
+ # "CQADupstackGamingRetrieval",
226
+ # "CQADupstackGisRetrieval",
227
+ # "CQADupstackMathematicaRetrieval",
228
+ # "CQADupstackPhysicsRetrieval",
229
+ # "CQADupstackProgrammersRetrieval",
230
+ # "CQADupstackStatsRetrieval",
231
+ # "CQADupstackTexRetrieval",
232
+ # "CQADupstackUnixRetrieval",
233
+ # "CQADupstackWebmastersRetrieval",
234
+ # "CQADupstackWordpressRetrieval"
235
+ # ]
236
+
237
+ # TASK_LIST_STS = [
238
+ # "BIOSSES",
239
+ # "SICK-R",
240
+ # "STS12",
241
+ # "STS13",
242
+ # "STS14",
243
+ # "STS15",
244
+ # "STS16",
245
+ # "STS17 (en-en)",
246
+ # "STS22 (en)",
247
+ # "STSBenchmark",
248
+ # ]
249
+
250
+ # TASK_LIST_STS_PL = [
251
+ # "CDSC-R",
252
+ # "SICK-R-PL",
253
+ # "STS22 (pl)",
254
+ # ]
255
+
256
+ # TASK_LIST_STS_ZH = [
257
+ # "AFQMC",
258
+ # "ATEC",
259
+ # "BQ",
260
+ # "LCQMC",
261
+ # "PAWSX",
262
+ # "QBQTC",
263
+ # "STS22 (zh)",
264
+ # "STSB",
265
+ # ]
266
+
267
+ # TASK_LIST_STS_OTHER = ["STS17 (ar-ar)", "STS17 (en-ar)", "STS17 (en-de)", "STS17 (en-tr)", "STS17 (es-en)", "STS17 (es-es)", "STS17 (fr-en)", "STS17 (it-en)", "STS17 (ko-ko)", "STS17 (nl-en)", "STS22 (ar)", "STS22 (de)", "STS22 (de-en)", "STS22 (de-fr)", "STS22 (de-pl)", "STS22 (es)", "STS22 (es-en)", "STS22 (es-it)", "STS22 (fr)", "STS22 (fr-pl)", "STS22 (it)", "STS22 (pl)", "STS22 (pl-en)", "STS22 (ru)", "STS22 (tr)", "STS22 (zh-en)", "STSBenchmark",]
268
+ # TASK_LIST_STS_NORM = [x.replace(" (en)", "").replace(" (en-en)", "") for x in TASK_LIST_STS]
269
+
270
+ # TASK_LIST_SUMMARIZATION = ["SummEval",]
271
+
272
+ # TASK_LIST_EN = TASK_LIST_CLASSIFICATION + TASK_LIST_CLUSTERING + TASK_LIST_PAIR_CLASSIFICATION + TASK_LIST_RERANKING + TASK_LIST_RETRIEVAL + TASK_LIST_STS + TASK_LIST_SUMMARIZATION
273
+ # TASK_LIST_PL = TASK_LIST_CLASSIFICATION_PL + TASK_LIST_CLUSTERING_PL + TASK_LIST_PAIR_CLASSIFICATION_PL + TASK_LIST_RETRIEVAL_PL + TASK_LIST_STS_PL
274
+ # TASK_LIST_ZH = TASK_LIST_CLASSIFICATION_ZH + TASK_LIST_CLUSTERING_ZH + TASK_LIST_PAIR_CLASSIFICATION_ZH + TASK_LIST_RERANKING_ZH + TASK_LIST_RETRIEVAL_ZH + TASK_LIST_STS_ZH
275
+
276
+ # TASK_TO_METRIC = {
277
+ # "BitextMining": "f1",
278
+ # "Clustering": "v_measure",
279
+ # "Classification": "accuracy",
280
+ # "PairClassification": "cos_sim_ap",
281
+ # "Reranking": "map",
282
+ # "Retrieval": "ndcg_at_10",
283
+ # "STS": "cos_sim_spearman",
284
+ # "Summarization": "cos_sim_spearman",
285
+ # }
286
+
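The commented-out `TASK_TO_METRIC` mapping pairs each task type with the metric reported for it. A minimal sketch of how such a lookup might be used; `main_score` and the example numbers are assumptions for illustration, not part of the commit:

```python
# Task type -> headline metric, copied from the commented-out mapping in the diff.
TASK_TO_METRIC = {
    "BitextMining": "f1",
    "Clustering": "v_measure",
    "Classification": "accuracy",
    "PairClassification": "cos_sim_ap",
    "Reranking": "map",
    "Retrieval": "ndcg_at_10",
    "STS": "cos_sim_spearman",
    "Summarization": "cos_sim_spearman",
}


def main_score(task_type, scores):
    """Return the headline metric value for task_type from a dict of scores.

    Hypothetical helper: raises KeyError for an unknown task type and
    returns None when the expected metric is missing from scores.
    """
    metric = TASK_TO_METRIC[task_type]
    return scores.get(metric)


# Made-up numbers, purely for illustration.
example_scores = {"ndcg_at_10": 0.41, "map_at_10": 0.35}
print(main_score("Retrieval", example_scores))
```

Keeping the mapping in one dictionary means a new task type only needs one extra entry rather than a new branch in the scoring code.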
287
+ # def make_clickable_model(model_name, link=None):
288
+ # if link is None:
289
+ # link = "https://huggingface.co/" + model_name
290
+ # # Remove user from model name
291
+ # return (
292
+ # f'<a target="_blank" style="text-decoration: underline" href="{link}">{model_name.split("/")[-1]}</a>'
293
+ # )
294
+
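For reference, the commented-out `make_clickable_model` helper builds an HTML link whose label is the model name without its user or organization prefix. An uncommented sketch with indentation restored (the indentation is an assumption, since the diff view dropped it):

```python
def make_clickable_model(model_name, link=None):
    """Return an HTML anchor for a model, defaulting to its Hugging Face page."""
    if link is None:
        link = "https://huggingface.co/" + model_name
    # Show only the part after the last "/" so the user or org prefix is hidden.
    short_name = model_name.split("/")[-1]
    return f'<a target="_blank" style="text-decoration: underline" href="{link}">{short_name}</a>'


print(make_clickable_model("intfloat/e5-base"))
print(make_clickable_model("LASER2", link="https://github.com/facebookresearch/LASER"))
```

The optional `link` argument covers models hosted outside the Hub, which is how the `EXTERNAL_MODEL_TO_LINK` entries later in the file appear to be used.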
295
+ # # Models without metadata, thus we cannot fetch their results naturally
296
+ # EXTERNAL_MODELS = [
297
+ # "all-MiniLM-L12-v2",
298
+ # "all-MiniLM-L6-v2",
299
+ # "all-mpnet-base-v2",
300
+ # "allenai-specter",
301
+ # "bert-base-swedish-cased",
302
+ # "bert-base-uncased",
303
+ # "bge-base-zh-v1.5",
304
+ # "bge-large-zh-v1.5",
305
+ # "bge-large-zh-noinstruct",
306
+ # "bge-small-zh-v1.5",
307
+ # "contriever-base-msmarco",
308
+ # "cross-en-de-roberta-sentence-transformer",
309
+ # "dfm-encoder-large-v1",
310
+ # "dfm-sentence-encoder-large-1",
311
+ # "distiluse-base-multilingual-cased-v2",
312
+ # "DanskBERT",
313
+ # "e5-base",
314
+ # "e5-large",
315
+ # "e5-small",
316
+ # "electra-small-nordic",
317
+ # "electra-small-swedish-cased-discriminator",
318
+ # "gbert-base",
319
+ # "gbert-large",
320
+ # "gelectra-base",
321
+ # "gelectra-large",
322
+ # "gottbert-base",
323
+ # "glove.6B.300d",
324
+ # "gtr-t5-base",
325
+ # "gtr-t5-large",
326
+ # "gtr-t5-xl",
327
+ # "gtr-t5-xxl",
328
+ # "herbert-base-retrieval-v2",
329
+ # "komninos",
330
+ # "luotuo-bert-medium",
331
+ # "LASER2",
332
+ # "LaBSE",
333
+ # "m3e-base",
334
+ # "m3e-large",
335
+ # "msmarco-bert-co-condensor",
336
+ # "multilingual-e5-base",
337
+ # "multilingual-e5-large",
338
+ # "multilingual-e5-small",
339
+ # "nb-bert-base",
340
+ # "nb-bert-large",
341
+ # "norbert3-base",
342
+ # "norbert3-large",
343
+ # "paraphrase-multilingual-MiniLM-L12-v2",
344
+ # "paraphrase-multilingual-mpnet-base-v2",
345
+ # "sentence-bert-swedish-cased",
346
+ # "sentence-t5-base",
347
+ # "sentence-t5-large",
348
+ # "sentence-t5-xl",
349
+ # "sentence-t5-xxl",
350
+ # "sup-simcse-bert-base-uncased",
351
+ # "st-polish-paraphrase-from-distilroberta",
352
+ # "st-polish-paraphrase-from-mpnet",
353
+ # "text2vec-base-chinese",
354
+ # "text2vec-large-chinese",
355
+ # "text-embedding-ada-002",
356
+ # "text-similarity-ada-001",
357
+ # "text-similarity-babbage-001",
358
+ # "text-similarity-curie-001",
359
+ # "text-similarity-davinci-001",
360
+ # "text-search-ada-doc-001",
361
+ # "text-search-ada-001",
362
+ # "text-search-babbage-001",
363
+ # "text-search-curie-001",
364
+ # "text-search-davinci-001",
365
+ # "titan-embed-text-v1",
366
+ # "unsup-simcse-bert-base-uncased",
367
+ # "use-cmlm-multilingual",
368
+ # "xlm-roberta-base",
369
+ # "xlm-roberta-large",
370
+ # ]
371
+
372
+ # EXTERNAL_MODEL_TO_LINK = {
373
+ # "allenai-specter": "https://huggingface.co/sentence-transformers/allenai-specter",
374
+ # "allenai-specter": "https://huggingface.co/sentence-transformers/allenai-specter",
375
+ # "all-MiniLM-L12-v2": "https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2",
376
+ # "all-MiniLM-L6-v2": "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2",
377
+ # "all-mpnet-base-v2": "https://huggingface.co/sentence-transformers/all-mpnet-base-v2",
378
+ # "bert-base-swedish-cased": "https://huggingface.co/KB/bert-base-swedish-cased",
379
+ # "bert-base-uncased": "https://huggingface.co/bert-base-uncased",
380
+ # "bge-base-zh-v1.5": "https://huggingface.co/BAAI/bge-base-zh-v1.5",
381
+ # "bge-large-zh-v1.5": "https://huggingface.co/BAAI/bge-large-zh-v1.5",
382
+ # "bge-large-zh-noinstruct": "https://huggingface.co/BAAI/bge-large-zh-noinstruct",
383
+ # "bge-small-zh-v1.5": "https://huggingface.co/BAAI/bge-small-zh-v1.5",
384
+ # "contriever-base-msmarco": "https://huggingface.co/nthakur/contriever-base-msmarco",
385
+ # "cross-en-de-roberta-sentence-transformer": "https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer",
386
+ # "DanskBERT": "https://huggingface.co/vesteinn/DanskBERT",
387
+ # "distiluse-base-multilingual-cased-v2": "https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2",
388
+ # "dfm-encoder-large-v1": "https://huggingface.co/chcaa/dfm-encoder-large-v1",
389
+ # "dfm-sentence-encoder-large-1": "https://huggingface.co/chcaa/dfm-encoder-large-v1",
390
+ # "e5-base": "https://huggingface.co/intfloat/e5-base",
391
+ # "e5-large": "https://huggingface.co/intfloat/e5-large",
392
+ # "e5-small": "https://huggingface.co/intfloat/e5-small",
393
+ # "electra-small-nordic": "https://huggingface.co/jonfd/electra-small-nordic",
394
+ # "electra-small-swedish-cased-discriminator": "https://huggingface.co/KBLab/electra-small-swedish-cased-discriminator",
395
+ # "gbert-base": "https://huggingface.co/deepset/gbert-base",
396
+ # "gbert-large": "https://huggingface.co/deepset/gbert-large",
397
+ # "gelectra-base": "https://huggingface.co/deepset/gelectra-base",
398
+ # "gelectra-large": "https://huggingface.co/deepset/gelectra-large",
399
+ # "glove.6B.300d": "https://huggingface.co/sentence-transformers/average_word_embeddings_glove.6B.300d",
400
+ # "gottbert-base": "https://huggingface.co/uklfr/gottbert-base",
401
+ # "gtr-t5-base": "https://huggingface.co/sentence-transformers/gtr-t5-base",
402
+ # "gtr-t5-large": "https://huggingface.co/sentence-transformers/gtr-t5-large",
403
+ # "gtr-t5-xl": "https://huggingface.co/sentence-transformers/gtr-t5-xl",
404
+ # "gtr-t5-xxl": "https://huggingface.co/sentence-transformers/gtr-t5-xxl",
405
+ # "herbert-base-retrieval-v2": "https://huggingface.co/ipipan/herbert-base-retrieval-v2",
406
+ # "komninos": "https://huggingface.co/sentence-transformers/average_word_embeddings_komninos",
407
+ # "luotuo-bert-medium": "https://huggingface.co/silk-road/luotuo-bert-medium",
408
+ # "LASER2": "https://github.com/facebookresearch/LASER",
409
+ # "LaBSE": "https://huggingface.co/sentence-transformers/LaBSE",
410
+ # "m3e-base": "https://huggingface.co/moka-ai/m3e-base",
411
+ # "m3e-large": "https://huggingface.co/moka-ai/m3e-large",
412
+ # "msmarco-bert-co-condensor": "https://huggingface.co/sentence-transformers/msmarco-bert-co-condensor",
413
+ # "multilingual-e5-base": "https://huggingface.co/intfloat/multilingual-e5-base",
414
+ # "multilingual-e5-large": "https://huggingface.co/intfloat/multilingual-e5-large",
415
+ # "multilingual-e5-small": "https://huggingface.co/intfloat/multilingual-e5-small",
416
+ # "nb-bert-base": "https://huggingface.co/NbAiLab/nb-bert-base",
417
+ # "nb-bert-large": "https://huggingface.co/NbAiLab/nb-bert-large",
418
+ # "norbert3-base": "https://huggingface.co/ltg/norbert3-base",
419
+ # "norbert3-large": "https://huggingface.co/ltg/norbert3-large",
420
+ # "paraphrase-multilingual-mpnet-base-v2": "https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
421
+ # "paraphrase-multilingual-MiniLM-L12-v2": "https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
422
+ # "sentence-bert-swedish-cased": "https://huggingface.co/KBLab/sentence-bert-swedish-cased",
423
+ # "sentence-t5-base": "https://huggingface.co/sentence-transformers/sentence-t5-base",
424
+ # "sentence-t5-large": "https://huggingface.co/sentence-transformers/sentence-t5-large",
425
+ # "sentence-t5-xl": "https://huggingface.co/sentence-transformers/sentence-t5-xl",
426
+ # "sentence-t5-xxl": "https://huggingface.co/sentence-transformers/sentence-t5-xxl",
427
+ # "sup-simcse-bert-base-uncased": "https://huggingface.co/princeton-nlp/sup-simcse-bert-base-uncased",
428
+ # "st-polish-paraphrase-from-distilroberta": "https://huggingface.co/sdadas/st-polish-paraphrase-from-distilroberta",
429
+ # "st-polish-paraphrase-from-mpnet": "https://huggingface.co/sdadas/st-polish-paraphrase-from-mpnet",
430
+ # "text2vec-base-chinese": "https://huggingface.co/shibing624/text2vec-base-chinese",
431
+ # "text2vec-large-chinese": "https://huggingface.co/GanymedeNil/text2vec-large-chinese",
432
+ # "text-embedding-ada-002": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
433
+ # "text-similarity-ada-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
434
+ # "text-similarity-babbage-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
435
+ # "text-similarity-curie-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
436
+ # "text-similarity-davinci-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
437
+ # "text-search-ada-doc-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
438
+ # "text-search-ada-query-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
439
+ # "text-search-ada-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
440
+ # "text-search-curie-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
441
+ # "text-search-babbage-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
442
+ # "text-search-davinci-001": "https://beta.openai.com/docs/guides/embeddings/types-of-embedding-models",
443
+ # "titan-embed-text-v1": "https://docs.aws.amazon.com/bedrock/latest/userguide/embeddings.html",
444
+ # "unsup-simcse-bert-base-uncased": "https://huggingface.co/princeton-nlp/unsup-simcse-bert-base-uncased",
445
+ # "use-cmlm-multilingual": "https://huggingface.co/sentence-transformers/use-cmlm-multilingual",
446
+ # "xlm-roberta-base": "https://huggingface.co/xlm-roberta-base",
447
+ # "xlm-roberta-large": "https://huggingface.co/xlm-roberta-large",
448
+ # }
449
+
450
+ # EXTERNAL_MODEL_TO_DIM = {
451
+ # "all-MiniLM-L12-v2": 384,
452
+ # "all-MiniLM-L6-v2": 384,
453
+ # "all-mpnet-base-v2": 768,
454
+ # "allenai-specter": 768,
455
+ # "bert-base-swedish-cased": 768,
456
+ # "bert-base-uncased": 768,
457
+ # "bge-base-zh-v1.5": 768,
458
+ # "bge-large-zh-v1.5": 1024,
459
+ # "bge-large-zh-noinstruct": 1024,
460
+ # "bge-small-zh-v1.5": 512,
461
+ # "contriever-base-msmarco": 768,
462
+ # "cross-en-de-roberta-sentence-transformer": 768,
463
+ # "DanskBERT": 768,
464
+ # "distiluse-base-multilingual-cased-v2": 512,
465
+ # "dfm-encoder-large-v1": 1024,
466
+ # "dfm-sentence-encoder-large-1": 1024,
467
+ # "e5-base": 768,
468
+ # "e5-small": 384,
469
+ # "e5-large": 1024,
470
+ # "electra-small-nordic": 256,
471
+ # "electra-small-swedish-cased-discriminator": 256,
472
+ # "luotuo-bert-medium": 768,
473
+ # "LASER2": 1024,
474
+ # "LaBSE": 768,
475
+ # "gbert-base": 768,
476
+ # "gbert-large": 1024,
477
+ # "gelectra-base": 768,
478
+ # "gelectra-large": 1024,
479
+ # "glove.6B.300d": 300,
480
+ # "gottbert-base": 768,
481
+ # "gtr-t5-base": 768,
482
+ # "gtr-t5-large": 768,
483
+ # "gtr-t5-xl": 768,
484
+ # "gtr-t5-xxl": 768,
485
+ # "herbert-base-retrieval-v2": 768,
486
+ # "komninos": 300,
487
+ # "m3e-base": 768,
488
+ # "m3e-large": 768,
489
+ # "msmarco-bert-co-condensor": 768,
490
+ # "multilingual-e5-base": 768,
491
+ # "multilingual-e5-small": 384,
492
+ # "multilingual-e5-large": 1024,
493
+ # "nb-bert-base": 768,
494
+ # "nb-bert-large": 1024,
495
+ # "norbert3-base": 768,
496
+ # "norbert3-large": 1024,
497
+ # "paraphrase-multilingual-MiniLM-L12-v2": 384,
498
+ # "paraphrase-multilingual-mpnet-base-v2": 768,
499
+ # "sentence-bert-swedish-cased": 768,
500
+ # "sentence-t5-base": 768,
501
+ # "sentence-t5-large": 768,
502
+ # "sentence-t5-xl": 768,
503
+ # "sentence-t5-xxl": 768,
504
+ # "sup-simcse-bert-base-uncased": 768,
505
+ # "st-polish-paraphrase-from-distilroberta": 768,
506
+ # "st-polish-paraphrase-from-mpnet": 768,
507
+ # "text2vec-base-chinese": 768,
508
+ # "text2vec-large-chinese": 1024,
509
+ # "text-embedding-ada-002": 1536,
510
+ # "text-similarity-ada-001": 1024,
511
+ # "text-similarity-babbage-001": 2048,
512
+ # "text-similarity-curie-001": 4096,
513
+ # "text-similarity-davinci-001": 12288,
514
+ # "text-search-ada-doc-001": 1024,
515
+ # "text-search-ada-query-001": 1024,
516
+ # "text-search-ada-001": 1024,
517
+ # "text-search-babbage-001": 2048,
518
+ # "text-search-curie-001": 4096,
519
+ # "text-search-davinci-001": 12288,
520
+ # "titan-embed-text-v1": 1536,
521
+ # "unsup-simcse-bert-base-uncased": 768,
522
+ # "use-cmlm-multilingual": 768,
523
+ # "xlm-roberta-base": 768,
524
+ # "xlm-roberta-large": 1024,
525
+ # }
526
+
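The `EXTERNAL_MODEL_TO_LINK` and `EXTERNAL_MODEL_TO_DIM` dictionaries are keyed by the same model names, so one lookup per table yields a metadata record. A minimal sketch using two entries copied from the diff; `describe_model` and the column labels are hypothetical:

```python
# Two entries copied from the commented-out dictionaries in this diff.
EXTERNAL_MODEL_TO_LINK = {
    "e5-base": "https://huggingface.co/intfloat/e5-base",
    "LASER2": "https://github.com/facebookresearch/LASER",
}
EXTERNAL_MODEL_TO_DIM = {
    "e5-base": 768,
    "LASER2": 1024,
}


def describe_model(name):
    """Collect the known metadata for one model (hypothetical helper).

    Missing entries fall back to an empty string instead of raising,
    so an incomplete table does not break the lookup.
    """
    return {
        "Model": name,
        "Link": EXTERNAL_MODEL_TO_LINK.get(name, ""),
        "Embedding Dimensions": EXTERNAL_MODEL_TO_DIM.get(name, ""),
    }


print(describe_model("e5-base"))
print(describe_model("not-in-the-tables"))
```

Using `.get` with a default matters here because the tables are maintained by hand and their key sets can drift apart.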
527
+ # EXTERNAL_MODEL_TO_SEQLEN = {
528
+ # "all-MiniLM-L12-v2": 512,
529
+ # "all-MiniLM-L6-v2": 512,
530
+ # "all-mpnet-base-v2": 514,
531
+ # "allenai-specter": 512,
532
+ # "bert-base-swedish-cased": 512,
533
+ # "bert-base-uncased": 512,
534
+ # "bge-base-zh-v1.5": 512,
535
+ # "bge-large-zh-v1.5": 512,
536
+ # "bge-large-zh-noinstruct": 512,
537
+ # "bge-small-zh-v1.5": 512,
538
+ # "contriever-base-msmarco": 512,
539
+ # "cross-en-de-roberta-sentence-transformer": 514,
540
+ # "DanskBERT": 514,
541
+ # "dfm-encoder-large-v1": 512,
542
+ # "dfm-sentence-encoder-large-1": 512,
543
+ # "distiluse-base-multilingual-cased-v2": 512,
544
+ # "e5-base": 512,
545
+ # "e5-large": 512,
546
+ # "e5-small": 512,
547
+ # "electra-small-nordic": 512,
548
+ # "electra-small-swedish-cased-discriminator": 512,
549
+ # "gbert-base": 512,
550
+ # "gbert-large": 512,
551
+ # "gelectra-base": 512,
552
+ # "gelectra-large": 512,
553
+ # "gottbert-base": 512,
554
+ # "glove.6B.300d": "N/A",
555
+ # "gtr-t5-base": 512,
556
+ # "gtr-t5-large": 512,
557
+ # "gtr-t5-xl": 512,
558
+ # "gtr-t5-xxl": 512,
559
+ # "herbert-base-retrieval-v2": 514,
560
+ # "komninos": "N/A",
561
+ # "luotuo-bert-medium": 512,
562
+ # "LASER2": "N/A",
563
+ # "LaBSE": 512,
564
+ # "m3e-base": 512,
565
+ # "m3e-large": 512,
566
+ # "msmarco-bert-co-condensor": 512,
567
+ # "multilingual-e5-base": 514,
568
+ # "multilingual-e5-large": 514,
569
+ # "multilingual-e5-small": 512,
570
+ # "nb-bert-base": 512,
571
+ # "nb-bert-large": 512,
572
+ # "norbert3-base": 512,
573
+ # "norbert3-large": 512,
574
+ # "paraphrase-multilingual-MiniLM-L12-v2": 512,
575
+ # "paraphrase-multilingual-mpnet-base-v2": 514,
576
+ # "sentence-bert-swedish-cased": 512,
577
+ # "sentence-t5-base": 512,
578
+ # "sentence-t5-large": 512,
579
+ # "sentence-t5-xl": 512,
580
+ # "sentence-t5-xxl": 512,
581
+ # "sup-simcse-bert-base-uncased": 512,
582
+ # "st-polish-paraphrase-from-distilroberta": 514,
583
+ # "st-polish-paraphrase-from-mpnet": 514,
584
+ # "text2vec-base-chinese": 512,
585
+ # "text2vec-large-chinese": 512,
586
+ # "text-embedding-ada-002": 8191,
587
+ # "text-similarity-ada-001": 2046,
588
+ # "text-similarity-babbage-001": 2046,
589
+ # "text-similarity-curie-001": 2046,
590
+ # "text-similarity-davinci-001": 2046,
591
+ # "text-search-ada-doc-001": 2046,
592
+ # "text-search-ada-query-001": 2046,
593
+ # "text-search-ada-001": 2046,
594
+ # "text-search-babbage-001": 2046,
595
+ # "text-search-curie-001": 2046,
596
+ # "text-search-davinci-001": 2046,
597
+ # "titan-embed-text-v1": 8000,
598
+ # "use-cmlm-multilingual": 512,
599
+ # "unsup-simcse-bert-base-uncased": 512,
600
+ # "xlm-roberta-base": 514,
601
+ # "xlm-roberta-large": 514,
602
+ # }
603
+
604
+ # EXTERNAL_MODEL_TO_SIZE = {
605
+ # "allenai-specter": 0.44,
606
+ # "all-MiniLM-L12-v2": 0.13,
607
+ # "all-MiniLM-L6-v2": 0.09,
608
+ # "all-mpnet-base-v2": 0.44,
609
+ # "bert-base-uncased": 0.44,
610
+ # "bert-base-swedish-cased": 0.50,
611
+ # "bge-base-zh-v1.5": 0.41,
612
+ # "bge-large-zh-v1.5": 1.30,
613
+ # "bge-large-zh-noinstruct": 1.30,
614
+ # "bge-small-zh-v1.5": 0.10,
615
+ # "cross-en-de-roberta-sentence-transformer": 1.11,
616
+ # "contriever-base-msmarco": 0.44,
617
+ # "DanskBERT": 0.50,
618
+ # "distiluse-base-multilingual-cased-v2": 0.54,
619
+ # "dfm-encoder-large-v1": 1.42,
620
+ # "dfm-sentence-encoder-large-1": 1.63,
621
+ # "e5-base": 0.44,
622
+ # "e5-small": 0.13,
623
+ # "e5-large": 1.34,
624
+ # "electra-small-nordic": 0.09,
625
+ # "electra-small-swedish-cased-discriminator": 0.06,
626
+ # "gbert-base": 0.44,
627
+ # "gbert-large": 1.35,
628
+ # "gelectra-base": 0.44,
629
+ # "gelectra-large": 1.34,
630
+ # "glove.6B.300d": 0.48,
631
+ # "gottbert-base": 0.51,
632
+ # "gtr-t5-base": 0.22,
633
+ # "gtr-t5-large": 0.67,
634
+ # "gtr-t5-xl": 2.48,
635
+ # "gtr-t5-xxl": 9.73,
636
+ # "herbert-base-retrieval-v2": 0.50,
637
+ # "komninos": 0.27,
638
+ # "luotuo-bert-medium": 1.31,
639
+ # "LASER2": 0.17,
640
+ # "LaBSE": 1.88,
641
+ # "m3e-base": 0.41,
642
+ # "m3e-large": 0.41,
643
+ # "msmarco-bert-co-condensor": 0.44,
644
+ # "multilingual-e5-base": 1.11,
645
+ # "multilingual-e5-small": 0.47,
646
+ # "multilingual-e5-large": 2.24,
647
+ # "nb-bert-base": 0.71,
648
+ # "nb-bert-large": 1.42,
649
+ # "norbert3-base": 0.52,
650
+ # "norbert3-large": 1.47,
651
+ # "paraphrase-multilingual-mpnet-base-v2": 1.11,
652
+ # "paraphrase-multilingual-MiniLM-L12-v2": 0.47,
653
+ # "sentence-bert-swedish-cased": 0.50,
654
+ # "sentence-t5-base": 0.22,
655
+ # "sentence-t5-large": 0.67,
656
+ # "sentence-t5-xl": 2.48,
657
+ # "sentence-t5-xxl": 9.73,
658
+ # "sup-simcse-bert-base-uncased": 0.44,
659
+ # "st-polish-paraphrase-from-distilroberta": 0.50,
660
+ # "st-polish-paraphrase-from-mpnet": 0.50,
661
+ # "text2vec-base-chinese": 0.41,
662
+ # "text2vec-large-chinese": 1.30,
663
+ # "unsup-simcse-bert-base-uncased": 0.44,
664
+ # "use-cmlm-multilingual": 1.89,
665
+ # "xlm-roberta-base": 1.12,
666
+ # "xlm-roberta-large": 2.24,
667
+ # }
668
+
669
+ # MODELS_TO_SKIP = {
670
+ # "baseplate/instructor-large-1", # Duplicate
671
+ # "radames/e5-large", # Duplicate
672
+ # "gentlebowl/instructor-large-safetensors", # Duplicate
673
+ # "Consensus/instructor-base", # Duplicate
674
+ # "GovCompete/instructor-xl", # Duplicate
675
+ # "GovCompete/e5-large-v2", # Duplicate
676
+ # "t12e/instructor-base", # Duplicate
677
+ # "michaelfeil/ct2fast-e5-large-v2",
678
+ # "michaelfeil/ct2fast-e5-large",
679
+ # "michaelfeil/ct2fast-e5-small-v2",
680
+ # "newsrx/instructor-xl-newsrx",
681
+ # "newsrx/instructor-large-newsrx",
682
+ # "fresha/e5-large-v2-endpoint",
683
+ # "ggrn/e5-small-v2",
684
+ # "michaelfeil/ct2fast-e5-small",
685
+ # "jncraton/e5-small-v2-ct2-int8",
686
+ # "anttip/ct2fast-e5-small-v2-hfie",
687
+ # "newsrx/instructor-large",
688
+ # "newsrx/instructor-xl",
689
+ # "dmlls/all-mpnet-base-v2",
690
+ # "cgldo/semanticClone",
691
+ # "Malmuk1/e5-large-v2_Sharded",
692
+ # "jncraton/gte-small-ct2-int8",
693
+ # "Einas/einas_ashkar",
694
+ # "gruber/e5-small-v2-ggml",
695
+ # "jncraton/bge-small-en-ct2-int8",
696
+ # "vectoriseai/bge-small-en",
697
+ # "recipe/embeddings",
698
+ # "dhairya0907/thenlper-get-large",
699
+ # "Narsil/bge-base-en",
700
+ # "kozistr/fused-large-en",
701
+ # "sionic-ai/sionic-ai-v2", # Wait for https://huggingface.co/sionic-ai/sionic-ai-v2/discussions/1
702
+ # "sionic-ai/sionic-ai-v1", # Wait for https://huggingface.co/sionic-ai/sionic-ai-v2/discussions/1
703
+ # "BAAI/bge-large-en", # Deprecated in favor of v1.5
704
+ # "BAAI/bge-base-en", # Deprecated in favor of v1.5
705
+ # "BAAI/bge-small-en", # Deprecated in favor of v1.5
706
+ # "d0rj/e5-large-en-ru",
707
+ # "d0rj/e5-base-en-ru",
708
+ # "d0rj/e5-small-en-ru",
709
+ # "aident-ai/bge-base-en-onnx",
710
+ # "barisaydin/bge-base-en",
711
+ # "barisaydin/gte-large",
712
+ # "barisaydin/gte-base",
713
+ # "barisaydin/gte-small",
714
+ # "barisaydin/bge-small-en",
715
+ # "odunola/e5-base-v2",
716
+ # "goldenrooster/multilingual-e5-large",
717
+ # "davidpeer/gte-small",
718
+ # "barisaydin/bge-large-en",
719
+ # "jamesgpt1/english-large-v1",
720
+ # "vectoriseai/bge-large-en-v1.5",
721
+ # "vectoriseai/bge-base-en-v1.5",
722
+ # "vectoriseai/instructor-large",
723
+ # "vectoriseai/instructor-base",
724
+ # "vectoriseai/gte-large",
725
+ # "vectoriseai/gte-base",
726
+ # "vectoriseai/e5-large-v2",
727
+ # "vectoriseai/bge-small-en-v1.5",
728
+ # "vectoriseai/e5-base-v2",
729
+ # "vectoriseai/e5-large",
730
+ # "vectoriseai/multilingual-e5-large",
731
+ # "vectoriseai/gte-small",
732
+ # "vectoriseai/ember-v1",
733
+ # "vectoriseai/e5-base",
734
+ # "vectoriseai/e5-small-v2",
735
+ # "michaelfeil/ct2fast-bge-large-en-v1.5",
736
+ # "michaelfeil/ct2fast-bge-large-en-v1.5",
737
+ # "michaelfeil/ct2fast-bge-base-en-v1.5",
738
+ # "michaelfeil/ct2fast-gte-large",
739
+ # "michaelfeil/ct2fast-gte-base",
740
+ # "michaelfeil/ct2fast-bge-small-en-v1.5",
741
+ # "rizki/bgr-tf",
742
+ # "ef-zulla/e5-multi-sml-torch",
743
+ # "cherubhao/yogamodel",
744
+ # "morgendigital/multilingual-e5-large-quantized",
745
+ # "jncraton/gte-tiny-ct2-int8",
746
+ # "Research2NLP/electrical_stella",
747
+ # }
748
+
749
+ # EXTERNAL_MODEL_RESULTS = {model: {k: {v: []} for k, v in TASK_TO_METRIC.items()} for model in EXTERNAL_MODELS}
750
+
751
+ # def add_lang(examples):
752
+ # if not(examples["eval_language"]):
753
+ # examples["mteb_dataset_name_with_lang"] = examples["mteb_dataset_name"]
754
+ # else:
755
+ # examples["mteb_dataset_name_with_lang"] = examples["mteb_dataset_name"] + f' ({examples["eval_language"]})'
756
+ # return examples
757
+
758
+ # def add_task(examples):
759
+ # # Could be added to the dataset loading script instead
760
+ # if examples["mteb_dataset_name"] in TASK_LIST_CLASSIFICATION_NORM + TASK_LIST_CLASSIFICATION_DA + TASK_LIST_CLASSIFICATION_NB + TASK_LIST_CLASSIFICATION_PL + TASK_LIST_CLASSIFICATION_SV + TASK_LIST_CLASSIFICATION_ZH:
761
+ # examples["mteb_task"] = "Classification"
762
+ # elif examples["mteb_dataset_name"] in TASK_LIST_CLUSTERING + TASK_LIST_CLUSTERING_DE + TASK_LIST_CLUSTERING_PL + TASK_LIST_CLUSTERING_ZH:
763
+ # examples["mteb_task"] = "Clustering"
764
+ # elif examples["mteb_dataset_name"] in TASK_LIST_PAIR_CLASSIFICATION + TASK_LIST_PAIR_CLASSIFICATION_PL + TASK_LIST_PAIR_CLASSIFICATION_ZH:
765
+ # examples["mteb_task"] = "PairClassification"
766
+ # elif examples["mteb_dataset_name"] in TASK_LIST_RERANKING + TASK_LIST_RERANKING_ZH:
767
+ # examples["mteb_task"] = "Reranking"
768
+ # elif examples["mteb_dataset_name"] in TASK_LIST_RETRIEVAL_NORM + TASK_LIST_RETRIEVAL_PL + TASK_LIST_RETRIEVAL_ZH:
769
+ # examples["mteb_task"] = "Retrieval"
770
+ # elif examples["mteb_dataset_name"] in TASK_LIST_STS_NORM + TASK_LIST_STS_PL + TASK_LIST_STS_ZH:
771
+ # examples["mteb_task"] = "STS"
772
+ # elif examples["mteb_dataset_name"] in TASK_LIST_SUMMARIZATION:
773
+ # examples["mteb_task"] = "Summarization"
774
+ # elif examples["mteb_dataset_name"] in [x.split(" ")[0] for x in TASK_LIST_BITEXT_MINING + TASK_LIST_BITEXT_MINING_OTHER]:
775
+ # examples["mteb_task"] = "BitextMining"
776
+ # else:
777
+ # print("WARNING: Task not found for dataset", examples["mteb_dataset_name"])
778
+ # examples["mteb_task"] = "Unknown"
779
+ # return examples
780
+
781
+ # for model in EXTERNAL_MODELS:
782
+ # ds = load_dataset("mteb/results", model)
783
+ # # For local debugging:
784
+ # #, download_mode='force_redownload', verification_mode="no_checks")
785
+ # ds = ds.map(add_lang)
786
+ # ds = ds.map(add_task)
787
+ # base_dict = {"Model": make_clickable_model(model, link=EXTERNAL_MODEL_TO_LINK.get(model, "https://huggingface.co/spaces/mteb/leaderboard"))}
788
+ #     # For now only one metric per task - Could add more metrics later on
789
+ # for task, metric in TASK_TO_METRIC.items():
790
+ # ds_dict = ds.filter(lambda x: (x["mteb_task"] == task) and (x["metric"] == metric))["test"].to_dict()
791
+ # ds_dict = {k: round(v, 2) for k, v in zip(ds_dict["mteb_dataset_name_with_lang"], ds_dict["score"])}
792
+ # EXTERNAL_MODEL_RESULTS[model][task][metric].append({**base_dict, **ds_dict})
793
+
794
+ # def get_dim_seq_size(model):
795
+ # filenames = [sib.rfilename for sib in model.siblings]
796
+ # dim, seq, size = "", "", ""
797
+ # if "1_Pooling/config.json" in filenames:
798
+ # st_config_path = hf_hub_download(model.modelId, filename="1_Pooling/config.json")
799
+ # dim = json.load(open(st_config_path)).get("word_embedding_dimension", "")
800
+ # elif "2_Pooling/config.json" in filenames:
801
+ # st_config_path = hf_hub_download(model.modelId, filename="2_Pooling/config.json")
802
+ # dim = json.load(open(st_config_path)).get("word_embedding_dimension", "")
803
+ # if "config.json" in filenames:
804
+ # config_path = hf_hub_download(model.modelId, filename="config.json")
805
+ # config = json.load(open(config_path))
806
+ # if not dim:
807
+ # dim = config.get("hidden_dim", config.get("hidden_size", config.get("d_model", "")))
808
+ # seq = config.get("n_positions", config.get("max_position_embeddings", config.get("n_ctx", config.get("seq_length", ""))))
809
+ # # Get model file size without downloading
810
+ # if "pytorch_model.bin" in filenames:
811
+ # url = hf_hub_url(model.modelId, filename="pytorch_model.bin")
812
+ # meta = get_hf_file_metadata(url)
813
+ # size = round(meta.size / 1e9, 2)
814
+ # elif "pytorch_model.bin.index.json" in filenames:
815
+ # index_path = hf_hub_download(model.modelId, filename="pytorch_model.bin.index.json")
816
+ # """
817
+ # {
818
+ # "metadata": {
819
+ # "total_size": 28272820224
820
+ # },....
821
+ # """
822
+ #         index = json.load(open(index_path))
823
+ #         if ("metadata" in index) and ("total_size" in index["metadata"]):
824
+ #             size = round(index["metadata"]["total_size"] / 1e9, 2)
825
+ # return dim, seq, size
826
+
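The size lookup for sharded checkpoints above reads `metadata.total_size` from `pytorch_model.bin.index.json`. The following is a minimal standalone sketch of that step only, assuming the index has already been fetched as a JSON string; `checkpoint_size_gb` is a hypothetical helper name, not part of the original file, and it returns an empty string when the field is missing.

```python
import json


def checkpoint_size_gb(index_text):
    """Return the checkpoint size in GB from a sharded-index JSON string.

    Hypothetical helper: returns "" when metadata.total_size is absent,
    so callers can treat a missing size the same way as an unknown one.
    """
    index = json.loads(index_text)
    total = index.get("metadata", {}).get("total_size")
    if total is None:
        return ""
    # total_size is in bytes; report decimal gigabytes with two decimals
    return round(total / 1e9, 2)


# The sample value is the one quoted in the comment inside get_dim_seq_size
example_index = '{"metadata": {"total_size": 28272820224}, "weight_map": {}}'
print(checkpoint_size_gb(example_index))  # 28.27
print(repr(checkpoint_size_gb("{}")))     # ''
```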
827
+ # def make_datasets_clickable(df):
828
+ # """Does not work"""
829
+ # if "BornholmBitextMining" in df.columns:
830
+ # link = "https://huggingface.co/datasets/strombergnlp/bornholmsk_parallel"
831
+ # df = df.rename(
832
+ #             columns={'BornholmBitextMining': f'<a target="_blank" style="text-decoration: underline" href="{link}">BornholmBitextMining</a>',})
833
+ # return df
834
+
835
+ # def add_rank(df):
836
+ # cols_to_rank = [col for col in df.columns if col not in ["Model", "Model Size (GB)", "Embedding Dimensions", "Sequence Length"]]
837
+ # if len(cols_to_rank) == 1:
838
+ # df.sort_values(cols_to_rank[0], ascending=False, inplace=True)
839
+ # else:
840
+ # df.insert(1, "Average", df[cols_to_rank].mean(axis=1, skipna=False))
841
+ # df.sort_values("Average", ascending=False, inplace=True)
842
+ # df.insert(0, "Rank", list(range(1, len(df) + 1)))
843
+ # df = df.round(2)
844
+ # # Fill NaN after averaging
845
+ # df.fillna("", inplace=True)
846
+ # return df
847
+
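The ranking rule in `add_rank` can be illustrated without pandas. This is a rough sketch under stated assumptions, not the original implementation: rows are plain dicts, `rank_rows` is a hypothetical name, and a row missing any score gets a NaN average and sorts last, which mirrors `mean(axis=1, skipna=False)` followed by a descending sort.

```python
import math


def rank_rows(rows, score_keys):
    """Rank dict rows by mean score, descending; incomplete rows sort last."""
    ranked = []
    for row in rows:
        scores = [row.get(key) for key in score_keys]
        if any(score is None for score in scores):
            avg = math.nan  # same effect as skipna=False: one gap voids the mean
        else:
            avg = sum(scores) / len(scores)
        ranked.append({**row, "Average": avg})
    # NaN averages go to the end; the rest are ordered from high to low
    ranked.sort(key=lambda r: (math.isnan(r["Average"]),
                               0.0 if math.isnan(r["Average"]) else -r["Average"]))
    for position, row in enumerate(ranked, start=1):
        row["Rank"] = position  # ranks start at 1
        if not math.isnan(row["Average"]):
            row["Average"] = round(row["Average"], 2)
    return ranked


rows = [
    {"Model": "a", "x": 1.0, "y": 3.0},
    {"Model": "b", "x": 5.0},            # missing "y", so no average
    {"Model": "c", "x": 4.0, "y": 4.0},
]
for row in rank_rows(rows, ["x", "y"]):
    print(row["Rank"], row["Model"], row["Average"])
```

Requiring a complete set of scores keeps models that reported only their strongest datasets from topping the average column.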
848
+ # def get_mteb_data(tasks=["Clustering"], langs=[], datasets=[], fillna=True, add_emb_dim=False, task_to_metric=TASK_TO_METRIC, rank=True):
849
+ # api = HfApi()
850
+ # models = api.list_models(filter="mteb")
851
+ #     # Collect one row of results per model
852
+ # df_list = []
853
+ # for model in EXTERNAL_MODEL_RESULTS:
854
+ # results_list = [res for task in tasks for res in EXTERNAL_MODEL_RESULTS[model][task][task_to_metric[task]]]
855
+ # if len(datasets) > 0:
856
+ # res = {k: v for d in results_list for k, v in d.items() if (k == "Model") or any([x in k for x in datasets])}
857
+ # elif langs:
858
+ # # Would be cleaner to rely on an extra language column instead
859
+ # langs_format = [f"({lang})" for lang in langs]
860
+ # res = {k: v for d in results_list for k, v in d.items() if any([k.split(" ")[-1] in (k, x) for x in langs_format])}
861
+ # else:
862
+ # res = {k: v for d in results_list for k, v in d.items()}
863
+ # # Model & at least one result
864
+ # if len(res) > 1:
865
+ # if add_emb_dim:
866
+ # res["Model Size (GB)"] = EXTERNAL_MODEL_TO_SIZE.get(model, "")
867
+ # res["Embedding Dimensions"] = EXTERNAL_MODEL_TO_DIM.get(model, "")
868
+ # res["Sequence Length"] = EXTERNAL_MODEL_TO_SEQLEN.get(model, "")
869
+ # df_list.append(res)
870
+
871
+ # for model in models:
872
+ # if model.modelId in MODELS_TO_SKIP: continue
873
+ # print("MODEL", model)
874
+ # readme_path = hf_hub_download(model.modelId, filename="README.md")
875
+ # meta = metadata_load(readme_path)
876
+ # # meta['model-index'][0]["results"] is list of elements like:
877
+ # # {
878
+ # # "task": {"type": "Classification"},
879
+ # # "dataset": {
880
+ # # "type": "mteb/amazon_massive_intent",
881
+ # # "name": "MTEB MassiveIntentClassification (nb)",
882
+ # # "config": "nb",
883
+ # # "split": "test",
884
+ # # },
885
+ # # "metrics": [
886
+ # # {"type": "accuracy", "value": 39.81506388702084},
887
+ # # {"type": "f1", "value": 38.809586587791664},
888
+ # # ],
889
+ # # },
890
+ #         # Use "get" instead of dict indexing to skip incompatible metadata instead of erroring out
891
+ # if len(datasets) > 0:
892
+ # task_results = [sub_res for sub_res in meta["model-index"][0]["results"] if (sub_res.get("task", {}).get("type", "") in tasks) and any([x in sub_res.get("dataset", {}).get("name", "") for x in datasets])]
893
+ # elif langs:
894
+ # task_results = [sub_res for sub_res in meta["model-index"][0]["results"] if (sub_res.get("task", {}).get("type", "") in tasks) and (sub_res.get("dataset", {}).get("config", "default") in ("default", *langs))]
895
+ # else:
896
+ # task_results = [sub_res for sub_res in meta["model-index"][0]["results"] if (sub_res.get("task", {}).get("type", "") in tasks)]
897
+ # out = [{res["dataset"]["name"].replace("MTEB ", ""): [round(score["value"], 2) for score in res["metrics"] if score["type"] == task_to_metric.get(res["task"]["type"])][0]} for res in task_results]
898
+ # out = {k: v for d in out for k, v in d.items()}
899
+ # out["Model"] = make_clickable_model(model.modelId)
900
+ # # Model & at least one result
901
+ # if len(out) > 1:
902
+ # if add_emb_dim:
903
+ # out["Embedding Dimensions"], out["Sequence Length"], out["Model Size (GB)"] = get_dim_seq_size(model)
904
+ # df_list.append(out)
905
+ # df = pd.DataFrame(df_list)
906
+ # # If there are any models that are the same, merge them
907
+ # # E.g. if out["Model"] has the same value in two places, merge & take whichever one is not NaN else just take the first one
908
+ # df = df.groupby("Model", as_index=False).first()
909
+ # # Put 'Model' column first
910
+ # cols = sorted(list(df.columns))
911
+ # cols.insert(0, cols.pop(cols.index("Model")))
912
+ # df = df[cols]
913
+ # if rank:
914
+ # df = add_rank(df)
915
+ # if fillna:
916
+ # df.fillna("", inplace=True)
917
+ # return df
918
+
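The `model-index` parsing inside `get_mteb_data` can be sketched on its own. The example below reuses the sample entry quoted in the comments above; `scores_from_model_index` is a hypothetical helper, and the `{"Classification": "accuracy"}` mapping is an assumption standing in for `TASK_TO_METRIC`, which is defined outside this section. Unlike the original list comprehension, this sketch skips entries that lack the expected metric instead of raising an `IndexError`.

```python
def scores_from_model_index(results, tasks, task_to_metric):
    """Map dataset name -> rounded main score for the requested task types."""
    out = {}
    for res in results:
        task_type = res.get("task", {}).get("type", "")
        if task_type not in tasks:
            continue
        wanted = task_to_metric.get(task_type)
        values = [m["value"] for m in res.get("metrics", []) if m.get("type") == wanted]
        if not values:
            continue  # entry does not report the metric used for this task
        name = res.get("dataset", {}).get("name", "").replace("MTEB ", "")
        out[name] = round(values[0], 2)
    return out


example_results = [
    {
        "task": {"type": "Classification"},
        "dataset": {
            "type": "mteb/amazon_massive_intent",
            "name": "MTEB MassiveIntentClassification (nb)",
            "config": "nb",
            "split": "test",
        },
        "metrics": [
            {"type": "accuracy", "value": 39.81506388702084},
            {"type": "f1", "value": 38.809586587791664},
        ],
    },
]
print(scores_from_model_index(example_results, ["Classification"], {"Classification": "accuracy"}))
# {'MassiveIntentClassification (nb)': 39.82}
```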
919
+ # def get_mteb_average():
920
+ # global DATA_OVERALL, DATA_CLASSIFICATION_EN, DATA_CLUSTERING, DATA_PAIR_CLASSIFICATION, DATA_RERANKING, DATA_RETRIEVAL, DATA_STS_EN, DATA_SUMMARIZATION
921
+ # DATA_OVERALL = get_mteb_data(
922
+ # tasks=[
923
+ # "Classification",
924
+ # "Clustering",
925
+ # "PairClassification",
926
+ # "Reranking",
927
+ # "Retrieval",
928
+ # "STS",
929
+ # "Summarization",
930
+ # ],
931
+ # datasets=TASK_LIST_CLASSIFICATION + TASK_LIST_CLUSTERING + TASK_LIST_PAIR_CLASSIFICATION + TASK_LIST_RERANKING + TASK_LIST_RETRIEVAL + TASK_LIST_STS + TASK_LIST_SUMMARIZATION,
932
+ # fillna=False,
933
+ # add_emb_dim=True,
934
+ # rank=False,
935
+ # )
936
+ # # Debugging:
937
+ # # DATA_OVERALL.to_csv("overall.csv")
938
+
939
+ # DATA_OVERALL.insert(1, f"Average ({len(TASK_LIST_EN)} datasets)", DATA_OVERALL[TASK_LIST_EN].mean(axis=1, skipna=False))
940
+ # DATA_OVERALL.insert(2, f"Classification Average ({len(TASK_LIST_CLASSIFICATION)} datasets)", DATA_OVERALL[TASK_LIST_CLASSIFICATION].mean(axis=1, skipna=False))
941
+ # DATA_OVERALL.insert(3, f"Clustering Average ({len(TASK_LIST_CLUSTERING)} datasets)", DATA_OVERALL[TASK_LIST_CLUSTERING].mean(axis=1, skipna=False))
942
+ # DATA_OVERALL.insert(4, f"Pair Classification Average ({len(TASK_LIST_PAIR_CLASSIFICATION)} datasets)", DATA_OVERALL[TASK_LIST_PAIR_CLASSIFICATION].mean(axis=1, skipna=False))
943
+ # DATA_OVERALL.insert(5, f"Reranking Average ({len(TASK_LIST_RERANKING)} datasets)", DATA_OVERALL[TASK_LIST_RERANKING].mean(axis=1, skipna=False))
944
+ # DATA_OVERALL.insert(6, f"Retrieval Average ({len(TASK_LIST_RETRIEVAL)} datasets)", DATA_OVERALL[TASK_LIST_RETRIEVAL].mean(axis=1, skipna=False))
945
+ # DATA_OVERALL.insert(7, f"STS Average ({len(TASK_LIST_STS)} datasets)", DATA_OVERALL[TASK_LIST_STS].mean(axis=1, skipna=False))
946
+ # DATA_OVERALL.insert(8, f"Summarization Average ({len(TASK_LIST_SUMMARIZATION)} dataset)", DATA_OVERALL[TASK_LIST_SUMMARIZATION].mean(axis=1, skipna=False))
947
+ # DATA_OVERALL.sort_values(f"Average ({len(TASK_LIST_EN)} datasets)", ascending=False, inplace=True)
948
+ # # Start ranking from 1
949
+ # DATA_OVERALL.insert(0, "Rank", list(range(1, len(DATA_OVERALL) + 1)))
950
+
951
+ # DATA_OVERALL = DATA_OVERALL.round(2)
952
+
953
+ # DATA_CLASSIFICATION_EN = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_CLASSIFICATION])
954
+ # # Only keep rows with at least one score in addition to the "Model" & rank column
955
+ # DATA_CLASSIFICATION_EN = DATA_CLASSIFICATION_EN[DATA_CLASSIFICATION_EN.iloc[:, 2:].ne("").any(axis=1)]
956
+
957
+ # DATA_CLUSTERING = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_CLUSTERING])
958
+ # DATA_CLUSTERING = DATA_CLUSTERING[DATA_CLUSTERING.iloc[:, 2:].ne("").any(axis=1)]
959
+
960
+ # DATA_PAIR_CLASSIFICATION = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_PAIR_CLASSIFICATION])
961
+ # DATA_PAIR_CLASSIFICATION = DATA_PAIR_CLASSIFICATION[DATA_PAIR_CLASSIFICATION.iloc[:, 2:].ne("").any(axis=1)]
962
+
963
+ # DATA_RERANKING = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_RERANKING])
964
+ # DATA_RERANKING = DATA_RERANKING[DATA_RERANKING.iloc[:, 2:].ne("").any(axis=1)]
965
+
966
+ # DATA_RETRIEVAL = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_RETRIEVAL])
967
+ # DATA_RETRIEVAL = DATA_RETRIEVAL[DATA_RETRIEVAL.iloc[:, 2:].ne("").any(axis=1)]
968
+
969
+ # DATA_STS_EN = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_STS])
970
+ # DATA_STS_EN = DATA_STS_EN[DATA_STS_EN.iloc[:, 2:].ne("").any(axis=1)]
971
+
972
+ # DATA_SUMMARIZATION = add_rank(DATA_OVERALL[["Model"] + TASK_LIST_SUMMARIZATION])
973
+ #     DATA_SUMMARIZATION = DATA_SUMMARIZATION[DATA_SUMMARIZATION.iloc[:, 2:].ne("").any(axis=1)]
974
+
975
+ # # Fill NaN after averaging
976
+ # DATA_OVERALL.fillna("", inplace=True)
977
+
978
+ # DATA_OVERALL = DATA_OVERALL[["Rank", "Model", "Model Size (GB)", "Embedding Dimensions", "Sequence Length", f"Average ({len(TASK_LIST_EN)} datasets)", f"Classification Average ({len(TASK_LIST_CLASSIFICATION)} datasets)", f"Clustering Average ({len(TASK_LIST_CLUSTERING)} datasets)", f"Pair Classification Average ({len(TASK_LIST_PAIR_CLASSIFICATION)} datasets)", f"Reranking Average ({len(TASK_LIST_RERANKING)} datasets)", f"Retrieval Average ({len(TASK_LIST_RETRIEVAL)} datasets)", f"STS Average ({len(TASK_LIST_STS)} datasets)", f"Summarization Average ({len(TASK_LIST_SUMMARIZATION)} dataset)"]]
979
+ # DATA_OVERALL = DATA_OVERALL[DATA_OVERALL.iloc[:, 5:].ne("").any(axis=1)]
980
+
981
+ # return DATA_OVERALL
982
+
983
+ # def get_mteb_average_zh():
984
+ # global DATA_OVERALL_ZH, DATA_CLASSIFICATION_ZH, DATA_CLUSTERING_ZH, DATA_PAIR_CLASSIFICATION_ZH, DATA_RERANKING_ZH, DATA_RETRIEVAL_ZH, DATA_STS_ZH
985
+ # DATA_OVERALL_ZH = get_mteb_data(
986
+ # tasks=[
987
+ # "Classification",
988
+ # "Clustering",
989
+ # "PairClassification",
990
+ # "Reranking",
991
+ # "Retrieval",
992
+ # "STS",
993
+ # ],
994
+ # datasets=TASK_LIST_CLASSIFICATION_ZH + TASK_LIST_CLUSTERING_ZH + TASK_LIST_PAIR_CLASSIFICATION_ZH + TASK_LIST_RERANKING_ZH + TASK_LIST_RETRIEVAL_ZH + TASK_LIST_STS_ZH,
995
+ # fillna=False,
996
+ # add_emb_dim=True,
997
+ # rank=False,
998
+ # )
999
+ # # Debugging:
1000
+ # # DATA_OVERALL_ZH.to_csv("overall.csv")
1001
+
1002
+ # DATA_OVERALL_ZH.insert(1, f"Average ({len(TASK_LIST_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_ZH].mean(axis=1, skipna=False))
1003
+ # DATA_OVERALL_ZH.insert(2, f"Classification Average ({len(TASK_LIST_CLASSIFICATION_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_CLASSIFICATION_ZH].mean(axis=1, skipna=False))
1004
+ # DATA_OVERALL_ZH.insert(3, f"Clustering Average ({len(TASK_LIST_CLUSTERING_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_CLUSTERING_ZH].mean(axis=1, skipna=False))
1005
+ # DATA_OVERALL_ZH.insert(4, f"Pair Classification Average ({len(TASK_LIST_PAIR_CLASSIFICATION_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_PAIR_CLASSIFICATION_ZH].mean(axis=1, skipna=False))
1006
+ # DATA_OVERALL_ZH.insert(5, f"Reranking Average ({len(TASK_LIST_RERANKING_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_RERANKING_ZH].mean(axis=1, skipna=False))
1007
+ # DATA_OVERALL_ZH.insert(6, f"Retrieval Average ({len(TASK_LIST_RETRIEVAL_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_RETRIEVAL_ZH].mean(axis=1, skipna=False))
1008
+ # DATA_OVERALL_ZH.insert(7, f"STS Average ({len(TASK_LIST_STS_ZH)} datasets)", DATA_OVERALL_ZH[TASK_LIST_STS_ZH].mean(axis=1, skipna=False))
1009
+ # DATA_OVERALL_ZH.sort_values(f"Average ({len(TASK_LIST_ZH)} datasets)", ascending=False, inplace=True)
1010
+ # # Start ranking from 1
1011
+ # DATA_OVERALL_ZH.insert(0, "Rank", list(range(1, len(DATA_OVERALL_ZH) + 1)))
1012
+
1013
+ # DATA_OVERALL_ZH = DATA_OVERALL_ZH.round(2)
1014
+
1015
+ # DATA_CLASSIFICATION_ZH = add_rank(DATA_OVERALL_ZH[["Model"] + TASK_LIST_CLASSIFICATION_ZH])
1016
+ # # Only keep rows with at least one score in addition to the "Model" & rank column
1017
+ # DATA_CLASSIFICATION_ZH = DATA_CLASSIFICATION_ZH[DATA_CLASSIFICATION_ZH.iloc[:, 2:].ne("").any(axis=1)]
1018
+
1019
+ # DATA_CLUSTERING_ZH = add_rank(DATA_OVERALL_ZH[["Model"] + TASK_LIST_CLUSTERING_ZH])
1020
+ # DATA_CLUSTERING_ZH = DATA_CLUSTERING_ZH[DATA_CLUSTERING_ZH.iloc[:, 2:].ne("").any(axis=1)]
1021
+
1022
+ # DATA_PAIR_CLASSIFICATION_ZH = add_rank(DATA_OVERALL_ZH[["Model"] + TASK_LIST_PAIR_CLASSIFICATION_ZH])
1023
+ # DATA_PAIR_CLASSIFICATION_ZH = DATA_PAIR_CLASSIFICATION_ZH[DATA_PAIR_CLASSIFICATION_ZH.iloc[:, 2:].ne("").any(axis=1)]
1024
+
1025
+ # DATA_RERANKING_ZH = add_rank(DATA_OVERALL_ZH[["Model"] + TASK_LIST_RERANKING_ZH])
1026
+ # DATA_RERANKING_ZH = DATA_RERANKING_ZH[DATA_RERANKING_ZH.iloc[:, 2:].ne("").any(axis=1)]
1027
+
1028
+ # DATA_RETRIEVAL_ZH = add_rank(DATA_OVERALL_ZH[["Model"] + TASK_LIST_RETRIEVAL_ZH])
1029
+ # DATA_RETRIEVAL_ZH = DATA_RETRIEVAL_ZH[DATA_RETRIEVAL_ZH.iloc[:, 2:].ne("").any(axis=1)]
1030
+
1031
+ # DATA_STS_ZH = add_rank(DATA_OVERALL_ZH[["Model"] + TASK_LIST_STS_ZH])
1032
+ # DATA_STS_ZH = DATA_STS_ZH[DATA_STS_ZH.iloc[:, 2:].ne("").any(axis=1)]
1033
+
1034
+ # # Fill NaN after averaging
1035
+ # DATA_OVERALL_ZH.fillna("", inplace=True)
1036
+
1037
+ # DATA_OVERALL_ZH = DATA_OVERALL_ZH[["Rank", "Model", "Model Size (GB)", "Embedding Dimensions", "Sequence Length", f"Average ({len(TASK_LIST_ZH)} datasets)", f"Classification Average ({len(TASK_LIST_CLASSIFICATION_ZH)} datasets)", f"Clustering Average ({len(TASK_LIST_CLUSTERING_ZH)} datasets)", f"Pair Classification Average ({len(TASK_LIST_PAIR_CLASSIFICATION_ZH)} datasets)", f"Reranking Average ({len(TASK_LIST_RERANKING_ZH)} datasets)", f"Retrieval Average ({len(TASK_LIST_RETRIEVAL_ZH)} datasets)", f"STS Average ({len(TASK_LIST_STS_ZH)} datasets)"]]
1038
+ # DATA_OVERALL_ZH = DATA_OVERALL_ZH[DATA_OVERALL_ZH.iloc[:, 5:].ne("").any(axis=1)]
1039
+
1040
+ # return DATA_OVERALL_ZH
1041
+
1042
+ # def get_mteb_average_pl():
1043
+ # global DATA_OVERALL_PL, DATA_CLASSIFICATION_PL, DATA_CLUSTERING_PL, DATA_PAIR_CLASSIFICATION_PL, DATA_RETRIEVAL_PL, DATA_STS_PL
1044
+ # DATA_OVERALL_PL = get_mteb_data(
1045
+ # tasks=[
1046
+ # "Classification",
1047
+ # "Clustering",
1048
+ # "PairClassification",
1049
+ # "Retrieval",
1050
+ # "STS",
1051
+ # ],
1052
+ # datasets=TASK_LIST_CLASSIFICATION_PL + TASK_LIST_CLUSTERING_PL + TASK_LIST_PAIR_CLASSIFICATION_PL + TASK_LIST_RETRIEVAL_PL + TASK_LIST_STS_PL,
1053
+ # fillna=False,
1054
+ # add_emb_dim=True,
1055
+ # rank=False,
1056
+ # )
1057
+ # # Debugging:
1058
+ # # DATA_OVERALL_PL.to_csv("overall.csv")
1059
+
1060
+ # DATA_OVERALL_PL.insert(1, f"Average ({len(TASK_LIST_PL)} datasets)", DATA_OVERALL_PL[TASK_LIST_PL].mean(axis=1, skipna=False))
1061
+ # DATA_OVERALL_PL.insert(2, f"Classification Average ({len(TASK_LIST_CLASSIFICATION_PL)} datasets)", DATA_OVERALL_PL[TASK_LIST_CLASSIFICATION_PL].mean(axis=1, skipna=False))
1062
+ # DATA_OVERALL_PL.insert(3, f"Clustering Average ({len(TASK_LIST_CLUSTERING_PL)} datasets)", DATA_OVERALL_PL[TASK_LIST_CLUSTERING_PL].mean(axis=1, skipna=False))
1063
+ # DATA_OVERALL_PL.insert(4, f"Pair Classification Average ({len(TASK_LIST_PAIR_CLASSIFICATION_PL)} datasets)", DATA_OVERALL_PL[TASK_LIST_PAIR_CLASSIFICATION_PL].mean(axis=1, skipna=False))
1064
+ # DATA_OVERALL_PL.insert(5, f"Retrieval Average ({len(TASK_LIST_RETRIEVAL_PL)} datasets)", DATA_OVERALL_PL[TASK_LIST_RETRIEVAL_PL].mean(axis=1, skipna=False))
1065
+ # DATA_OVERALL_PL.insert(6, f"STS Average ({len(TASK_LIST_STS_PL)} datasets)", DATA_OVERALL_PL[TASK_LIST_STS_PL].mean(axis=1, skipna=False))
1066
+ # DATA_OVERALL_PL.sort_values(f"Average ({len(TASK_LIST_PL)} datasets)", ascending=False, inplace=True)
1067
+ # # Start ranking from 1
1068
+ # DATA_OVERALL_PL.insert(0, "Rank", list(range(1, len(DATA_OVERALL_PL) + 1)))
1069
+
1070
+ # DATA_OVERALL_PL = DATA_OVERALL_PL.round(2)
1071
+
1072
+ # DATA_CLASSIFICATION_PL = add_rank(DATA_OVERALL_PL[["Model"] + TASK_LIST_CLASSIFICATION_PL])
1073
+ # # Only keep rows with at least one score in addition to the "Model" & rank column
1074
+ # DATA_CLASSIFICATION_PL = DATA_CLASSIFICATION_PL[DATA_CLASSIFICATION_PL.iloc[:, 2:].ne("").any(axis=1)]
1075
+
1076
+ # DATA_CLUSTERING_PL = add_rank(DATA_OVERALL_PL[["Model"] + TASK_LIST_CLUSTERING_PL])
1077
+ # DATA_CLUSTERING_PL = DATA_CLUSTERING_PL[DATA_CLUSTERING_PL.iloc[:, 2:].ne("").any(axis=1)]
1078
+
1079
+ # DATA_PAIR_CLASSIFICATION_PL = add_rank(DATA_OVERALL_PL[["Model"] + TASK_LIST_PAIR_CLASSIFICATION_PL])
1080
+ # DATA_PAIR_CLASSIFICATION_PL = DATA_PAIR_CLASSIFICATION_PL[DATA_PAIR_CLASSIFICATION_PL.iloc[:, 2:].ne("").any(axis=1)]
1081
+
1082
+ # DATA_RETRIEVAL_PL = add_rank(DATA_OVERALL_PL[["Model"] + TASK_LIST_RETRIEVAL_PL])
1083
+ # DATA_RETRIEVAL_PL = DATA_RETRIEVAL_PL[DATA_RETRIEVAL_PL.iloc[:, 2:].ne("").any(axis=1)]
1084
+
1085
+ # DATA_STS_PL = add_rank(DATA_OVERALL_PL[["Model"] + TASK_LIST_STS_PL])
1086
+ # DATA_STS_PL = DATA_STS_PL[DATA_STS_PL.iloc[:, 2:].ne("").any(axis=1)]
1087
+
1088
+ # # Fill NaN after averaging
1089
+ # DATA_OVERALL_PL.fillna("", inplace=True)
1090
+
1091
+ # DATA_OVERALL_PL = DATA_OVERALL_PL[["Rank", "Model", "Model Size (GB)", "Embedding Dimensions", "Sequence Length", f"Average ({len(TASK_LIST_PL)} datasets)", f"Classification Average ({len(TASK_LIST_CLASSIFICATION_PL)} datasets)", f"Clustering Average ({len(TASK_LIST_CLUSTERING_PL)} datasets)", f"Pair Classification Average ({len(TASK_LIST_PAIR_CLASSIFICATION_PL)} datasets)", f"Retrieval Average ({len(TASK_LIST_RETRIEVAL_PL)} datasets)", f"STS Average ({len(TASK_LIST_STS_PL)} datasets)"]]
1092
+ # DATA_OVERALL_PL = DATA_OVERALL_PL[DATA_OVERALL_PL.iloc[:, 5:].ne("").any(axis=1)]
1093
+
1094
+ # return DATA_OVERALL_PL
1095
+
1096
+ # get_mteb_average()
1097
+ # get_mteb_average_pl()
1098
+ # get_mteb_average_zh()
1099
+ # DATA_BITEXT_MINING = get_mteb_data(["BitextMining"], [], TASK_LIST_BITEXT_MINING)
1100
+ # DATA_BITEXT_MINING_OTHER = get_mteb_data(["BitextMining"], [], TASK_LIST_BITEXT_MINING_OTHER)
1101
+ # DATA_CLASSIFICATION_DA = get_mteb_data(["Classification"], [], TASK_LIST_CLASSIFICATION_DA)
1102
+ # DATA_CLASSIFICATION_NB = get_mteb_data(["Classification"], [], TASK_LIST_CLASSIFICATION_NB)
1103
+ # DATA_CLASSIFICATION_SV = get_mteb_data(["Classification"], [], TASK_LIST_CLASSIFICATION_SV)
1104
+ # DATA_CLASSIFICATION_OTHER = get_mteb_data(["Classification"], [], TASK_LIST_CLASSIFICATION_OTHER)
1105
+ # DATA_CLUSTERING_DE = get_mteb_data(["Clustering"], [], TASK_LIST_CLUSTERING_DE)
1106
+ # DATA_STS_OTHER = get_mteb_data(["STS"], [], TASK_LIST_STS_OTHER)
1107
+
1108
+ # # Exact count: add all non-NaN scores for every dataset
1109
+ # NUM_SCORES = 0
1110
+ # DATASETS = []
1111
+ # MODELS = []
1112
+ # # LANGUAGES = []
1113
+ # for d in [
1114
+ # DATA_BITEXT_MINING,
1115
+ # DATA_BITEXT_MINING_OTHER,
1116
+ # DATA_CLASSIFICATION_EN,
1117
+ # DATA_CLASSIFICATION_DA,
1118
+ # DATA_CLASSIFICATION_NB,
1119
+ # DATA_CLASSIFICATION_PL,
1120
+ # DATA_CLASSIFICATION_SV,
1121
+ # DATA_CLASSIFICATION_ZH,
1122
+ # DATA_CLASSIFICATION_OTHER,
1123
+ # DATA_CLUSTERING,
1124
+ # DATA_CLUSTERING_DE,
1125
+ # DATA_CLUSTERING_PL,
1126
+ # DATA_CLUSTERING_ZH,
1127
+ # DATA_PAIR_CLASSIFICATION,
1128
+ # DATA_PAIR_CLASSIFICATION_PL,
1129
+ # DATA_PAIR_CLASSIFICATION_ZH,
1130
+ # DATA_RERANKING,
1131
+ # DATA_RERANKING_ZH,
1132
+ # DATA_RETRIEVAL,
1133
+ # DATA_RETRIEVAL_PL,
1134
+ # DATA_RETRIEVAL_ZH,
1135
+ # DATA_STS_EN,
1136
+ # DATA_STS_PL,
1137
+ # DATA_STS_ZH,
1138
+ # DATA_STS_OTHER,
1139
+ # DATA_SUMMARIZATION,
1140
+ # ]:
1141
+ # # NUM_SCORES += d.iloc[:, 1:].apply(lambda x: sum([1 for y in x if isinstance(y, float) and not np.isnan(y)]), axis=1).sum()
1142
+ # cols_to_ignore = 3 if "Average" in d.columns else 2
1143
+ # # Count number of scores including only non-nan floats & excluding the rank column
1144
+ # NUM_SCORES += d.iloc[:, cols_to_ignore:].notna().sum().sum()
1145
+ # # Exclude rank & model name column (first two); Do not count different language versions as different datasets
1146
+ # DATASETS += [i.split(" ")[0] for i in d.columns[cols_to_ignore:]]
1147
+ # # LANGUAGES += [i.split(" ")[-1] for i in d.columns[cols_to_ignore:]]
1148
+ # MODELS += d["Model"].tolist()
1149
+
1150
+ # NUM_DATASETS = len(set(DATASETS))
1151
+ # # NUM_LANGUAGES = len(set(LANGUAGES))
1152
+ # NUM_MODELS = len(set(MODELS))
1153
+
1154
+ # block = gr.Blocks()
1155
+ # with block:
1156
+ # gr.Markdown(f"""
1157
+ # Massive Text Embedding Benchmark (MTEB) Leaderboard. To submit, refer to the <a href="https://github.com/embeddings-benchmark/mteb#leaderboard" target="_blank" style="text-decoration: underline">MTEB GitHub repository</a> 🤗 Refer to the [MTEB paper](https://arxiv.org/abs/2210.07316) for details on metrics, tasks and models.
1158
+
1159
+ # - **Total Datasets**: {NUM_DATASETS}
1160
+ # - **Total Languages**: 113
1161
+ # - **Total Scores**: {NUM_SCORES}
1162
+ # - **Total Models**: {NUM_MODELS}
1163
+ # """)
1164
+ # with gr.Tabs():
1165
+ # with gr.TabItem("Overall"):
1166
+ # with gr.TabItem("English"):
1167
+ # with gr.Row():
1168
+ # gr.Markdown("""
1169
+ # **Overall MTEB English leaderboard** 🔮
1170
+
1171
+ # - **Metric:** Various, refer to task tabs
1172
+ # - **Languages:** English
1173
+ # """)
1174
+ # with gr.Row():
1175
+ # data_overall = gr.components.Dataframe(
1176
+ # DATA_OVERALL,
1177
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_OVERALL.columns),
1178
+ # type="pandas",
1179
+ # wrap=True,
1180
+ # )
1181
+ # with gr.Row():
1182
+ # data_run_overall = gr.Button("Refresh")
1183
+ # data_run_overall.click(get_mteb_average, inputs=None, outputs=data_overall)
1184
+ # with gr.TabItem("Chinese"):
1185
+ # with gr.Row():
1186
+ # gr.Markdown("""
1187
+ # **Overall MTEB Chinese leaderboard (C-MTEB)** 🔮🇨🇳
1188
+
1189
+ # - **Metric:** Various, refer to task tabs
1190
+ # - **Languages:** Chinese
1191
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1192
+ # """)
1193
+ # with gr.Row():
1194
+ # data_overall_zh = gr.components.Dataframe(
1195
+ # DATA_OVERALL_ZH,
1196
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_OVERALL_ZH.columns),
1197
+ # type="pandas",
1198
+ # wrap=True,
1199
+ # )
1200
+ # with gr.Row():
1201
+ # data_run_overall_zh = gr.Button("Refresh")
1202
+ # data_run_overall_zh.click(get_mteb_average_zh, inputs=None, outputs=data_overall_zh)
1203
+ # with gr.TabItem("Polish"):
1204
+ # with gr.Row():
1205
+ # gr.Markdown("""
1206
+ # **Overall MTEB Polish leaderboard (PL-MTEB)** 🔮🇵🇱
1207
+
1208
+ # - **Metric:** Various, refer to task tabs
1209
+ # - **Languages:** Polish
1210
+ # - **Credits:** [Rafał Poświata](https://github.com/rafalposwiata), [Konrad Wojtasik](https://github.com/kwojtasi) & [BEIR-PL](https://arxiv.org/abs/2305.19840)
1211
+ # """)
1212
+ # with gr.Row():
1213
+ # data_overall_pl = gr.components.Dataframe(
1214
+ # DATA_OVERALL_PL,
1215
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_OVERALL_PL.columns),
1216
+ # type="pandas",
1217
+ # wrap=True,
1218
+ # )
1219
+ # with gr.Row():
1220
+ # data_run_overall_pl = gr.Button("Refresh")
1221
+ # data_run_overall_pl.click(get_mteb_average_pl, inputs=None, outputs=data_overall_pl)
1222
+ # with gr.TabItem("Bitext Mining"):
1223
+ # with gr.TabItem("English-X"):
1224
+ # with gr.Row():
1225
+ # gr.Markdown("""
1226
+ # **Bitext Mining English-X Leaderboard** 🎌
1227
+
1228
+ # - **Metric:** [F1](https://huggingface.co/spaces/evaluate-metric/f1)
1229
+ # - **Languages:** 117 (Pairs of: English & other language)
1230
+ # """)
1231
+ # with gr.Row():
1232
+ # data_bitext_mining = gr.components.Dataframe(
1233
+ # DATA_BITEXT_MINING,
1234
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_BITEXT_MINING.columns),
1235
+ # type="pandas",
1236
+ # )
1237
+ # with gr.Row():
1238
+ # data_run_bitext_mining = gr.Button("Refresh")
1239
+ # data_run_bitext_mining.click(
1240
+ # partial(get_mteb_data, tasks=["BitextMining"], datasets=TASK_LIST_BITEXT_MINING),
1241
+ # outputs=data_bitext_mining,
1242
+ # )
1243
+ # with gr.TabItem("Danish"):
1244
+ # with gr.Row():
1245
+ # gr.Markdown("""
1246
+ # **Bitext Mining Danish Leaderboard** 🎌🇩🇰
1247
+
1248
+ # - **Metric:** [F1](https://huggingface.co/spaces/evaluate-metric/f1)
1249
+ # - **Languages:** Danish & Bornholmsk (Danish Dialect)
1250
+ # - **Credits:** [Kenneth Enevoldsen](https://github.com/KennethEnevoldsen), [scandinavian-embedding-benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/)
1251
+ # """)
1252
+ # with gr.Row():
1253
+ # data_bitext_mining_da = gr.components.Dataframe(
1254
+ # DATA_BITEXT_MINING_OTHER,
1255
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_BITEXT_MINING_OTHER.columns),
1256
+ # type="pandas",
1257
+ # )
1258
+ # with gr.Row():
1259
+ # data_run_bitext_mining_da = gr.Button("Refresh")
1260
+ # data_run_bitext_mining_da.click(
1261
+ # partial(get_mteb_data, tasks=["BitextMining"], datasets=TASK_LIST_BITEXT_MINING_OTHER),
1262
+ # outputs=data_bitext_mining_da,
1263
+ # )
1264
+ # with gr.TabItem("Classification"):
1265
+ # with gr.TabItem("English"):
1266
+ # with gr.Row():
1267
+ # gr.Markdown("""
1268
+ # **Classification English Leaderboard** ❤️
1269
+
1270
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1271
+ # - **Languages:** English
1272
+ # """)
1273
+ # with gr.Row():
1274
+ # data_classification_en = gr.components.Dataframe(
1275
+ # DATA_CLASSIFICATION_EN,
1276
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_EN.columns),
1277
+ # type="pandas",
1278
+ # )
1279
+ # with gr.Row():
1280
+ # data_run_classification_en = gr.Button("Refresh")
1281
+ # data_run_classification_en.click(
1282
+ # partial(get_mteb_data, tasks=["Classification"], langs=["en"]),
1283
+ # outputs=data_classification_en,
1284
+ # )
1285
+ # with gr.TabItem("Chinese"):
1286
+ # with gr.Row():
1287
+ # gr.Markdown("""
1288
+ # **Classification Chinese Leaderboard** 🧡🇨🇳
1289
+
1290
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1291
+ # - **Languages:** Chinese
1292
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1293
+ # """)
1294
+ # with gr.Row():
1295
+ # data_classification_zh = gr.components.Dataframe(
1296
+ # DATA_CLASSIFICATION_ZH,
1297
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_ZH.columns),
1298
+ # type="pandas",
1299
+ # )
1300
+ # with gr.Row():
1301
+ # data_run_classification_zh = gr.Button("Refresh")
1302
+ # data_run_classification_zh.click(
1303
+ # partial(get_mteb_data, tasks=["Classification"], datasets=TASK_LIST_CLASSIFICATION_ZH),
1304
+ # outputs=data_classification_zh,
1305
+ # )
1306
+ # with gr.TabItem("Danish"):
1307
+ # with gr.Row():
1308
+ # gr.Markdown("""
1309
+ # **Classification Danish Leaderboard** 🤍🇩🇰
1310
+
1311
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1312
+ # - **Languages:** Danish
1313
+ # - **Credits:** [Kenneth Enevoldsen](https://github.com/KennethEnevoldsen), [scandinavian-embedding-benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/)
1314
+ # """)
1315
+ # with gr.Row():
1316
+ # data_classification_da = gr.components.Dataframe(
1317
+ # DATA_CLASSIFICATION_DA,
1318
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_DA.columns),
1319
+ # type="pandas",
1320
+ # )
1321
+ # with gr.Row():
1322
+ # data_run_classification_da = gr.Button("Refresh")
1323
+ # data_run_classification_da.click(
1324
+ # partial(get_mteb_data, tasks=["Classification"], datasets=TASK_LIST_CLASSIFICATION_DA),
1325
+ # outputs=data_classification_da,
1326
+ # )
1327
+ # with gr.TabItem("Norwegian"):
1328
+ # with gr.Row():
1329
+ # gr.Markdown("""
1330
+ # **Classification Norwegian Leaderboard** 💙🇳🇴
1331
+
1332
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1333
+ # - **Languages:** Norwegian Bokmål
1334
+ # - **Credits:** [Kenneth Enevoldsen](https://github.com/KennethEnevoldsen), [scandinavian-embedding-benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/)
1335
+ # """)
1336
+ # with gr.Row():
1337
+ # data_classification_nb = gr.components.Dataframe(
1338
+ # DATA_CLASSIFICATION_NB,
1339
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_NB.columns),
1340
+ # type="pandas",
1341
+ # )
1342
+ # with gr.Row():
1343
+ # data_run_classification_nb = gr.Button("Refresh")
1344
+ # data_run_classification_nb.click(
1345
+ # partial(get_mteb_data, tasks=["Classification"], datasets=TASK_LIST_CLASSIFICATION_NB),
1346
+ # outputs=data_classification_nb,
1347
+ # )
1348
+ # with gr.TabItem("Polish"):
1349
+ # with gr.Row():
1350
+ # gr.Markdown("""
1351
+ # **Classification Polish Leaderboard** 🤍🇵🇱
1352
+
1353
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1354
+ # - **Languages:** Polish
1355
+ # - **Credits:** [Rafał Poświata](https://github.com/rafalposwiata)
1356
+ # """)
1357
+ # with gr.Row():
1358
+ # data_classification_pl = gr.components.Dataframe(
1359
+ # DATA_CLASSIFICATION_PL,
1360
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_PL.columns),
1361
+ # type="pandas",
1362
+ # )
1363
+ # with gr.Row():
1364
+ # data_run_classification_pl = gr.Button("Refresh")
1365
+ # data_run_classification_pl.click(
1366
+ # partial(get_mteb_data, tasks=["Classification"], datasets=TASK_LIST_CLASSIFICATION_PL),
1367
+ # outputs=data_classification_pl,
1368
+ # )
1369
+ # with gr.TabItem("Swedish"):
1370
+ # with gr.Row():
1371
+ # gr.Markdown("""
1372
+ # **Classification Swedish Leaderboard** 💛🇸🇪
1373
+
1374
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1375
+ # - **Languages:** Swedish
1376
+ # - **Credits:** [Kenneth Enevoldsen](https://github.com/KennethEnevoldsen), [scandinavian-embedding-benchmark](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/)
1377
+ # """)
1378
+ # with gr.Row():
1379
+ # data_classification_sv = gr.components.Dataframe(
1380
+ # DATA_CLASSIFICATION_SV,
1381
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_SV.columns),
1382
+ # type="pandas",
1383
+ # )
1384
+ # with gr.Row():
1385
+ # data_run_classification_sv = gr.Button("Refresh")
1386
+ # data_run_classification_sv.click(
1387
+ # partial(get_mteb_data, tasks=["Classification"], datasets=TASK_LIST_CLASSIFICATION_SV),
1388
+ # outputs=data_classification_sv,
1389
+ # )
1390
+ # with gr.TabItem("Other"):
1391
+ # with gr.Row():
1392
+ # gr.Markdown("""
1393
+ # **Classification Other Languages Leaderboard** 💜💚💙
1394
+
1395
+ # - **Metric:** [Accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy)
1396
+ # - **Languages:** 47 (Only languages not included in the other tabs)
1397
+ # """)
1398
+ # with gr.Row():
1399
+ # data_classification = gr.components.Dataframe(
1400
+ # DATA_CLASSIFICATION_OTHER,
1401
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLASSIFICATION_OTHER.columns) * 10,
1402
+ # type="pandas",
1403
+ # )
1404
+ # with gr.Row():
1405
+ # data_run_classification = gr.Button("Refresh")
1406
+ # data_run_classification.click(
1407
+ # partial(get_mteb_data, tasks=["Classification"], datasets=TASK_LIST_CLASSIFICATION_OTHER),
1408
+ # outputs=data_classification,
1409
+ # )
1410
+ # with gr.TabItem("Clustering"):
1411
+ # with gr.TabItem("English"):
1412
+ # with gr.Row():
1413
+ # gr.Markdown("""
1414
+ # **Clustering Leaderboard** ✨
1415
+
1416
+ # - **Metric:** Validity Measure (v_measure)
1417
+ # - **Languages:** English
1418
+ # """)
1419
+ # with gr.Row():
1420
+ # data_clustering = gr.components.Dataframe(
1421
+ # DATA_CLUSTERING,
1422
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLUSTERING.columns),
1423
+ # type="pandas",
1424
+ # )
1425
+ # with gr.Row():
1426
+ # data_run_clustering_en = gr.Button("Refresh")
1427
+ # data_run_clustering_en.click(
1428
+ # partial(get_mteb_data, tasks=["Clustering"], datasets=TASK_LIST_CLUSTERING),
1429
+ # outputs=data_clustering,
1430
+ # )
1431
+ # with gr.TabItem("Chinese"):
1432
+ # with gr.Row():
1433
+ # gr.Markdown("""
1434
+ # **Clustering Chinese Leaderboard** ✨🇨🇳
1435
+
1436
+ # - **Metric:** Validity Measure (v_measure)
1437
+ # - **Languages:** Chinese
1438
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1439
+ # """)
1440
+ # with gr.Row():
1441
+ # data_clustering_zh = gr.components.Dataframe(
1442
+ # DATA_CLUSTERING_ZH,
1443
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLUSTERING_ZH.columns),
1444
+ # type="pandas",
1445
+ # )
1446
+ # with gr.Row():
1447
+ # data_run_clustering_zh = gr.Button("Refresh")
1448
+ # data_run_clustering_zh.click(
1449
+ # partial(get_mteb_data, tasks=["Clustering"], datasets=TASK_LIST_CLUSTERING_ZH),
1450
+ # outputs=data_clustering_zh,
1451
+ # )
1452
+ # with gr.TabItem("German"):
1453
+ # with gr.Row():
1454
+ # gr.Markdown("""
1455
+ # **Clustering German Leaderboard** ✨🇩🇪
1456
+
1457
+ # - **Metric:** Validity Measure (v_measure)
1458
+ # - **Languages:** German
1459
+ # - **Credits:** [Silvan](https://github.com/slvnwhrl)
1460
+ # """)
1461
+ # with gr.Row():
1462
+ # data_clustering_de = gr.components.Dataframe(
1463
+ # DATA_CLUSTERING_DE,
1464
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLUSTERING_DE.columns) * 2,
1465
+ # type="pandas",
1466
+ # )
1467
+ # with gr.Row():
1468
+ # data_run_clustering_de = gr.Button("Refresh")
1469
+ # data_run_clustering_de.click(
1470
+ # partial(get_mteb_data, tasks=["Clustering"], datasets=TASK_LIST_CLUSTERING_DE),
1471
+ # outputs=data_clustering_de,
1472
+ # )
1473
+ # with gr.TabItem("Polish"):
1474
+ # with gr.Row():
1475
+ # gr.Markdown("""
1476
+ # **Clustering Polish Leaderboard** ✨🇵🇱
1477
+
1478
+ # - **Metric:** Validity Measure (v_measure)
1479
+ # - **Languages:** Polish
1480
+ # - **Credits:** [Rafał Poświata](https://github.com/rafalposwiata)
1481
+ # """)
1482
+ # with gr.Row():
1483
+ # data_clustering_pl = gr.components.Dataframe(
1484
+ # DATA_CLUSTERING_PL,
1485
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_CLUSTERING_PL.columns) * 2,
1486
+ # type="pandas",
1487
+ # )
1488
+ # with gr.Row():
1489
+ # data_run_clustering_pl = gr.Button("Refresh")
1490
+ # data_run_clustering_pl.click(
1491
+ # partial(get_mteb_data, tasks=["Clustering"], datasets=TASK_LIST_CLUSTERING_PL),
1492
+ # outputs=data_clustering_pl,
1493
+ # )
1494
+ # with gr.TabItem("Pair Classification"):
1495
+ # with gr.TabItem("English"):
1496
+ # with gr.Row():
1497
+ # gr.Markdown("""
1498
+ # **Pair Classification English Leaderboard** 🎭
1499
+
1500
+ # - **Metric:** Average Precision based on Cosine Similarities (cos_sim_ap)
1501
+ # - **Languages:** English
1502
+ # """)
1503
+ # with gr.Row():
1504
+ # data_pair_classification = gr.components.Dataframe(
1505
+ # DATA_PAIR_CLASSIFICATION,
1506
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_PAIR_CLASSIFICATION.columns),
1507
+ # type="pandas",
1508
+ # )
1509
+ # with gr.Row():
1510
+ # data_run_pair_classification = gr.Button("Refresh")
1511
+ # data_run_pair_classification.click(
1512
+ # partial(get_mteb_data, tasks=["PairClassification"], datasets=TASK_LIST_PAIR_CLASSIFICATION),
1513
+ # outputs=data_pair_classification,
1514
+ # )
1515
+ # with gr.TabItem("Chinese"):
1516
+ # with gr.Row():
1517
+ # gr.Markdown("""
1518
+ # **Pair Classification Chinese Leaderboard** 🎭🇨🇳
1519
+
1520
+ # - **Metric:** Average Precision based on Cosine Similarities (cos_sim_ap)
1521
+ # - **Languages:** Chinese
1522
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1523
+ # """)
1524
+ # with gr.Row():
1525
+ # data_pair_classification_zh = gr.components.Dataframe(
1526
+ # DATA_PAIR_CLASSIFICATION_ZH,
1527
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_PAIR_CLASSIFICATION_ZH.columns),
1528
+ # type="pandas",
1529
+ # )
1530
+ # with gr.Row():
1531
+ # data_run_pair_classification_zh = gr.Button("Refresh")
1532
+ # data_run_pair_classification_zh.click(
1533
+ # partial(get_mteb_data, tasks=["PairClassification"], datasets=TASK_LIST_PAIR_CLASSIFICATION_ZH),
1534
+ # outputs=data_pair_classification_zh,
1535
+ # )
1536
+ # with gr.TabItem("Polish"):
1537
+ # with gr.Row():
1538
+ # gr.Markdown("""
1539
+ # **Pair Classification Polish Leaderboard** 🎭🇵🇱
1540
+
1541
+ # - **Metric:** Average Precision based on Cosine Similarities (cos_sim_ap)
1542
+ # - **Languages:** Polish
1543
+ # - **Credits:** [Rafał Poświata](https://github.com/rafalposwiata)
1544
+ # """)
1545
+ # with gr.Row():
1546
+ # data_pair_classification_pl = gr.components.Dataframe(
1547
+ # DATA_PAIR_CLASSIFICATION_PL,
1548
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_PAIR_CLASSIFICATION_PL.columns),
1549
+ # type="pandas",
1550
+ # )
1551
+ # with gr.Row():
1552
+ # data_run_pair_classification_pl = gr.Button("Refresh")
1553
+ # data_run_pair_classification_pl.click(
1554
+ # partial(get_mteb_data, tasks=["PairClassification"], datasets=TASK_LIST_PAIR_CLASSIFICATION_PL),
1555
+ # outputs=data_pair_classification_pl,
1556
+ # )
1557
+ # with gr.TabItem("Reranking"):
1558
+ # with gr.TabItem("English"):
1559
+ # with gr.Row():
1560
+ # gr.Markdown("""
1561
+ # **Reranking English Leaderboard** 🥈
1562
+
1563
+ # - **Metric:** Mean Average Precision (MAP)
1564
+ # - **Languages:** English
1565
+ # """)
1566
+ # with gr.Row():
1567
+ # data_reranking = gr.components.Dataframe(
1568
+ # DATA_RERANKING,
1569
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_RERANKING.columns),
1570
+ # type="pandas",
1571
+ # )
1572
+ # with gr.Row():
1573
+ # data_run_reranking = gr.Button("Refresh")
1574
+ # data_run_reranking.click(
1575
+ # partial(get_mteb_data, tasks=["Reranking"], datasets=TASK_LIST_RERANKING),
1576
+ # outputs=data_reranking,
1577
+ # )
1578
+ # with gr.TabItem("Chinese"):
1579
+ # with gr.Row():
1580
+ # gr.Markdown("""
1581
+ # **Reranking Chinese Leaderboard** 🥈🇨🇳
1582
+
1583
+ # - **Metric:** Mean Average Precision (MAP)
1584
+ # - **Languages:** Chinese
1585
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1586
+ # """)
1587
+ # with gr.Row():
1588
+ # data_reranking_zh = gr.components.Dataframe(
1589
+ # DATA_RERANKING_ZH,
1590
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_RERANKING_ZH.columns),
1591
+ # type="pandas",
1592
+ # )
1593
+ # with gr.Row():
1594
+ # data_run_reranking_zh = gr.Button("Refresh")
1595
+ # data_run_reranking_zh.click(
1596
+ # partial(get_mteb_data, tasks=["Reranking"], datasets=TASK_LIST_RERANKING_ZH),
1597
+ # outputs=data_reranking_zh,
1598
+ # )
1599
+ # with gr.TabItem("Retrieval"):
1600
+ # with gr.TabItem("English"):
1601
+ # with gr.Row():
1602
+ # gr.Markdown("""
1603
+ # **Retrieval English Leaderboard** 🔎
1604
+
1605
+ # - **Metric:** Normalized Discounted Cumulative Gain @ k (ndcg_at_10)
1606
+ # - **Languages:** English
1607
+ # """)
1608
+ # with gr.Row():
1609
+ # data_retrieval = gr.components.Dataframe(
1610
+ # DATA_RETRIEVAL,
1611
+ # # Add support for more columns than existing as a buffer for CQADupstack & other Retrieval tasks (e.g. MSMARCOv2)
1612
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_RETRIEVAL.columns) * 2,
1613
+ # type="pandas",
1614
+ # )
1615
+ # with gr.Row():
1616
+ # data_run_retrieval = gr.Button("Refresh")
1617
+ # data_run_retrieval.click(
1618
+ # partial(get_mteb_data, tasks=["Retrieval"], datasets=TASK_LIST_RETRIEVAL),
1619
+ # outputs=data_retrieval,
1620
+ # )
1621
+ # with gr.TabItem("Chinese"):
1622
+ # with gr.Row():
1623
+ # gr.Markdown("""
1624
+ # **Retrieval Chinese Leaderboard** 🔎🇨🇳
1625
+
1626
+ # - **Metric:** Normalized Discounted Cumulative Gain @ k (ndcg_at_10)
1627
+ # - **Languages:** Chinese
1628
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1629
+ # """)
1630
+ # with gr.Row():
1631
+ # data_retrieval_zh = gr.components.Dataframe(
1632
+ # DATA_RETRIEVAL_ZH,
1633
+ # # Add support for more columns than existing as a buffer for CQADupstack & other Retrieval tasks (e.g. MSMARCOv2)
1634
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_RETRIEVAL_ZH.columns) * 2,
1635
+ # type="pandas",
1636
+ # )
1637
+ # with gr.Row():
1638
+ # data_run_retrieval_zh = gr.Button("Refresh")
1639
+ # data_run_retrieval_zh.click(
1640
+ # partial(get_mteb_data, tasks=["Retrieval"], datasets=TASK_LIST_RETRIEVAL_ZH),
1641
+ # outputs=data_retrieval_zh,
1642
+ # )
1643
+ # with gr.TabItem("Polish"):
1644
+ # with gr.Row():
1645
+ # gr.Markdown("""
1646
+ # **Retrieval Polish Leaderboard** 🔎🇵🇱
1647
+
1648
+ # - **Metric:** Normalized Discounted Cumulative Gain @ k (ndcg_at_10)
1649
+ # - **Languages:** Polish
1650
+ # - **Credits:** [Konrad Wojtasik](https://github.com/kwojtasi) & [BEIR-PL](https://arxiv.org/abs/2305.19840)
1651
+ # """)
1652
+ # with gr.Row():
1653
+ # data_retrieval_pl = gr.components.Dataframe(
1654
+ # DATA_RETRIEVAL_PL,
1655
+ # # Add support for more columns than existing as a buffer for CQADupstack & other Retrieval tasks (e.g. MSMARCOv2)
1656
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_RETRIEVAL_PL.columns) * 2,
1657
+ # type="pandas",
1658
+ # )
1659
+ # with gr.Row():
1660
+ # data_run_retrieval_pl = gr.Button("Refresh")
1661
+ # data_run_retrieval_pl.click(
1662
+ # partial(get_mteb_data, tasks=["Retrieval"], datasets=TASK_LIST_RETRIEVAL_PL),
1663
+ # outputs=data_retrieval_pl,
1664
+ # )
1665
+ # with gr.TabItem("STS"):
1666
+ # with gr.TabItem("English"):
1667
+ # with gr.Row():
1668
+ # gr.Markdown("""
1669
+ # **STS English Leaderboard** 🤖
1670
+
1671
+ # - **Metric:** Spearman correlation based on cosine similarity
1672
+ # - **Languages:** English
1673
+ # """)
1674
+ # with gr.Row():
1675
+ # data_sts_en = gr.components.Dataframe(
1676
+ # DATA_STS_EN,
1677
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_STS_EN.columns),
1678
+ # type="pandas",
1679
+ # )
1680
+ # with gr.Row():
1681
+ # data_run_sts_en = gr.Button("Refresh")
1682
+ # data_run_sts_en.click(
1683
+ # partial(get_mteb_data, tasks=["STS"], datasets=TASK_LIST_STS),
1684
+ # outputs=data_sts_en,
1685
+ # )
1686
+ # with gr.TabItem("Chinese"):
1687
+ # with gr.Row():
1688
+ # gr.Markdown("""
1689
+ # **STS Chinese Leaderboard** 🤖🇨🇳
1690
+
1691
+ # - **Metric:** Spearman correlation based on cosine similarity
1692
+ # - **Languages:** Chinese
1693
+ # - **Credits:** [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
1694
+ # """)
1695
+ # with gr.Row():
1696
+ # data_sts_zh = gr.components.Dataframe(
1697
+ # DATA_STS_ZH,
1698
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_STS_ZH.columns),
1699
+ # type="pandas",
1700
+ # )
1701
+ # with gr.Row():
1702
+ # data_run_sts_zh = gr.Button("Refresh")
1703
+ # data_run_sts_zh.click(
1704
+ # partial(get_mteb_data, tasks=["STS"], datasets=TASK_LIST_STS_ZH),
1705
+ # outputs=data_sts_zh,
1706
+ # )
1707
+ # with gr.TabItem("Polish"):
1708
+ # with gr.Row():
1709
+ # gr.Markdown("""
1710
+ # **STS Polish Leaderboard** 🤖🇵🇱
1711
+
1712
+ # - **Metric:** Spearman correlation based on cosine similarity
1713
+ # - **Languages:** Polish
1714
+ # - **Credits:** [Rafał Poświata](https://github.com/rafalposwiata)
1715
+ # """)
1716
+ # with gr.Row():
1717
+ # data_sts_pl = gr.components.Dataframe(
1718
+ # DATA_STS_PL,
1719
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_STS_PL.columns),
1720
+ # type="pandas",
1721
+ # )
1722
+ # with gr.Row():
1723
+ # data_run_sts_pl = gr.Button("Refresh")
1724
+ # data_run_sts_pl.click(
1725
+ # partial(get_mteb_data, tasks=["STS"], datasets=TASK_LIST_STS_PL),
1726
+ # outputs=data_sts_pl,
1727
+ # )
1728
+ # with gr.TabItem("Other"):
1729
+ # with gr.Row():
1730
+ # gr.Markdown("""
1731
+ # **STS Other Leaderboard** 👽
1732
+
1733
+ # - **Metric:** Spearman correlation based on cosine similarity
1734
+ # - **Languages:** Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Russian, Spanish (Only language combos not included in the other tabs)
1735
+ # """)
1736
+ # with gr.Row():
1737
+ # data_sts_other = gr.components.Dataframe(
1738
+ # DATA_STS_OTHER,
1739
+ # datatype=["number", "markdown"] + ["number"] * len(DATA_STS_OTHER.columns) * 2,
1740
+ # type="pandas",
1741
+ # )
1742
+ # with gr.Row():
1743
+ # data_run_sts_other = gr.Button("Refresh")
1744
+ # data_run_sts_other.click(
1745
+ # partial(get_mteb_data, tasks=["STS"], datasets=TASK_LIST_STS_OTHER),
1746
+ # outputs=data_sts_other,
1747
+ # )
1748
+ # with gr.TabItem("Summarization"):
1749
+ # with gr.Row():
1750
+ # gr.Markdown("""
1751
+ # **Summarization Leaderboard** 📜
1752
+
1753
+ # - **Metric:** Spearman correlation based on cosine similarity
1754
+ # - **Languages:** English
1755
+ # """)
1756
+ # with gr.Row():
1757
+ # data_summarization = gr.components.Dataframe(
1758
+ # DATA_SUMMARIZATION,
1759
+ # datatype=["number", "markdown"] + ["number"] * 2,
1760
+ # type="pandas",
1761
+ # )
1762
+ # with gr.Row():
1763
+ # data_run = gr.Button("Refresh")
1764
+ # data_run.click(
1765
+ # partial(get_mteb_data, tasks=["Summarization"]),
1766
+ # outputs=data_summarization,
1767
+ # )
1768
+ # gr.Markdown(r"""
1769
+
1770
+ # Made with ❤️ for NLP. If this work is useful to you, please consider citing:
1771
+
1772
+ # ```bibtex
1773
+ # @article{muennighoff2022mteb,
1774
+ # doi = {10.48550/ARXIV.2210.07316},
1775
+ # url = {https://arxiv.org/abs/2210.07316},
1776
+ # author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo{\"\i}c and Reimers, Nils},
1777
+ # title = {MTEB: Massive Text Embedding Benchmark},
1778
+ # publisher = {arXiv},
1779
+ # journal={arXiv preprint arXiv:2210.07316},
1780
+ # year = {2022}
1781
+ # }
1782
+ # ```
1783
+ # """)
1784
+ # # Running the functions on page load in addition to when the button is clicked
1785
+ # # This is optional - If deactivated the data loaded at "Build time" is shown like for Overall tab
1786
+ # """
1787
+ # block.load(get_mteb_data, inputs=[task_bitext_mining], outputs=data_bitext_mining)
1788
+ # """
1789
+
1790
+ # block.queue(max_size=10)
1791
+ # block.launch()
1792
+
1793
+
1794
+ # # Possible changes:
1795
+ # # Could add graphs / other visual content
1796
+ # # Could add verification marks
1797
+
1798
+ # # Sources:
1799
+ # # https://huggingface.co/spaces/gradio/leaderboard
1800
+ # # https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard
1801
+ # # https://getemoji.com/
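The commented-out block in `app.py` tallies scores, datasets and models across all leaderboard tables before rendering the header. The following is a minimal, dependency-free sketch of that tally, not the code from the commit: it uses plain lists and dictionaries instead of pandas DataFrames, the function name `tally` and the sample tables are hypothetical, and it assumes (as the original loop appears to) that each table's columns start with `Rank`, `Model` and, optionally, `Average`.

```python
import math


def tally(tables):
    """Count non-NaN scores, distinct datasets and distinct models.

    `tables` is a list of (columns, rows) pairs; this is a hypothetical
    stand-in for the DataFrames used in app.py.
    """
    num_scores = 0
    datasets = set()
    models = set()
    for columns, rows in tables:
        # Skip Rank and Model, plus Average when the table has one.
        cols_to_ignore = 3 if "Average" in columns else 2
        score_cols = columns[cols_to_ignore:]
        for row in rows:
            models.add(row["Model"])
            for col in score_cols:
                value = row.get(col)
                # Only real scores count; missing or NaN cells are skipped.
                if value is not None and not math.isnan(value):
                    num_scores += 1
        # "STS17 (en-en)" and "STS17 (ar-ar)" are the same dataset.
        datasets.update(col.split(" ")[0] for col in score_cols)
    return num_scores, len(datasets), len(models)


# Illustrative data only; the model names and scores are made up.
tables = [
    (
        ["Rank", "Model", "Average", "STS17 (en-en)", "STS17 (ar-ar)"],
        [
            {"Rank": 1, "Model": "a", "Average": 80.0,
             "STS17 (en-en)": 85.0, "STS17 (ar-ar)": float("nan")},
            {"Rank": 2, "Model": "b", "Average": 70.0,
             "STS17 (en-en)": 70.0, "STS17 (ar-ar)": 70.0},
        ],
    ),
    (
        ["Rank", "Model", "SummEval"],
        [{"Rank": 1, "Model": "a", "SummEval": 30.0}],
    ),
]

print(tally(tables))  # (4, 2, 2)
```

Splitting each column name on the first space is what keeps language variants of one dataset from being counted twice, and excluding the leading columns keeps ranks and averages out of the score total.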
requirements.txt ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ gradio
2
+ datasets
3
+ pandas
4
+ huggingface_hub
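The Refresh buttons in the commented-out `app.py` block pass `partial(get_mteb_data, tasks=..., datasets=...)` as the click handler, so the callback can run without any input components. Below is a small sketch of that pattern using only the standard library; `get_data` is a hypothetical stand-in for `get_mteb_data` (whose real signature is not shown in this part of the diff), and `"ExampleSTS"` is a placeholder dataset name.

```python
from functools import partial


def get_data(tasks, langs=None, datasets=None):
    """Hypothetical stand-in: the real function would build a table."""
    return {"tasks": tasks, "langs": langs or [], "datasets": datasets or []}


# Pre-bind the arguments so the handler can be called with no inputs,
# which is what a button click with inputs=None requires.
refresh_sts = partial(get_data, tasks=["STS"], datasets=["ExampleSTS"])

result = refresh_sts()
print(result)
```

Binding the arguments up front lets every tab reuse one loader function while keeping each button's handler argument-free.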