Adding the Open Portuguese LLM Leaderboard Evaluation Results

#1
Files changed (1) hide show
  1. README.md +167 -1
README.md CHANGED
@@ -13,6 +13,153 @@ base_model:
13
  - meta-llama/Llama-3.1-8B
14
  pipeline_tag: text-generation
15
  library_name: transformers
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
16
  ---
17
 
18
  # Llama-Carvalho-PT
@@ -139,4 +286,23 @@ The above copyright notice and this permission notice shall be included in all c
139
  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
140
 
141
  ### Funding
142
- This model was development within the Nós Project, funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the [project ILENIA](https://proyectoilenia.es/) with reference 2022/TL22/00215336.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  - meta-llama/Llama-3.1-8B
14
  pipeline_tag: text-generation
15
  library_name: transformers
16
+ model-index:
17
+ - name: Llama-Carvalho-PT
18
+ results:
19
+ - task:
20
+ type: text-generation
21
+ name: Text Generation
22
+ dataset:
23
+ name: ENEM Challenge (No Images)
24
+ type: eduagarcia/enem_challenge
25
+ split: train
26
+ args:
27
+ num_few_shot: 3
28
+ metrics:
29
+ - type: acc
30
+ value: 19.38
31
+ name: accuracy
32
+ source:
33
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
34
+ name: Open Portuguese LLM Leaderboard
35
+ - task:
36
+ type: text-generation
37
+ name: Text Generation
38
+ dataset:
39
+ name: BLUEX (No Images)
40
+ type: eduagarcia-temp/BLUEX_without_images
41
+ split: train
42
+ args:
43
+ num_few_shot: 3
44
+ metrics:
45
+ - type: acc
46
+ value: 18.92
47
+ name: accuracy
48
+ source:
49
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
50
+ name: Open Portuguese LLM Leaderboard
51
+ - task:
52
+ type: text-generation
53
+ name: Text Generation
54
+ dataset:
55
+ name: OAB Exams
56
+ type: eduagarcia/oab_exams
57
+ split: train
58
+ args:
59
+ num_few_shot: 3
60
+ metrics:
61
+ - type: acc
62
+ value: 0.0
63
+ name: accuracy
64
+ source:
65
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
66
+ name: Open Portuguese LLM Leaderboard
67
+ - task:
68
+ type: text-generation
69
+ name: Text Generation
70
+ dataset:
71
+ name: Assin2 RTE
72
+ type: assin2
73
+ split: test
74
+ args:
75
+ num_few_shot: 15
76
+ metrics:
77
+ - type: f1_macro
78
+ value: 87.5
79
+ name: f1-macro
80
+ source:
81
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
82
+ name: Open Portuguese LLM Leaderboard
83
+ - task:
84
+ type: text-generation
85
+ name: Text Generation
86
+ dataset:
87
+ name: Assin2 STS
88
+ type: eduagarcia/portuguese_benchmark
89
+ split: test
90
+ args:
91
+ num_few_shot: 15
92
+ metrics:
93
+ - type: pearson
94
+ value: 75.7
95
+ name: pearson
96
+ source:
97
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
98
+ name: Open Portuguese LLM Leaderboard
99
+ - task:
100
+ type: text-generation
101
+ name: Text Generation
102
+ dataset:
103
+ name: FaQuAD NLI
104
+ type: ruanchaves/faquad-nli
105
+ split: test
106
+ args:
107
+ num_few_shot: 15
108
+ metrics:
109
+ - type: f1_macro
110
+ value: 43.97
111
+ name: f1-macro
112
+ source:
113
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
114
+ name: Open Portuguese LLM Leaderboard
115
+ - task:
116
+ type: text-generation
117
+ name: Text Generation
118
+ dataset:
119
+ name: HateBR Binary
120
+ type: ruanchaves/hatebr
121
+ split: test
122
+ args:
123
+ num_few_shot: 25
124
+ metrics:
125
+ - type: f1_macro
126
+ value: 76.93
127
+ name: f1-macro
128
+ source:
129
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
130
+ name: Open Portuguese LLM Leaderboard
131
+ - task:
132
+ type: text-generation
133
+ name: Text Generation
134
+ dataset:
135
+ name: PT Hate Speech Binary
136
+ type: hate_speech_portuguese
137
+ split: test
138
+ args:
139
+ num_few_shot: 25
140
+ metrics:
141
+ - type: f1_macro
142
+ value: 49.21
143
+ name: f1-macro
144
+ source:
145
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
146
+ name: Open Portuguese LLM Leaderboard
147
+ - task:
148
+ type: text-generation
149
+ name: Text Generation
150
+ dataset:
151
+ name: tweetSentBR
152
+ type: eduagarcia/tweetsentbr_fewshot
153
+ split: test
154
+ args:
155
+ num_few_shot: 25
156
+ metrics:
157
+ - type: f1_macro
158
+ value: 60.88
159
+ name: f1-macro
160
+ source:
161
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Nos-PT/Llama-Carvalho-PT
162
+ name: Open Portuguese LLM Leaderboard
163
  ---
164
 
165
  # Llama-Carvalho-PT
 
286
  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
287
 
288
  ### Funding
289
+ This model was development within the Nós Project, funded by the Ministerio para la Transformación Digital y de la Función Pública - Funded by EU – NextGenerationEU within the framework of the [project ILENIA](https://proyectoilenia.es/) with reference 2022/TL22/00215336.
290
+
291
+
292
+ # Open Portuguese LLM Leaderboard Evaluation Results
293
+
294
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Nos-PT/Llama-Carvalho-PT) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
295
+
296
+ | Metric | Value |
297
+ |--------------------------|---------|
298
+ |Average |**48.05**|
299
+ |ENEM Challenge (No Images)| 19.38|
300
+ |BLUEX (No Images) | 18.92|
301
+ |OAB Exams | 0|
302
+ |Assin2 RTE | 87.50|
303
+ |Assin2 STS | 75.70|
304
+ |FaQuAD NLI | 43.97|
305
+ |HateBR Binary | 76.93|
306
+ |PT Hate Speech Binary | 49.21|
307
+ |tweetSentBR | 60.88|
308
+