---
license: mit
language:
- pt
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
datasets:
- adalbertojunior/openHermes_portuguese
- cnmoro/smoltalk-555k-ptbr
- cnmoro/RagMixPTBR-Legal-Alpaca-2M
- adalbertojunior/dolphin-2.9-portuguese
model-index:
- name: Qwen2.5-0.5B-Portuguese-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 36.81
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 26.84
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 30.62
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 87.91
      name: f1-macro
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 59.01
      name: pearson
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 43.97
      name: f1-macro
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 33.62
      name: f1-macro
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 41.23
      name: f1-macro
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 52.33
      name: f1-macro
    source:
      url: >-
        https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
---
# Qwen2.5-0.5B-Portuguese-v2

Qwen2.5-0.5B fine-tuned for proficiency in the Portuguese language and improved general capability.

Also available on Ollama: https://ollama.com/cnmoro/Qwen2.5-0.5B-Portuguese-v2
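If you serve the model through Ollama, a minimal sketch of querying its local REST API from Python might look like the following. This is an assumption on my part, not part of the original card: it presumes an Ollama server on the default port 11434 and that the model has already been pulled; the prompt string is purely illustrative.

```python
import requests

# Ask a locally running Ollama server (default port 11434) for a completion.
# Assumes the model was pulled first, e.g. with: ollama run cnmoro/Qwen2.5-0.5B-Portuguese-v2
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "cnmoro/Qwen2.5-0.5B-Portuguese-v2",
        "prompt": "Explique brevemente o que é um LLM.",  # illustrative prompt
        "stream": False,  # return the full completion as a single JSON object
    },
    timeout=120,
)
print(resp.json()["response"])
```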
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cnmoro/Qwen2.5-0.5B-Portuguese-v2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."

# A system prompt tuned for Portuguese is hardcoded and injected automatically
# by the chat template, so there is no need to supply one here.
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

Example output:

```
As Large Language Models (LLMs) são sistemas computacionais projetados para produzir
linguagem natural com alta precisão e fluência. Eles usam algoritmos avançados para compreender
e gerar texto, permitindo-lhes realizar tarefas como tradução de idiomas, geração de conteúdo
e processamento de linguagem natural.

Os LLMs têm sido amplamente utilizados na área da inteligência artificial e do aprendizado
de máquina há vários anos. Alguns dos principais usos de LLMs incluem:

1. Tradução automática: Os LLMs podem traduzir textos entre diferentes idiomas, tornando-os
úteis em setores onde a comunicação internacional é crítica, como negócios internacionais,
diplomacia ou relações públicas.

2. Geração de conteúdo: os LLMs podem criar conteúdo altamente personalizado e adaptado às
necessidades específicas de seus usuários, tornando-os ideais para criação de sites, aplicativos
móveis ou plataformas de mídia social.

3. Processamento de Linguagem Natural: Os LLMs podem ser treinados para reconhecer e compreender
padrões de linguagem, permitindo-lhes compreender melhor as intenções humanas e responder adequadamente.

4. Análise de sentimento: Os LLMs podem analisar dados de texto e identificar sentimentos, ajudando
a entender como as pessoas se sentem em relação a determinadas questões ou questões sociais.

No geral, os LLMs estão se tornando cada vez mais importantes à medida que a tecnologia continua a
avançar. À medida que continuamos a usar LLMs em nossas vidas diárias, podemos esperar ver ainda
mais desenvolvimentos interessantes no futuro.
```
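For interactive use, generation can be streamed token by token with transformers' `TextStreamer`. A minimal sketch, reusing `model`, `tokenizer`, and `model_inputs` from the snippet above (the sampling settings are left at their defaults here, which is an assumption, not the card's recommendation):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, streamer=streamer, max_new_tokens=512)
```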
## Overall Results
Task | Metric | Value | StdErr |
---|---|---|---|
ASSIN2 RTE | F1 Macro | 0.4486 | 0.0067 |
ASSIN2 RTE | Accuracy | 0.5560 | 0.0071 |
ASSIN2 STS | Pearson | 0.4091 | 0.0104 |
ASSIN2 STS | MSE | 5.6395 | N/A |
BLUEX | Accuracy | 0.2503 | 0.0094 |
ENEM Challenge | Accuracy | 0.3128 | 0.0071 |
FaQuAD NLI | F1 Macro | 0.4611 | 0.0094 |
FaQuAD NLI | Accuracy | 0.7877 | 0.0113 |
HateBR Offensive (Binary) | F1 Macro | 0.3439 | 0.0049 |
HateBR Offensive (Binary) | Accuracy | 0.4857 | 0.0095 |
OAB Exams | Accuracy | 0.3062 | 0.0057 |
Portuguese Hate Speech (Binary) | F1 Macro | 0.4119 | 0.0038 |
Portuguese Hate Speech (Binary) | Accuracy | 0.7004 | 0.0111 |
TweetSentBR | F1 Macro | 0.5055 | 0.0078 |
TweetSentBR | Accuracy | 0.5697 | 0.0078 |
## Detailed Results by Task
### ASSIN2 RTE
Metric | Value | StdErr |
---|---|---|
F1 Macro | 0.4486 | 0.0067 |
Accuracy | 0.5560 | 0.0071 |
### ASSIN2 STS
Metric | Value | StdErr |
---|---|---|
Pearson | 0.4091 | 0.0104 |
MSE | 5.6395 | N/A |
### BLUEX
Exam ID | Metric | Value | StdErr |
---|---|---|---|
All | Accuracy | 0.2503 | 0.0094 |
USP_2018 | Accuracy | 0.2037 | 0.0315 |
UNICAMP_2018 | Accuracy | 0.1852 | 0.0306 |
UNICAMP_2021_1 | Accuracy | 0.0870 | 0.0240 |
USP_2020 | Accuracy | 0.2143 | 0.0317 |
USP_2023 | Accuracy | 0.2045 | 0.0350 |
UNICAMP_2019 | Accuracy | 0.2600 | 0.0358 |
USP_2019 | Accuracy | 0.1500 | 0.0326 |
UNICAMP_2020 | Accuracy | 0.2182 | 0.0321 |
UNICAMP_2021_2 | Accuracy | 0.2941 | 0.0367 |
UNICAMP_2023 | Accuracy | 0.4186 | 0.0433 |
UNICAMP_2024 | Accuracy | 0.3111 | 0.0398 |
USP_2024 | Accuracy | 0.2683 | 0.0398 |
USP_2021 | Accuracy | 0.3269 | 0.0375 |
UNICAMP_2022 | Accuracy | 0.3590 | 0.0444 |
USP_2022 | Accuracy | 0.2857 | 0.0370 |
### ENEM Challenge
Exam ID | Metric | Value | StdErr |
---|---|---|---|
All | Accuracy | 0.3128 | 0.0071 |
2017 | Accuracy | 0.2845 | 0.0241 |
2016 | Accuracy | 0.2479 | 0.0226 |
2016_2 | Accuracy | 0.2846 | 0.0235 |
2022 | Accuracy | 0.3534 | 0.0240 |
2012 | Accuracy | 0.3362 | 0.0253 |
2011 | Accuracy | 0.3333 | 0.0251 |
2010 | Accuracy | 0.3846 | 0.0260 |
2014 | Accuracy | 0.3211 | 0.0259 |
2009 | Accuracy | 0.2696 | 0.0239 |
2015 | Accuracy | 0.2521 | 0.0229 |
2023 | Accuracy | 0.3481 | 0.0236 |
2013 | Accuracy | 0.3333 | 0.0261 |
### FaQuAD NLI
Metric | Value | StdErr |
---|---|---|
F1 Macro | 0.4611 | 0.0094 |
Accuracy | 0.7877 | 0.0113 |
### HateBR Offensive (Binary)
Metric | Value | StdErr |
---|---|---|
F1 Macro | 0.3439 | 0.0049 |
Accuracy | 0.4857 | 0.0095 |
### OAB Exams
Exam ID | Metric | Value | StdErr |
---|---|---|---|
All | Accuracy | 0.3062 | 0.0057 |
2011-05 | Accuracy | 0.3375 | 0.0304 |
2012-06a | Accuracy | 0.2625 | 0.0285 |
2010-02 | Accuracy | 0.3700 | 0.0279 |
2017-22 | Accuracy | 0.3500 | 0.0309 |
2016-20 | Accuracy | 0.3125 | 0.0300 |
2011-03 | Accuracy | 0.2626 | 0.0255 |
2015-17 | Accuracy | 0.3205 | 0.0304 |
2017-23 | Accuracy | 0.2875 | 0.0292 |
2018-25 | Accuracy | 0.3625 | 0.0311 |
2016-19 | Accuracy | 0.2436 | 0.0281 |
2017-24 | Accuracy | 0.1625 | 0.0238 |
2015-16 | Accuracy | 0.3125 | 0.0300 |
2011-04 | Accuracy | 0.3250 | 0.0301 |
2012-07 | Accuracy | 0.3500 | 0.0307 |
2012-06 | Accuracy | 0.1875 | 0.0253 |
2012-09 | Accuracy | 0.2468 | 0.0284 |
2013-12 | Accuracy | 0.3625 | 0.0311 |
2013-11 | Accuracy | 0.3000 | 0.0295 |
2010-01 | Accuracy | 0.3412 | 0.0296 |
2015-18 | Accuracy | 0.2875 | 0.0292 |
2014-13 | Accuracy | 0.3500 | 0.0308 |
2013-10 | Accuracy | 0.3125 | 0.0300 |
2016-20a | Accuracy | 0.2500 | 0.0279 |
2014-14 | Accuracy | 0.3125 | 0.0301 |
2012-08 | Accuracy | 0.3000 | 0.0296 |
2016-21 | Accuracy | 0.3375 | 0.0304 |
2014-15 | Accuracy | 0.4103 | 0.0321 |
### Portuguese Hate Speech (Binary)
Metric | Value | StdErr |
---|---|---|
F1 Macro | 0.4119 | 0.0038 |
Accuracy | 0.7004 | 0.0111 |
### TweetSentBR
Metric | Value | StdErr |
---|---|---|
F1 Macro | 0.5055 | 0.0078 |
Accuracy | 0.5697 | 0.0078 |
## Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2).
Metric | Value |
---|---|
Average | 45.81 |
ENEM Challenge (No Images) | 36.81 |
BLUEX (No Images) | 26.84 |
OAB Exams | 30.62 |
Assin2 RTE | 87.91 |
Assin2 STS | 59.01 |
FaQuAD NLI | 43.97 |
HateBR Binary | 33.62 |
PT Hate Speech Binary | 41.23 |
tweetSentBR | 52.33 |
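The reported average is simply the mean of the nine task scores above. As a quick sanity check:

```python
# Recompute the leaderboard average from the rounded per-task scores above.
scores = [36.81, 26.84, 30.62, 87.91, 59.01, 43.97, 33.62, 41.23, 52.33]
print(f"{sum(scores) / len(scores):.2f}")  # 45.82 -- matches the reported 45.81 up to rounding of the per-task values
```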