File size: 13,229 Bytes

a4b1f52

---
license: mit
language:
- pt
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
datasets:
- adalbertojunior/openHermes_portuguese
- cnmoro/smoltalk-555k-ptbr
- cnmoro/RagMixPTBR-Legal-Alpaca-2M
- adalbertojunior/dolphin-2.9-portuguese
model-index:
- name: Qwen2.5-0.5B-Portuguese-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 36.81
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 26.84
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 30.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 87.91
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 59.01
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 43.97
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 33.62
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 41.23
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 52.33
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
---

Qwen2.5-0.5B finetuned for proficiency in Portuguese language and increased intelligence.

```text
https://ollama.com/cnmoro/Qwen2.5-0.5B-Portuguese-v2
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cnmoro/Qwen2.5-0.5B-Portuguese-v2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."

# System prompt is always injected and hardcoded automatically
# for ideal performance in portuguese language.
# No need to write it again.
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
response
# As Large Language Models (LLMs) são sistemas computacionais projetados para produzir
# linguagem natural com alta precisão e fluência. Eles usam algoritmos avançados para compreender
# e gerar texto, permitindo-lhes realizar tarefas como tradução de idiomas, geração de conteúdo
# e processamento de linguagem natural.
#
# Os LLMs têm sido amplamente utilizados na área da inteligência artificial e do aprendizado
# de máquina há vários anos. Alguns dos principais usos de LLMs incluem:
#
# 1. Tradução automática: Os LLMs podem traduzir textos entre diferentes idiomas, tornando-os
# úteis em setores onde a comunicação internacional é crítica, como negócios internacionais,
# diplomacia ou relações públicas.
#
# 2. Geração de conteúdo: os LLMs podem criar conteúdo altamente personalizado e adaptado às
# necessidades específicas de seus usuários, tornando-os ideais para criação de sites, aplicativos
# móveis ou plataformas de mídia social.
#
# 3. Processamento de Linguagem Natural: Os LLMs podem ser treinados para reconhecer e compreender
# padrões de linguagem, permitindo-lhes compreender melhor as intenções humanas e responder adequadamente.
#
# 4. Análise de sentimento: Os LLMs podem analisar dados de texto e identificar sentimentos, ajudando
# a entender como as pessoas se sentem em relação a determinadas questões ou questões sociais.
#
# No geral, os LLMs estão se tornando cada vez mais importantes à medida que a tecnologia continua a
# avançar. À medida que continuamos a usar LLMs em nossas vidas diárias, podemos esperar ver ainda
# mais desenvolvimentos interessantes no futuro.
```

## Overall Results

| Task                      | Metric        | Value   | StdErr  |
|---------------------------|---------------|---------|---------|
| ASSIN2 RTE                | F1 Macro      | 0.4486  | 0.0067  |
| ASSIN2 RTE                | Accuracy      | 0.5560  | 0.0071  |
| ASSIN2 STS                | Pearson       | 0.4091  | 0.0104  |
| ASSIN2 STS                | MSE           | 5.6395  | N/A     |
| BluEX                     | Accuracy      | 0.2503  | 0.0094  |
| ENEM Challenge            | Accuracy      | 0.3128  | 0.0071  |
| FAQUAD NLI                | F1 Macro      | 0.4611  | 0.0094  |
| FAQUAD NLI                | Accuracy      | 0.7877  | 0.0113  |
| HateBR Offensive (Binary) | F1 Macro      | 0.3439  | 0.0049  |
| HateBR Offensive (Binary) | Accuracy      | 0.4857  | 0.0095  |
| OAB Exams                 | Accuracy      | 0.3062  | 0.0057  |
| Portuguese Hate Speech (Binary) | F1 Macro | 0.4119  | 0.0038  |
| Portuguese Hate Speech (Binary) | Accuracy | 0.7004  | 0.0111  |
| TweetSentBR               | F1 Macro      | 0.5055  | 0.0078  |
| TweetSentBR               | Accuracy      | 0.5697  | 0.0078  |

## Detailed Results by Task

### ASSIN2 RTE

| Metric      | Value   | StdErr  |
|-------------|---------|---------|
| F1 Macro    | 0.4486  | 0.0067  |
| Accuracy    | 0.5560  | 0.0071  |

### ASSIN2 STS

| Metric      | Value   | StdErr  |
|-------------|---------|---------|
| Pearson     | 0.4091  | 0.0104  |
| MSE         | 5.6395  | N/A     |

### BluEX

| Exam ID           | Metric   | Value   | StdErr  |
|-------------------|----------|---------|---------|
| All               | Accuracy | 0.2503  | 0.0094  |
| USP_2018          | Accuracy | 0.2037  | 0.0315  |
| UNICAMP_2018      | Accuracy | 0.1852  | 0.0306  |
| UNICAMP_2021_1    | Accuracy | 0.0870  | 0.0240  |
| USP_2020          | Accuracy | 0.2143  | 0.0317  |
| USP_2023          | Accuracy | 0.2045  | 0.0350  |
| UNICAMP_2019      | Accuracy | 0.2600  | 0.0358  |
| USP_2019          | Accuracy | 0.1500  | 0.0326  |
| UNICAMP_2020      | Accuracy | 0.2182  | 0.0321  |
| UNICAMP_2021_2    | Accuracy | 0.2941  | 0.0367  |
| UNICAMP_2023      | Accuracy | 0.4186  | 0.0433  |
| UNICAMP_2024      | Accuracy | 0.3111  | 0.0398  |
| USP_2024          | Accuracy | 0.2683  | 0.0398  |
| USP_2021          | Accuracy | 0.3269  | 0.0375  |
| UNICAMP_2022      | Accuracy | 0.3590  | 0.0444  |
| USP_2022          | Accuracy | 0.2857  | 0.0370  |

### ENEM Challenge

| Exam ID   | Metric   | Value   | StdErr  |
|-----------|----------|---------|---------|
| All       | Accuracy | 0.3128  | 0.0071  |
| 2017      | Accuracy | 0.2845  | 0.0241  |
| 2016      | Accuracy | 0.2479  | 0.0226  |
| 2016_2    | Accuracy | 0.2846  | 0.0235  |
| 2022      | Accuracy | 0.3534  | 0.0240  |
| 2012      | Accuracy | 0.3362  | 0.0253  |
| 2011      | Accuracy | 0.3333  | 0.0251  |
| 2010      | Accuracy | 0.3846  | 0.0260  |
| 2014      | Accuracy | 0.3211  | 0.0259  |
| 2009      | Accuracy | 0.2696  | 0.0239  |
| 2015      | Accuracy | 0.2521  | 0.0229  |
| 2023      | Accuracy | 0.3481  | 0.0236  |
| 2013      | Accuracy | 0.3333  | 0.0261  |

### FAQUAD NLI

| Metric      | Value   | StdErr  |
|-------------|---------|---------|
| F1 Macro    | 0.4611  | 0.0094  |
| Accuracy    | 0.7877  | 0.0113  |

### HateBR Offensive (Binary)

| Metric      | Value   | StdErr  |
|-------------|---------|---------|
| F1 Macro    | 0.3439  | 0.0049  |
| Accuracy    | 0.4857  | 0.0095  |

### OAB Exams

| Exam ID     | Metric   | Value   | StdErr  |
|-------------|----------|---------|---------|
| All         | Accuracy | 0.3062  | 0.0057  |
| 2011-05     | Accuracy | 0.3375  | 0.0304  |
| 2012-06a    | Accuracy | 0.2625  | 0.0285  |
| 2010-02     | Accuracy | 0.3700  | 0.0279  |
| 2017-22     | Accuracy | 0.3500  | 0.0309  |
| 2016-20     | Accuracy | 0.3125  | 0.0300  |
| 2011-03     | Accuracy | 0.2626  | 0.0255  |
| 2015-17     | Accuracy | 0.3205  | 0.0304  |
| 2017-23     | Accuracy | 0.2875  | 0.0292  |
| 2018-25     | Accuracy | 0.3625  | 0.0311  |
| 2016-19     | Accuracy | 0.2436  | 0.0281  |
| 2017-24     | Accuracy | 0.1625  | 0.0238  |
| 2015-16     | Accuracy | 0.3125  | 0.0300  |
| 2011-04     | Accuracy | 0.3250  | 0.0301  |
| 2012-07     | Accuracy | 0.3500  | 0.0307  |
| 2012-06     | Accuracy | 0.1875  | 0.0253  |
| 2012-09     | Accuracy | 0.2468  | 0.0284  |
| 2013-12     | Accuracy | 0.3625  | 0.0311  |
| 2013-11     | Accuracy | 0.3000  | 0.0295  |
| 2010-01     | Accuracy | 0.3412  | 0.0296  |
| 2015-18     | Accuracy | 0.2875  | 0.0292  |
| 2014-13     | Accuracy | 0.3500  | 0.0308  |
| 2013-10     | Accuracy | 0.3125  | 0.0300  |
| 2016-20a    | Accuracy | 0.2500  | 0.0279  |
| 2014-14     | Accuracy | 0.3125  | 0.0301  |
| 2012-08     | Accuracy | 0.3000  | 0.0296  |
| 2016-21     | Accuracy | 0.3375  | 0.0304  |
| 2014-15     | Accuracy | 0.4103  | 0.0321  |

### Portuguese Hate Speech (Binary)

| Metric      | Value   | StdErr  |
|-------------|---------|---------|
| F1 Macro    | 0.4119  | 0.0038  |
| Accuracy    | 0.7004  | 0.0111  |

### TweetSentBR

| Metric      | Value   | StdErr  |
|-------------|---------|---------|
| F1 Macro    | 0.5055  | 0.0078  |
| Accuracy    | 0.5697  | 0.0078  |


# Open Portuguese LLM Leaderboard Evaluation Results  

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cnmoro/Qwen2.5-0.5B-Portuguese-v2) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)

|          Metric          |  Value  |
|--------------------------|---------|
|Average                   |**45.81**|
|ENEM Challenge (No Images)|    36.81|
|BLUEX (No Images)         |    26.84|
|OAB Exams                 |    30.62|
|Assin2 RTE                |    87.91|
|Assin2 STS                |    59.01|
|FaQuAD NLI                |    43.97|
|HateBR Binary             |    33.62|
|PT Hate Speech Binary     |    41.23|
|tweetSentBR               |    52.33|