---
license: mit
language:
- pt
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
datasets:
- adalbertojunior/openHermes_portuguese
- cnmoro/smoltalk-555k-ptbr
- cnmoro/RagMixPTBR-Legal-Alpaca-2M
- adalbertojunior/dolphin-2.9-portuguese
model-index:
- name: Qwen2.5-0.5B-Portuguese-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 36.81
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 26.84
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 30.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 87.91
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 59.01
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 43.97
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 33.62
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 41.23
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 52.33
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
---
Qwen2.5-0.5B fine-tuned for proficiency in the Portuguese language and for improved general capability.

The model is also available on Ollama:

```text
https://ollama.com/cnmoro/Qwen2.5-0.5B-Portuguese-v2
```

Usage with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cnmoro/Qwen2.5-0.5B-Portuguese-v2"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."

# The system prompt is hardcoded and injected automatically by the chat
# template for ideal performance in Portuguese; there is no need to set one.
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens, dropping the prompt tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

# As Large Language Models (LLMs) são sistemas computacionais projetados para produzir
# linguagem natural com alta precisão e fluência. Eles usam algoritmos avançados para compreender
# e gerar texto, permitindo-lhes realizar tarefas como tradução de idiomas, geração de conteúdo
# e processamento de linguagem natural.
#
# Os LLMs têm sido amplamente utilizados na área da inteligência artificial e do aprendizado
# de máquina há vários anos. Alguns dos principais usos de LLMs incluem:
#
# 1. Tradução automática: Os LLMs podem traduzir textos entre diferentes idiomas, tornando-os
# úteis em setores onde a comunicação internacional é crítica, como negócios internacionais,
# diplomacia ou relações públicas.
#
# 2. Geração de conteúdo: os LLMs podem criar conteúdo altamente personalizado e adaptado às
# necessidades específicas de seus usuários, tornando-os ideais para criação de sites, aplicativos
# móveis ou plataformas de mídia social.
#
# 3. Processamento de Linguagem Natural: Os LLMs podem ser treinados para reconhecer e compreender
# padrões de linguagem, permitindo-lhes compreender melhor as intenções humanas e responder adequadamente.
#
# 4. Análise de sentimento: Os LLMs podem analisar dados de texto e identificar sentimentos, ajudando
# a entender como as pessoas se sentem em relação a determinadas questões ou questões sociais.
#
# No geral, os LLMs estão se tornando cada vez mais importantes à medida que a tecnologia continua a
# avançar. À medida que continuamos a usar LLMs em nossas vidas diárias, podemos esperar ver ainda
# mais desenvolvimentos interessantes no futuro.
```
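For interactive use, generation can also be streamed token by token. A minimal sketch using `transformers`' `TextStreamer`, reusing `model`, `tokenizer`, and `model_inputs` from the example above (not part of the original card):

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated, skipping the
# prompt itself and any special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```

To chat with the Ollama build locally (assuming the Ollama CLI is installed), the model page linked above can be pulled and run with:

```text
ollama run cnmoro/Qwen2.5-0.5B-Portuguese-v2
```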
## Overall Results

| Task                            | Metric   | Value  | StdErr |
|---------------------------------|----------|--------|--------|
| ASSIN2 RTE                      | F1 Macro | 0.4486 | 0.0067 |
| ASSIN2 RTE                      | Accuracy | 0.5560 | 0.0071 |
| ASSIN2 STS                      | Pearson  | 0.4091 | 0.0104 |
| ASSIN2 STS                      | MSE      | 5.6395 | N/A    |
| BluEX                           | Accuracy | 0.2503 | 0.0094 |
| ENEM Challenge                  | Accuracy | 0.3128 | 0.0071 |
| FAQUAD NLI                      | F1 Macro | 0.4611 | 0.0094 |
| FAQUAD NLI                      | Accuracy | 0.7877 | 0.0113 |
| HateBR Offensive (Binary)       | F1 Macro | 0.3439 | 0.0049 |
| HateBR Offensive (Binary)       | Accuracy | 0.4857 | 0.0095 |
| OAB Exams                       | Accuracy | 0.3062 | 0.0057 |
| Portuguese Hate Speech (Binary) | F1 Macro | 0.4119 | 0.0038 |
| Portuguese Hate Speech (Binary) | Accuracy | 0.7004 | 0.0111 |
| TweetSentBR                     | F1 Macro | 0.5055 | 0.0078 |
| TweetSentBR                     | Accuracy | 0.5697 | 0.0078 |

## Detailed Results by Task

### ASSIN2 RTE

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4486 | 0.0067 |
| Accuracy | 0.5560 | 0.0071 |

### ASSIN2 STS

| Metric  | Value  | StdErr |
|---------|--------|--------|
| Pearson | 0.4091 | 0.0104 |
| MSE     | 5.6395 | N/A    |

### BluEX

| Exam ID        | Metric   | Value  | StdErr |
|----------------|----------|--------|--------|
| All            | Accuracy | 0.2503 | 0.0094 |
| USP_2018       | Accuracy | 0.2037 | 0.0315 |
| UNICAMP_2018   | Accuracy | 0.1852 | 0.0306 |
| UNICAMP_2021_1 | Accuracy | 0.0870 | 0.0240 |
| USP_2020       | Accuracy | 0.2143 | 0.0317 |
| USP_2023       | Accuracy | 0.2045 | 0.0350 |
| UNICAMP_2019   | Accuracy | 0.2600 | 0.0358 |
| USP_2019       | Accuracy | 0.1500 | 0.0326 |
| UNICAMP_2020   | Accuracy | 0.2182 | 0.0321 |
| UNICAMP_2021_2 | Accuracy | 0.2941 | 0.0367 |
| UNICAMP_2023   | Accuracy | 0.4186 | 0.0433 |
| UNICAMP_2024   | Accuracy | 0.3111 | 0.0398 |
| USP_2024       | Accuracy | 0.2683 | 0.0398 |
| USP_2021       | Accuracy | 0.3269 | 0.0375 |
| UNICAMP_2022   | Accuracy | 0.3590 | 0.0444 |
| USP_2022       | Accuracy | 0.2857 | 0.0370 |

### ENEM Challenge

| Exam ID | Metric   | Value  | StdErr |
|---------|----------|--------|--------|
| All     | Accuracy | 0.3128 | 0.0071 |
| 2017    | Accuracy | 0.2845 | 0.0241 |
| 2016    | Accuracy | 0.2479 | 0.0226 |
| 2016_2  | Accuracy | 0.2846 | 0.0235 |
| 2022    | Accuracy | 0.3534 | 0.0240 |
| 2012    | Accuracy | 0.3362 | 0.0253 |
| 2011    | Accuracy | 0.3333 | 0.0251 |
| 2010    | Accuracy | 0.3846 | 0.0260 |
| 2014    | Accuracy | 0.3211 | 0.0259 |
| 2009    | Accuracy | 0.2696 | 0.0239 |
| 2015    | Accuracy | 0.2521 | 0.0229 |
| 2023    | Accuracy | 0.3481 | 0.0236 |
| 2013    | Accuracy | 0.3333 | 0.0261 |

### FAQUAD NLI

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4611 | 0.0094 |
| Accuracy | 0.7877 | 0.0113 |

### HateBR Offensive (Binary)

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.3439 | 0.0049 |
| Accuracy | 0.4857 | 0.0095 |

### OAB Exams

| Exam ID  | Metric   | Value  | StdErr |
|----------|----------|--------|--------|
| All      | Accuracy | 0.3062 | 0.0057 |
| 2011-05  | Accuracy | 0.3375 | 0.0304 |
| 2012-06a | Accuracy | 0.2625 | 0.0285 |
| 2010-02  | Accuracy | 0.3700 | 0.0279 |
| 2017-22  | Accuracy | 0.3500 | 0.0309 |
| 2016-20  | Accuracy | 0.3125 | 0.0300 |
| 2011-03  | Accuracy | 0.2626 | 0.0255 |
| 2015-17  | Accuracy | 0.3205 | 0.0304 |
| 2017-23  | Accuracy | 0.2875 | 0.0292 |
| 2018-25  | Accuracy | 0.3625 | 0.0311 |
| 2016-19  | Accuracy | 0.2436 | 0.0281 |
| 2017-24  | Accuracy | 0.1625 | 0.0238 |
| 2015-16  | Accuracy | 0.3125 | 0.0300 |
| 2011-04  | Accuracy | 0.3250 | 0.0301 |
| 2012-07  | Accuracy | 0.3500 | 0.0307 |
| 2012-06  | Accuracy | 0.1875 | 0.0253 |
| 2012-09  | Accuracy | 0.2468 | 0.0284 |
| 2013-12  | Accuracy | 0.3625 | 0.0311 |
| 2013-11  | Accuracy | 0.3000 | 0.0295 |
| 2010-01  | Accuracy | 0.3412 | 0.0296 |
| 2015-18  | Accuracy | 0.2875 | 0.0292 |
| 2014-13  | Accuracy | 0.3500 | 0.0308 |
| 2013-10  | Accuracy | 0.3125 | 0.0300 |
| 2016-20a | Accuracy | 0.2500 | 0.0279 |
| 2014-14  | Accuracy | 0.3125 | 0.0301 |
| 2012-08  | Accuracy | 0.3000 | 0.0296 |
| 2016-21  | Accuracy | 0.3375 | 0.0304 |
| 2014-15  | Accuracy | 0.4103 | 0.0321 |

### Portuguese Hate Speech (Binary)

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4119 | 0.0038 |
| Accuracy | 0.7004 | 0.0111 |

### TweetSentBR

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.5055 | 0.0078 |
| Accuracy | 0.5697 | 0.0078 |

# Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cnmoro/Qwen2.5-0.5B-Portuguese-v2) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).

| Metric                     | Value     |
|----------------------------|-----------|
| Average                    | **45.81** |
| ENEM Challenge (No Images) | 36.81     |
| BLUEX (No Images)          | 26.84     |
| OAB Exams                  | 30.62     |
| Assin2 RTE                 | 87.91     |
| Assin2 STS                 | 59.01     |
| FaQuAD NLI                 | 43.97     |
| HateBR Binary              | 33.62     |
| PT Hate Speech Binary      | 41.23     |
| tweetSentBR                | 52.33     |
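For reference, the headline Average appears to be the unweighted mean of the nine task scores above; a quick check in Python (assuming the leaderboard averages unrounded per-task values, which explains the tiny discrepancy):

```python
# Unweighted mean of the nine displayed leaderboard task scores.
scores = [36.81, 26.84, 30.62, 87.91, 59.01, 43.97, 33.62, 41.23, 52.33]
print(sum(scores) / len(scores))  # ~45.816, matching the reported 45.81 up to rounding
```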