---
license: mit
language:
- pt
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
datasets:
- adalbertojunior/openHermes_portuguese
- cnmoro/smoltalk-555k-ptbr
- cnmoro/RagMixPTBR-Legal-Alpaca-2M
- adalbertojunior/dolphin-2.9-portuguese
model-index:
- name: Qwen2.5-0.5B-Portuguese-v2
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 36.81
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 26.84
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 30.62
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 87.91
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 59.01
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 43.97
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 33.62
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 41.23
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 52.33
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v2
      name: Open Portuguese LLM Leaderboard
---

Qwen2.5-0.5B fine-tuned for proficiency in the Portuguese language and improved general capability.

The model is also available on Ollama:

```text
https://ollama.com/cnmoro/Qwen2.5-0.5B-Portuguese-v2
```
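
To try it locally from the command line, something like the following should work (a minimal sketch, assuming the Ollama CLI is installed and the tag matches the link above):

```text
ollama run cnmoro/Qwen2.5-0.5B-Portuguese-v2
```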

Usage with the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cnmoro/Qwen2.5-0.5B-Portuguese-v2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."

# The chat template automatically injects a hardcoded system prompt tuned for
# Portuguese, so there is no need to supply one yourself.
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# As Large Language Models (LLMs) são sistemas computacionais projetados para produzir
# linguagem natural com alta precisão e fluência. Eles usam algoritmos avançados para compreender
# e gerar texto, permitindo-lhes realizar tarefas como tradução de idiomas, geração de conteúdo
# e processamento de linguagem natural.
#
# Os LLMs têm sido amplamente utilizados na área da inteligência artificial e do aprendizado
# de máquina há vários anos. Alguns dos principais usos de LLMs incluem:
#
# 1. Tradução automática: Os LLMs podem traduzir textos entre diferentes idiomas, tornando-os
# úteis em setores onde a comunicação internacional é crítica, como negócios internacionais,
# diplomacia ou relações públicas.
#
# 2. Geração de conteúdo: os LLMs podem criar conteúdo altamente personalizado e adaptado às
# necessidades específicas de seus usuários, tornando-os ideais para criação de sites, aplicativos
# móveis ou plataformas de mídia social.
#
# 3. Processamento de Linguagem Natural: Os LLMs podem ser treinados para reconhecer e compreender
# padrões de linguagem, permitindo-lhes compreender melhor as intenções humanas e responder adequadamente.
#
# 4. Análise de sentimento: Os LLMs podem analisar dados de texto e identificar sentimentos, ajudando
# a entender como as pessoas se sentem em relação a determinadas questões ou questões sociais.
#
# No geral, os LLMs estão se tornando cada vez mais importantes à medida que a tecnologia continua a
# avançar. À medida que continuamos a usar LLMs em nossas vidas diárias, podemos esperar ver ainda
# mais desenvolvimentos interessantes no futuro.
```
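
The same chat-style generation can also be run through the higher-level `pipeline` helper. This is a minimal sketch, assuming a `transformers` version recent enough to accept chat messages directly in a text-generation pipeline:

```python
from transformers import pipeline

# Convenience wrapper around the same model (illustrative; not part of the original card).
generator = pipeline(
    "text-generation",
    model="cnmoro/Qwen2.5-0.5B-Portuguese-v2",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explique em poucas frases o que é aprendizado de máquina."}
]

# With chat-format input the pipeline applies the model's chat template itself,
# so the built-in Portuguese system prompt is still injected.
outputs = generator(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])
```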

## Overall Results

| Task                            | Metric   | Value  | StdErr |
|---------------------------------|----------|--------|--------|
| ASSIN2 RTE                      | F1 Macro | 0.4486 | 0.0067 |
| ASSIN2 RTE                      | Accuracy | 0.5560 | 0.0071 |
| ASSIN2 STS                      | Pearson  | 0.4091 | 0.0104 |
| ASSIN2 STS                      | MSE      | 5.6395 | N/A    |
| BluEX                           | Accuracy | 0.2503 | 0.0094 |
| ENEM Challenge                  | Accuracy | 0.3128 | 0.0071 |
| FAQUAD NLI                      | F1 Macro | 0.4611 | 0.0094 |
| FAQUAD NLI                      | Accuracy | 0.7877 | 0.0113 |
| HateBR Offensive (Binary)       | F1 Macro | 0.3439 | 0.0049 |
| HateBR Offensive (Binary)       | Accuracy | 0.4857 | 0.0095 |
| OAB Exams                       | Accuracy | 0.3062 | 0.0057 |
| Portuguese Hate Speech (Binary) | F1 Macro | 0.4119 | 0.0038 |
| Portuguese Hate Speech (Binary) | Accuracy | 0.7004 | 0.0111 |
| TweetSentBR                     | F1 Macro | 0.5055 | 0.0078 |
| TweetSentBR                     | Accuracy | 0.5697 | 0.0078 |

## Detailed Results by Task

### ASSIN2 RTE

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4486 | 0.0067 |
| Accuracy | 0.5560 | 0.0071 |

### ASSIN2 STS

| Metric  | Value  | StdErr |
|---------|--------|--------|
| Pearson | 0.4091 | 0.0104 |
| MSE     | 5.6395 | N/A    |

### BluEX

| Exam ID        | Metric   | Value  | StdErr |
|----------------|----------|--------|--------|
| All            | Accuracy | 0.2503 | 0.0094 |
| USP_2018       | Accuracy | 0.2037 | 0.0315 |
| UNICAMP_2018   | Accuracy | 0.1852 | 0.0306 |
| UNICAMP_2021_1 | Accuracy | 0.0870 | 0.0240 |
| USP_2020       | Accuracy | 0.2143 | 0.0317 |
| USP_2023       | Accuracy | 0.2045 | 0.0350 |
| UNICAMP_2019   | Accuracy | 0.2600 | 0.0358 |
| USP_2019       | Accuracy | 0.1500 | 0.0326 |
| UNICAMP_2020   | Accuracy | 0.2182 | 0.0321 |
| UNICAMP_2021_2 | Accuracy | 0.2941 | 0.0367 |
| UNICAMP_2023   | Accuracy | 0.4186 | 0.0433 |
| UNICAMP_2024   | Accuracy | 0.3111 | 0.0398 |
| USP_2024       | Accuracy | 0.2683 | 0.0398 |
| USP_2021       | Accuracy | 0.3269 | 0.0375 |
| UNICAMP_2022   | Accuracy | 0.3590 | 0.0444 |
| USP_2022       | Accuracy | 0.2857 | 0.0370 |

### ENEM Challenge

| Exam ID | Metric   | Value  | StdErr |
|---------|----------|--------|--------|
| All     | Accuracy | 0.3128 | 0.0071 |
| 2017    | Accuracy | 0.2845 | 0.0241 |
| 2016    | Accuracy | 0.2479 | 0.0226 |
| 2016_2  | Accuracy | 0.2846 | 0.0235 |
| 2022    | Accuracy | 0.3534 | 0.0240 |
| 2012    | Accuracy | 0.3362 | 0.0253 |
| 2011    | Accuracy | 0.3333 | 0.0251 |
| 2010    | Accuracy | 0.3846 | 0.0260 |
| 2014    | Accuracy | 0.3211 | 0.0259 |
| 2009    | Accuracy | 0.2696 | 0.0239 |
| 2015    | Accuracy | 0.2521 | 0.0229 |
| 2023    | Accuracy | 0.3481 | 0.0236 |
| 2013    | Accuracy | 0.3333 | 0.0261 |

### FAQUAD NLI

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4611 | 0.0094 |
| Accuracy | 0.7877 | 0.0113 |

### HateBR Offensive (Binary)

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.3439 | 0.0049 |
| Accuracy | 0.4857 | 0.0095 |

### OAB Exams

| Exam ID  | Metric   | Value  | StdErr |
|----------|----------|--------|--------|
| All      | Accuracy | 0.3062 | 0.0057 |
| 2011-05  | Accuracy | 0.3375 | 0.0304 |
| 2012-06a | Accuracy | 0.2625 | 0.0285 |
| 2010-02  | Accuracy | 0.3700 | 0.0279 |
| 2017-22  | Accuracy | 0.3500 | 0.0309 |
| 2016-20  | Accuracy | 0.3125 | 0.0300 |
| 2011-03  | Accuracy | 0.2626 | 0.0255 |
| 2015-17  | Accuracy | 0.3205 | 0.0304 |
| 2017-23  | Accuracy | 0.2875 | 0.0292 |
| 2018-25  | Accuracy | 0.3625 | 0.0311 |
| 2016-19  | Accuracy | 0.2436 | 0.0281 |
| 2017-24  | Accuracy | 0.1625 | 0.0238 |
| 2015-16  | Accuracy | 0.3125 | 0.0300 |
| 2011-04  | Accuracy | 0.3250 | 0.0301 |
| 2012-07  | Accuracy | 0.3500 | 0.0307 |
| 2012-06  | Accuracy | 0.1875 | 0.0253 |
| 2012-09  | Accuracy | 0.2468 | 0.0284 |
| 2013-12  | Accuracy | 0.3625 | 0.0311 |
| 2013-11  | Accuracy | 0.3000 | 0.0295 |
| 2010-01  | Accuracy | 0.3412 | 0.0296 |
| 2015-18  | Accuracy | 0.2875 | 0.0292 |
| 2014-13  | Accuracy | 0.3500 | 0.0308 |
| 2013-10  | Accuracy | 0.3125 | 0.0300 |
| 2016-20a | Accuracy | 0.2500 | 0.0279 |
| 2014-14  | Accuracy | 0.3125 | 0.0301 |
| 2012-08  | Accuracy | 0.3000 | 0.0296 |
| 2016-21  | Accuracy | 0.3375 | 0.0304 |
| 2014-15  | Accuracy | 0.4103 | 0.0321 |

### Portuguese Hate Speech (Binary)

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.4119 | 0.0038 |
| Accuracy | 0.7004 | 0.0111 |

### TweetSentBR

| Metric   | Value  | StdErr |
|----------|--------|--------|
| F1 Macro | 0.5055 | 0.0078 |
| Accuracy | 0.5697 | 0.0078 |

# Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cnmoro/Qwen2.5-0.5B-Portuguese-v2) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
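
The raw per-task JSON files behind those numbers can also be fetched locally from that results dataset. A minimal sketch using `huggingface_hub` (the repository and folder names are the ones linked above):

```python
from huggingface_hub import snapshot_download

# Download only this model's raw result files from the leaderboard's results dataset.
local_dir = snapshot_download(
    repo_id="eduagarcia-temp/llm_pt_leaderboard_raw_results",
    repo_type="dataset",
    allow_patterns=["cnmoro/Qwen2.5-0.5B-Portuguese-v2/*"],
)
print(local_dir)  # local snapshot path containing the JSON result files
```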

| Metric                     | Value     |
|----------------------------|-----------|
| Average                    | **45.81** |
| ENEM Challenge (No Images) | 36.81     |
| BLUEX (No Images)          | 26.84     |
| OAB Exams                  | 30.62     |
| Assin2 RTE                 | 87.91     |
| Assin2 STS                 | 59.01     |
| FaQuAD NLI                 | 43.97     |
| HateBR Binary              | 33.62     |
| PT Hate Speech Binary      | 41.23     |
| tweetSentBR                | 52.33     |
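
For reference, the leaderboard average appears to be the simple mean of the nine task scores above; recomputing it from the rounded values in the table gives essentially the same number:

```python
# Sanity check: the reported average as the plain mean of the nine task scores.
scores = {
    "ENEM Challenge (No Images)": 36.81,
    "BLUEX (No Images)": 26.84,
    "OAB Exams": 30.62,
    "Assin2 RTE": 87.91,
    "Assin2 STS": 59.01,
    "FaQuAD NLI": 43.97,
    "HateBR Binary": 33.62,
    "PT Hate Speech Binary": 41.23,
    "tweetSentBR": 52.33,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 45.82 from the rounded values; the leaderboard reports 45.81
```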