---
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
metrics:
- accuracy
- f1
- pearsonr
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
license: apache-2.0
---

### Amadeus-Verbo-FI-Qwen2.5-7B-PT-BR-Instruct
#### Introduction
Amadeus-Verbo-FI-Qwen2.5-7B-PT-BR-Instruct is a Brazilian Portuguese language model (PT-BR-LLM) developed from the base model Qwen2.5-7B-Instruct by fine-tuning it for 2 epochs on a dataset of 600k instructions.
Read our article [here](https://www.).

#### Details

- **Architecture:** a Transformer-based model with RoPE, SwiGLU, RMSNorm, and attention QKV bias, pre-trained via causal language modeling
- **Parameters:** 7.61B
- **Parameters (non-embedding):** 6.53B
- **Number of layers:** 28
- **Number of attention heads (GQA):** 28 for Q and 4 for KV
- **Context length:** 131,072 tokens
- **Number of steps:** 78,838
- **Language:** Brazilian Portuguese
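
Most of these values can be verified directly from the checkpoint's configuration. A minimal sketch (attribute names follow the Qwen2 config schema in Transformers; the context length reported by the config depends on the checkpoint's RoPE/YaRN settings):

```python
from transformers import AutoConfig

# Inspect the model config to confirm the numbers listed above
config = AutoConfig.from_pretrained("amadeusai/AV-FI-Qwen2.5-7B-PT-BR-Instruct")
print(config.num_hidden_layers)        # number of layers (28)
print(config.num_attention_heads)      # query heads (28)
print(config.num_key_value_heads)      # KV heads under GQA (4)
print(config.max_position_embeddings)  # maximum positions in the config
```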

#### Usage

You can use Amadeus-Verbo-FI-Qwen2.5-7B-PT-BR-Instruct with the Hugging Face Transformers library; we advise using the latest version of Transformers.

With `transformers<4.37.0`, you will encounter the following error:

`KeyError: 'qwen2'`
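
If in doubt, you can check the installed version before loading the model; a small optional sketch (not part of the original quickstart):

```python
import transformers
from packaging import version

# The qwen2 architecture was added in transformers 4.37.0
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for qwen2 models; "
        "upgrade with `pip install -U transformers`."
    )
```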

Below, we provide simple examples of how to load the model and generate text.

#### Quickstart
The following code snippets use `pipeline`, `AutoTokenizer`, `AutoModelForCausalLM`, and `apply_chat_template` to show how to load the tokenizer and the model, and how to generate content.

Using the pipeline:
```python
from transformers import pipeline

messages = [
    {"role": "user", "content": "Faça uma planilha nutricional para uma alimentação fitness e mediterrânea com todos os dias da semana"},
]
# Builds the full text-generation pipeline from the Hub checkpoint
pipe = pipeline("text-generation", model="amadeusai/AV-FI-Qwen2.5-7B-PT-BR-Instruct")
pipe(messages)
```
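
The pipeline returns one dict per input. With chat-style input, recent Transformers versions return the whole conversation under `generated_text`, so the reply can be pulled out roughly like this (a sketch; the exact output shape varies across versions):

```python
result = pipe(messages)
# `generated_text` holds the message list, with the assistant reply appended last
print(result[0]["generated_text"][-1]["content"])
```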
Or, loading the tokenizer and model directly:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "amadeusai/AV-FI-Qwen2.5-7B-PT-BR-Instruct"

# Load the weights in their native dtype and place them across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Faça uma planilha nutricional para uma alimentação fitness e mediterrânea com todos os dias da semana."
messages = [
    {"role": "system", "content": "Você é um assistente útil."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Keep only the newly generated tokens, dropping the prompt prefix
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
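
For reference, `apply_chat_template` renders the messages into Qwen's ChatML-style prompt before tokenization; printing `text` yields roughly the following (the exact control tokens come from the checkpoint's template):

```python
print(text)
# Expected output (approximately):
# <|im_start|>system
# Você é um assistente útil.<|im_end|>
# <|im_start|>user
# Faça uma planilha nutricional ...<|im_end|>
# <|im_start|>assistant
```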
Or, using an explicit `GenerationConfig`:
```python
from transformers import GenerationConfig, TextGenerationPipeline, AutoTokenizer, AutoModelForCausalLM
import torch

# Specify the model and tokenizer
model_id = "amadeusai/AV-FI-Qwen2.5-7B-PT-BR-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Specify the generation parameters as you like
generation_config = GenerationConfig(
    **{
        "do_sample": True,
        "max_new_tokens": 512,
        "renormalize_logits": True,
        "repetition_penalty": 1.2,
        "temperature": 0.1,
        "top_k": 50,
        "top_p": 1.0,
        "use_cache": True,
    }
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = TextGenerationPipeline(model=model, task="text-generation", tokenizer=tokenizer, device=device)

# Generate text (note: this feeds the raw prompt, without the chat template)
prompt = "Faça uma planilha nutricional para uma alimentação fitness e mediterrânea com todos os dias da semana"
completion = generator(prompt, generation_config=generation_config)
print(completion[0]['generated_text'])
```
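
For interactive use, you may prefer tokens printed as they are generated. A minimal sketch using Transformers' `TextStreamer`, reusing `model`, `tokenizer`, `prompt`, and `generation_config` from the snippet above:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced, skipping the prompt echo
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
model.generate(**inputs, generation_config=generation_config, streamer=streamer)
```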

#### Citation

If you find our work helpful, feel free to cite it.
```bibtex
@misc{amadeusai2024verbo,
  title = {Amadeus Verbo: A Brazilian Portuguese large language model},
  url = {https://amadeus-ai.com},
  author = {Amadeus AI},
  month = {November},
  year = {2024}
}
```