---
library_name: peft
base_model: Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# Model Card for WavGPT-2

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Hack337
- **Model type:** qwen2
- **Finetuned from model:** Qwen/Qwen2.5-7B-Instruct

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://huggingface.co/Hack337/WavGPT-2
- **Demo (WavGPT-1.0):** https://huggingface.co/spaces/Hack337/WavGPT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to move the tokenized inputs onto

model = AutoModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
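
This repository is a PEFT adapter on top of Qwen/Qwen2.5-7B-Instruct, so the snippet above relies on the PEFT integration in `transformers` (with `peft` installed) to resolve the base model automatically. If you prefer to handle the adapter explicitly, here is a minimal sketch using `peft` (assuming the repository ships a standard LoRA `adapter_config.json`):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model named in adapter_config.json and attach the adapter weights
model = AutoPeftModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")
```

From here, chat templating and generation work exactly as in the example above.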

Use the code below to run the model on an Intel NPU with `intel_npu_acceleration_library`.

```python
from transformers import AutoTokenizer, TextStreamer
from intel_npu_acceleration_library import NPUModelForCausalLM
import torch

# Load the NPU-optimized model without LoRA
model = NPUModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    use_cache=True,
    dtype=torch.float16  # use float16 on the NPU
).eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Hack337/WavGPT-2")
tokenizer.pad_token_id = tokenizer.eos_token_id
streamer = TextStreamer(tokenizer, skip_special_tokens=True)

# Prompt handling
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "Вы очень полезный помощник."},  # "You are a very helpful assistant."
    {"role": "user", "content": prompt}
]

# Convert to a text format compatible with the model
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prefix = tokenizer([text], return_tensors="pt")["input_ids"].to("npu")

# Generation configuration
generation_kwargs = dict(
    input_ids=prefix,
    streamer=streamer,
    do_sample=True,
    top_k=50,
    top_p=0.9,
    max_new_tokens=512,
)

# Run inference on the NPU; the streamer prints tokens as they are generated
print("Run inference")
_ = model.generate(**generation_kwargs)
```
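
The NPU loader above treats the repository as a plain causal LM ("without LoRA"). If the repository only contains adapter weights, one option is to merge the adapter into the base model first and point the NPU loader at the merged checkpoint. A minimal sketch using `peft`'s `merge_and_unload` (the output directory name `WavGPT-2-merged` is just an example):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Load the adapter on top of its base model, then fold the LoRA weights in
model = AutoPeftModelForCausalLM.from_pretrained(
    "Hack337/WavGPT-2",
    torch_dtype=torch.float16
)
merged = model.merge_and_unload()

# Save a standalone checkpoint that NPUModelForCausalLM (or plain transformers) can load
merged.save_pretrained("WavGPT-2-merged")
AutoTokenizer.from_pretrained("Hack337/WavGPT-2").save_pretrained("WavGPT-2-merged")
```

After that, `NPUModelForCausalLM.from_pretrained("WavGPT-2-merged", ...)` should behave like the example above.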

### Framework versions

- PEFT 0.11.1