Files changed (1)
  1. README.md +208 -196
README.md CHANGED
---
license: apache-2.0
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- LCoT
- Qwen
- v2
datasets:
- PowerInfer/QWQ-LONGCOT-500K
- AI-MO/NuminaMath-CoT
- prithivMLmods/Math-Solve
- amphora/QwQ-LongCoT-130K
- prithivMLmods/Deepthink-Reasoning
model-index:
- name: QwQ-LCoT2-7B-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 55.76
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT2-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 34.37
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT2-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 22.21
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT2-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.38
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT2-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 15.75
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT2-7B-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.13
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FQwQ-LCoT2-7B-Instruct
      name: Open LLM Leaderboard
---

# **QwQ-LCoT2-7B-Instruct**

*QwQ-LCoT2-7B-Instruct* is a fine-tuned language model designed for advanced reasoning and instruction-following tasks. It is built on the Qwen2.5-7B-Instruct base model and fine-tuned on long chain-of-thought (CoT) reasoning datasets. The model is optimized for tasks that require logical reasoning, detailed explanations, and multi-step problem solving, making it well suited to instruction following, text generation, and complex reasoning applications.

# **Quickstart with Transformers**

The following code snippet shows how to load the tokenizer and model, build a prompt with `apply_chat_template`, and generate a response.
141
+
142
+ ```python
143
+ from transformers import AutoModelForCausalLM, AutoTokenizer
144
+
145
+ model_name = "prithivMLmods/QwQ-LCoT2-7B-Instruct"
146
+
147
+ model = AutoModelForCausalLM.from_pretrained(
148
+ model_name,
149
+ torch_dtype="auto",
150
+ device_map="auto"
151
+ )
152
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
153
+
154
+ prompt = "How many r in strawberry."
155
+ messages = [
156
+ {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
157
+ {"role": "user", "content": prompt}
158
+ ]
159
+ text = tokenizer.apply_chat_template(
160
+ messages,
161
+ tokenize=False,
162
+ add_generation_prompt=True
163
+ )
164
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
165
+
166
+ generated_ids = model.generate(
167
+ **model_inputs,
168
+ max_new_tokens=512
169
+ )
170
+ generated_ids = [
171
+ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
172
+ ]
173
+
174
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
175
+ ```
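
For long chain-of-thought outputs it is often convenient to stream tokens as they are generated instead of waiting for the full completion. Below is a minimal sketch using the `TextStreamer` utility from `transformers`, reusing the `model`, `tokenizer`, and `model_inputs` from the snippet above; the 1024-token budget is illustrative, not a recommended default.

```python
from transformers import TextStreamer

# Print tokens to stdout as they are produced, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    **model_inputs,
    max_new_tokens=1024,  # illustrative budget for long reasoning traces
    streamer=streamer,
)
```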

# **Intended Use**

The QwQ-LCoT2-7B-Instruct model is designed for advanced reasoning and instruction-following tasks, with specific applications including:

1. **Instruction Following**: Providing detailed and step-by-step guidance for a wide range of user queries.
2. **Logical Reasoning**: Solving problems requiring multi-step thought processes, such as math problems or complex logic-based scenarios (see the example prompt after this list).
3. **Text Generation**: Crafting coherent, contextually relevant, and well-structured text in response to prompts.
4. **Problem-Solving**: Analyzing and addressing tasks that require chain-of-thought (CoT) reasoning, making it ideal for education, tutoring, and technical support.
5. **Knowledge Enhancement**: Leveraging reasoning datasets to offer deeper insights and explanations for a wide variety of topics.
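
As an example of the multi-step reasoning referenced in points 2 and 4, the sketch below sends a simple word problem through the same chat template as in the Quickstart. The problem text and token budget are arbitrary illustrations, and `model` and `tokenizer` are assumed to be loaded as shown above.

```python
# A hypothetical multi-step word problem; any reasoning prompt works here.
problem = (
    "A train travels 60 km in the first hour and 80 km in the second hour. "
    "What is its average speed over the two hours? Think step by step."
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. You should think step-by-step."},
    {"role": "user", "content": problem},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Leave room for intermediate reasoning steps before the final answer.
output_ids = model.generate(**inputs, max_new_tokens=1024)
answer = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(answer)
```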

# **Limitations**

1. **Data Bias**: As the model is fine-tuned on specific datasets, its outputs may reflect inherent biases from the training data.
2. **Context Limitation**: Performance may degrade for tasks requiring knowledge or reasoning that significantly exceeds the model's pretraining or fine-tuning context.
3. **Complexity Ceiling**: While optimized for multi-step reasoning, exceedingly complex or abstract problems may result in incomplete or incorrect outputs.
4. **Dependency on Prompt Quality**: The quality and specificity of the user prompt heavily influence the model's responses.
5. **Non-Factual Outputs**: Despite being fine-tuned for reasoning, the model can still generate hallucinated or factually inaccurate content, particularly for niche or unverified topics.
6. **Computational Requirements**: Running the model effectively requires significant computational resources, particularly when generating long sequences or handling high-concurrency workloads (a lower-memory loading sketch follows this list).
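
One common way to reduce the memory footprint noted in point 6 is 4-bit quantization. Below is a minimal sketch using `BitsAndBytesConfig` from `transformers`, assuming the `bitsandbytes` package is installed and a CUDA GPU is available; quantization can slightly reduce reasoning quality, so validate outputs for your use case.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "prithivMLmods/QwQ-LCoT2-7B-Instruct"

# Load the 7B weights in 4-bit NF4 precision to substantially cut GPU memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```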

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__QwQ-LCoT2-7B-Instruct-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FQwQ-LCoT2-7B-Instruct&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

| Metric              | Value (%) |
|---------------------|----------:|
| **Average**         |     28.60 |
| IFEval (0-Shot)     |     55.76 |
| BBH (3-Shot)        |     34.37 |
| MATH Lvl 5 (4-Shot) |     22.21 |
| GPQA (0-shot)       |      6.38 |
| MuSR (0-shot)       |     15.75 |
| MMLU-PRO (5-shot)   |     37.13 |
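
The reported Average appears to be the unweighted mean of the six benchmark scores above; a quick sanity check in Python:

```python
# Per-benchmark scores from the table above (in %).
scores = {
    "IFEval (0-Shot)": 55.76,
    "BBH (3-Shot)": 34.37,
    "MATH Lvl 5 (4-Shot)": 22.21,
    "GPQA (0-shot)": 6.38,
    "MuSR (0-shot)": 15.75,
    "MMLU-PRO (5-shot)": 37.13,
}

average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # -> Average: 28.60
```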