lbourdois committed
Commit 99bf658 · verified · 1 Parent(s): 1fb6cb7

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so only those 13 could be added.

Files changed (1)
  1. README.md +234 -222
README.md CHANGED
@@ -1,222 +1,234 @@
---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-14B-Instruct
language:
- - en
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
pipeline_tag: text-generation
library_name: transformers
tags:
- qwen2.5
- Cot
- elite
- calcium
model-index:
- name: Calcium-Opus-14B-Elite3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 54.28
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 47.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 29.38
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 16.11
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 20.13
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 48.17
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite3
      name: Open LLM Leaderboard
---

![e3.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/W3fBEzosQE1QQ49B5a3vo.gif)

# **Calcium-Opus-14B-Elite3**

Calcium-Opus-14B-Elite3 is built on the Qwen 2.5 14B architecture and is designed to enhance the reasoning capabilities of 14B-parameter models, which have proven effective at context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets, with a focus on chain-of-thought (CoT) reasoning for problem-solving. The model is optimized for tasks that require logical reasoning, detailed explanations, and multi-step problem-solving, making it well suited to applications such as instruction following, text generation, and complex reasoning.

Key improvements include:

1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
2. **Improved Instruction Following**: It shows significant advancements in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling enhanced role-playing implementations and condition-setting for chatbots.
4. **Long-Context Support**: It offers long-context support of up to 128K tokens and can generate up to 8K tokens in a single output.
5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

# **Quickstart with transformers**

The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Calcium-Opus-14B-Elite3"

# Load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation into a single prompt string with the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
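
Since the card claims generations of up to 8K tokens in a single pass, long outputs are easier to follow when streamed. Below is a minimal sketch (reusing `model` and `tokenizer` from the snippet above) that streams decoded tokens to stdout with transformers' `TextStreamer`; the prompt and token budget here are illustrative, not part of the original card.

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

long_prompt = "Write a detailed, multi-section report on chain-of-thought prompting."
messages = [{"role": "user", "content": long_prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# max_new_tokens=8192 matches the card's stated 8K-token generation limit
_ = model.generate(**model_inputs, max_new_tokens=8192, streamer=streamer)
```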

# **Intended Use**

1. **Reasoning and Context Understanding**:\
Designed to assist with complex reasoning tasks, contextual understanding, and solving problems that require logical deduction and critical thinking.

2. **Mathematical Problem-Solving**:\
Specialized for advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.

3. **Code Generation and Debugging**:\
Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, ideal for developers and software engineers.

4. **Structured Data Analysis**:\
Excels at processing and analyzing structured data, such as tables and JSON, and at generating structured outputs, which is useful for data analysts and automation workflows (see the sketch after this list).

5. **Multilingual Applications**:\
Supports over 29 languages, making it versatile for global applications such as multilingual chatbots, content generation, and translation.

6. **Extended Content Generation**:\
Capable of generating long-form content (over 8K tokens), useful for writing reports, articles, and detailed instructional guides.
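
The structured-output claim in item 4 is easy to exercise; the sketch below (reusing `model` and `tokenizer` from the quickstart) asks for JSON and parses the reply. The prompt and schema are illustrative assumptions, and parsing will fail if the model strays from valid JSON.

```python
import json

messages = [
    {"role": "system", "content": "You are a helpful assistant. Reply with valid JSON only."},
    {"role": "user", "content": 'Revenue was 1.2M in 2023 and 1.5M in 2024. Return {"years": [...], "revenue_millions": [...]}.'},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**model_inputs, max_new_tokens=256)
# Decode only the tokens generated after the prompt
reply = tokenizer.batch_decode(
    [output_ids[0][model_inputs.input_ids.shape[1]:]], skip_special_tokens=True
)[0]

data = json.loads(reply)  # raises ValueError on malformed output
print(data)
```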

# **Limitations**

1. **Hardware Requirements**:\
Due to its 14B-parameter size and support for long-context inputs, running the model requires significant computational resources, such as high-memory GPUs or TPUs.

2. **Potential Bias in Multilingual Outputs**:\
While it supports over 29 languages, output quality and accuracy may vary by language, especially for less-resourced languages.

3. **Inconsistent Outputs for Creative Tasks**:\
The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.

4. **Limited Real-World Awareness**:\
It lacks knowledge of events after its training cutoff, which may limit its ability to respond accurately to the latest information.

5. **Error Propagation in Long-Text Outputs**:\
When generating long texts, minor errors early in the output can propagate, reducing the overall coherence and accuracy of the response.

6. **Dependency on High-Quality Prompts**:\
Performance depends on the quality and specificity of the input prompt, so users should design queries carefully for optimal results.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite3-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite3&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

| Metric              | Value (%) |
|---------------------|----------:|
| **Average**         |     35.86 |
| IFEval (0-Shot)     |     54.28 |
| BBH (3-Shot)        |     47.07 |
| MATH Lvl 5 (4-Shot) |     29.38 |
| GPQA (0-shot)       |     16.11 |
| MuSR (0-shot)       |     20.13 |
| MMLU-PRO (5-shot)   |     48.17 |
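
To pull this model's summarized scores programmatically from the leaderboard contents dataset linked above, a sketch like the following should work; the dataset's column schema is not documented here, so rows are matched by substring rather than by a specific field.

```python
from datasets import load_dataset

# "open-llm-leaderboard/contents" and the "train" split come from the URL above
ds = load_dataset("open-llm-leaderboard/contents", split="train")

# Match the model by name rather than assuming a particular column layout
rows = [row for row in ds if "Calcium-Opus-14B-Elite3" in str(row)]
for row in rows:
    print(row)
```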