lbourdois committed
Commit 866d853 · verified · 1 Parent(s): ee367a4

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the `language` tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +236 -230
README.md CHANGED
@@ -1,231 +1,237 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-14B-Instruct
- pipeline_tag: text-generation
- library_name: transformers
- tags:
- - opus
- - elite
- - 14B
- - calcium
- - qwq
- - trl
- model-index:
- - name: Calcium-Opus-14B-Elite
-   results:
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: IFEval (0-Shot)
-       type: wis-k/instruction-following-eval
-       split: train
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: inst_level_strict_acc and prompt_level_strict_acc
-       value: 60.64
-       name: averaged accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: BBH (3-Shot)
-       type: SaylorTwift/bbh
-       split: test
-       args:
-         num_few_shot: 3
-     metrics:
-     - type: acc_norm
-       value: 46.53
-       name: normalized accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MATH Lvl 5 (4-Shot)
-       type: lighteval/MATH-Hard
-       split: test
-       args:
-         num_few_shot: 4
-     metrics:
-     - type: exact_match
-       value: 37.08
-       name: exact match
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: GPQA (0-shot)
-       type: Idavidrein/gpqa
-       split: train
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: acc_norm
-       value: 16.44
-       name: acc_norm
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MuSR (0-shot)
-       type: TAUR-Lab/MuSR
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: acc_norm
-       value: 20.95
-       name: acc_norm
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MMLU-PRO (5-shot)
-       type: TIGER-Lab/MMLU-Pro
-       config: main
-       split: test
-       args:
-         num_few_shot: 5
-     metrics:
-     - type: acc
-       value: 47.85
-       name: accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
-       name: Open LLM Leaderboard
- ---
- ![opus.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BELYApcX2oNMRsOW6nIyR.gif)
- 
- # **Calcium-Opus-14B-Elite**
- 
- Calcium-Opus-14B-Elite is built on the Qwen 2.5 14B architecture and is designed to enhance the reasoning capabilities of 14B-parameter models, a size class that has proven effective at context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets, with a focus on chain-of-thought (CoT) reasoning for problem-solving. The model is optimized for tasks requiring logical reasoning, detailed explanations, and multi-step problem-solving, making it well suited to applications such as instruction following, text generation, and complex reasoning.
- 
- # **Open-Evals**
- 
- | Rank | Model | Average | IFEval | BBH | MATH | GPQA | MuSR | MMLU-Pro | CO₂ Consumption | Date |
- |------|-------|---------|--------|-----|------|------|------|----------|-----------------|------|
- | 108 | [prithivMLmods/Calcium-Opus-14B-Elite](https://huggingface.co/prithivMLmods/Calcium-Opus-14B-Elite) | 38.38 | 60.52 | 46.93 | 37.69 | 16.55 | 20.78 | 47.80 | 2.01 | 01/23/2025 |
- 
- Key improvements include:
- 1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly broader knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- 2. **Improved Instruction Following**: It shows significant advances in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
- 3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling richer role-play implementations and condition-setting for chatbots.
- 4. **Long-Context Support**: It offers long-context support of up to 128K tokens and can generate up to 8K tokens in a single output.
- 5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
- 
- # **Quickstart with transformers**
- 
- The following code snippet shows how to load the tokenizer and model and generate content, using `apply_chat_template` to build the prompt.
- 
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
- 
- model_name = "prithivMLmods/Calcium-Opus-14B-Elite"
- 
- # Load the weights in their native precision and spread them across available devices.
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- 
- prompt = "Give me a short introduction to large language models."
- messages = [
-     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
-     {"role": "user", "content": prompt}
- ]
- # Render the chat into the model's prompt format, ending with an open assistant turn.
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
- 
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=512
- )
- # Drop the prompt tokens so only the newly generated continuation is decoded.
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
- 
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
- # **Intended Use**
- 1. **Reasoning and Context Understanding**:
-    Designed to assist with complex reasoning tasks, contextual understanding, and solving problems that require logical deduction and critical thinking.
- 
- 2. **Mathematical Problem-Solving**:
-    Specialized in advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.
- 
- 3. **Code Generation and Debugging**:
-    Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, which makes it useful for developers and software engineers.
- 
- 4. **Structured Data Analysis**:
-    Excels at processing and analyzing structured data, such as tables and JSON, and at generating structured outputs, which is useful for data analysts and automation workflows.
- 
- 5. **Multilingual Applications**:
-    Supports over 29 languages, making it versatile for global applications such as multilingual chatbots, content generation, and translation.
- 
- 6. **Extended Content Generation**:
-    Capable of generating long-form content (over 8K tokens), useful for reports, articles, and detailed instructional guides.
- 
- # **Limitations**
- 1. **Hardware Requirements**:
-    Due to its 14B-parameter size and long-context support, running the model requires significant computational resources, such as high-memory GPUs or TPUs.
- 
- 2. **Potential Bias in Multilingual Outputs**:
-    While it supports 29 languages, output quality and accuracy may vary by language, especially for less-resourced ones.
- 
- 3. **Inconsistent Outputs for Creative Tasks**:
-    The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.
- 
- 4. **Limited Real-World Awareness**:
-    It lacks knowledge of events beyond its training cutoff, which may limit the accuracy of responses about current information.
- 
- 5. **Error Propagation in Long-Text Outputs**:
-    When generating long texts, minor errors early in the output can propagate and reduce the overall coherence and accuracy of the response.
- 
- 6. **Dependency on High-Quality Prompts**:
-    Performance depends on the quality and specificity of the input prompt, so users may need to design queries carefully for best results.
- 
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite-details)!
- Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
- 
- | Metric | Value (%) |
- |-------------------|--------:|
- |**Average** | 40.08|
- |IFEval (0-Shot) | 60.52|
- |BBH (3-Shot) | 46.93|
- |MATH Lvl 5 (4-Shot)| 47.89|
- |GPQA (0-shot) | 16.55|
- |MuSR (0-shot) | 20.78|
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-14B-Instruct
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - opus
+ - elite
+ - 14B
+ - calcium
+ - qwq
+ - trl
+ model-index:
+ - name: Calcium-Opus-14B-Elite
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 60.64
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 46.53
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 37.08
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 16.44
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 20.95
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 47.85
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
+       name: Open LLM Leaderboard
+ ---
+ ![opus.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BELYApcX2oNMRsOW6nIyR.gif)
+ 
+ # **Calcium-Opus-14B-Elite**
+ 
+ Calcium-Opus-14B-Elite is built on the Qwen 2.5 14B architecture and is designed to enhance the reasoning capabilities of 14B-parameter models, a size class that has proven effective at context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets, with a focus on chain-of-thought (CoT) reasoning for problem-solving. The model is optimized for tasks requiring logical reasoning, detailed explanations, and multi-step problem-solving, making it well suited to applications such as instruction following, text generation, and complex reasoning.
+ 
+ # **Open-Evals**
+ 
+ | Rank | Model | Average | IFEval | BBH | MATH | GPQA | MuSR | MMLU-Pro | CO₂ Consumption | Date |
+ |------|-------|---------|--------|-----|------|------|------|----------|-----------------|------|
+ | 108 | [prithivMLmods/Calcium-Opus-14B-Elite](https://huggingface.co/prithivMLmods/Calcium-Opus-14B-Elite) | 38.38 | 60.52 | 46.93 | 37.69 | 16.55 | 20.78 | 47.80 | 2.01 | 01/23/2025 |
+ 
+ Key improvements include:
+ 1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly broader knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
+ 2. **Improved Instruction Following**: It shows significant advances in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
+ 3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling richer role-play implementations and condition-setting for chatbots.
+ 4. **Long-Context Support**: It offers long-context support of up to 128K tokens and can generate up to 8K tokens in a single output (see the configuration sketch after this list).
+ 5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+ 
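+ The 128K figure matches the upstream Qwen2.5 documentation, which enables inputs beyond the default 32K positions via a YaRN `rope_scaling` entry in `config.json`. Below is a minimal sketch, assuming this fine-tune inherits that behavior from Qwen2.5-14B-Instruct; the scaling values come from the Qwen2.5 docs, not from this model card, and are passed as a `from_pretrained` override instead of editing `config.json` directly:
+ 
+ ```python
+ from transformers import AutoModelForCausalLM
+ 
+ # Assumption: YaRN rope scaling as documented for Qwen2.5. Keyword arguments
+ # matching config fields override the values stored in config.json.
+ model = AutoModelForCausalLM.from_pretrained(
+     "prithivMLmods/Calcium-Opus-14B-Elite",
+     torch_dtype="auto",
+     device_map="auto",
+     rope_scaling={
+         "type": "yarn",
+         "factor": 4.0,  # 4 x 32768 = 131072 positions (~128K)
+         "original_max_position_embeddings": 32768,
+     },
+ )
+ ```
+ 
+ Note that static YaRN scaling applies regardless of input length, so it is best enabled only when prompts routinely exceed 32K tokens.
+ 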
+ # **Quickstart with transformers**
+ 
+ The following code snippet shows how to load the tokenizer and model and generate content, using `apply_chat_template` to build the prompt.
+ 
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ model_name = "prithivMLmods/Calcium-Opus-14B-Elite"
+ 
+ # Load the weights in their native precision and spread them across available devices.
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ 
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ # Render the chat into the model's prompt format, ending with an open assistant turn.
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ 
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512
+ )
+ # Drop the prompt tokens so only the newly generated continuation is decoded.
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+ 
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
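+ 
+ For interactive use, tokens can be printed as they are produced instead of collected at the end. A small sketch using the `TextStreamer` helper from `transformers`, reusing `model`, `tokenizer`, and `model_inputs` from the snippet above:
+ 
+ ```python
+ from transformers import TextStreamer
+ 
+ # Decodes and prints tokens to stdout as they are generated; skips the prompt echo.
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+ 
+ _ = model.generate(
+     **model_inputs,
+     max_new_tokens=512,
+     streamer=streamer,
+ )
+ ```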
+ # **Intended Use**
+ 1. **Reasoning and Context Understanding**:
+    Designed to assist with complex reasoning tasks, contextual understanding, and solving problems that require logical deduction and critical thinking.
+ 
+ 2. **Mathematical Problem-Solving**:
+    Specialized in advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.
+ 
+ 3. **Code Generation and Debugging**:
+    Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, which makes it useful for developers and software engineers.
+ 
+ 4. **Structured Data Analysis**:
+    Excels at processing and analyzing structured data, such as tables and JSON, and at generating structured outputs, which is useful for data analysts and automation workflows (see the sketch after this list).
+ 
+ 5. **Multilingual Applications**:
+    Supports over 29 languages, making it versatile for global applications such as multilingual chatbots, content generation, and translation.
+ 
+ 6. **Extended Content Generation**:
+    Capable of generating long-form content (over 8K tokens), useful for reports, articles, and detailed instructional guides.
+ 
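+ As an illustration of the structured-output use case, here is a minimal sketch that asks the model for JSON and validates the reply. The schema and prompt are hypothetical examples rather than part of the model card, and `model` and `tokenizer` are reused from the quickstart:
+ 
+ ```python
+ import json
+ 
+ # Hypothetical task: extract typed fields from free text as a JSON object.
+ messages = [
+     {"role": "system", "content": "Reply with a single JSON object only, no prose."},
+     {"role": "user", "content": 'Extract {"name": str, "year": int} from: Qwen2.5 was released by Alibaba Cloud in 2024.'},
+ ]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ output_ids = model.generate(**inputs, max_new_tokens=128)
+ reply = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+ 
+ # The model is prompted, not constrained, so the reply can still be malformed; validate it.
+ try:
+     record = json.loads(reply)
+ except json.JSONDecodeError:
+     record = None
+ ```
+ 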
+ # **Limitations**
+ 1. **Hardware Requirements**:
+    Due to its 14B-parameter size and long-context support, running the model requires significant computational resources, such as high-memory GPUs or TPUs.
+ 
+ 2. **Potential Bias in Multilingual Outputs**:
+    While it supports 29 languages, output quality and accuracy may vary by language, especially for less-resourced ones.
+ 
+ 3. **Inconsistent Outputs for Creative Tasks**:
+    The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.
+ 
+ 4. **Limited Real-World Awareness**:
+    It lacks knowledge of events beyond its training cutoff, which may limit the accuracy of responses about current information.
+ 
+ 5. **Error Propagation in Long-Text Outputs**:
+    When generating long texts, minor errors early in the output can propagate and reduce the overall coherence and accuracy of the response.
+ 
+ 6. **Dependency on High-Quality Prompts**:
+    Performance depends on the quality and specificity of the input prompt, so users may need to design queries carefully for best results.
+ 
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+ 
+ | Metric | Value (%) |
+ |-------------------|--------:|
+ |**Average** | 40.08|
+ |IFEval (0-Shot) | 60.52|
+ |BBH (3-Shot) | 46.93|
+ |MATH Lvl 5 (4-Shot)| 47.89|
+ |GPQA (0-shot) | 16.55|
+ |MuSR (0-shot) | 20.78|
  |MMLU-PRO (5-shot) | 47.80|