---
library_name: transformers
license: mit
datasets:
- ethicalabs/Kurtis-E1-SFT
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
---

# Model Card for Kurtis-E1.1-Qwen2.5-3B-Instruct

Kurtis E1.1 is a fine-tune of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained on the [ethicalabs/Kurtis-E1-SFT](https://huggingface.co/datasets/ethicalabs/Kurtis-E1-SFT) dataset using [Flower](https://flower.ai/), a federated learning framework.
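
## Usage

The model can be loaded with the standard `transformers` text-generation pipeline. A minimal sketch — the prompt, sampling parameters, and helper function below are illustrative, not settings prescribed by this card:

```python
def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format expected by Qwen2.5 instruct models."""
    return [{"role": "user", "content": user_prompt}]


def run_demo() -> None:
    # Imported here so the sketch can be read without transformers installed.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct",
        device_map="auto",
    )
    messages = build_messages("How can I manage stress before an exam?")
    out = generator(messages, max_new_tokens=256, do_sample=True, temperature=0.7)
    # The pipeline returns the full chat transcript; the last message is the reply.
    print(out[0]["generated_text"][-1]["content"])


if __name__ == "__main__":
    run_demo()
```

The pipeline applies the model's chat template automatically when given a list of messages, so no manual prompt formatting is needed.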

## Eval Results

All evaluations were run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) on a single NVIDIA A40 GPU.


### hellaswag

```
lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks hellaswag --device cuda:0 --batch_size 8
```

|  Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag|      1|none  |     0|acc     |↑  |0.5555|±  |0.0050|
|         |       |none  |     0|acc_norm|↑  |0.7412|±  |0.0044|

### arc_easy

```
lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks arc_easy --device cuda:0 --batch_size 8
```

| Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_easy|      1|none  |     0|acc     |↑  |0.7710|±  |0.0086|
|        |       |none  |     0|acc_norm|↑  |0.6789|±  |0.0096|


### arc_challenge

```
lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks arc_challenge --device cuda:0 --batch_size 8
```

|    Tasks    |Version|Filter|n-shot| Metric |   |Value|   |Stderr|
|-------------|------:|------|-----:|--------|---|----:|---|-----:|
|arc_challenge|      1|none  |     0|acc     |↑  |0.436|±  |0.0145|
|             |       |none  |     0|acc_norm|↑  |0.448|±  |0.0145|

### mmlu

```
lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks mmlu --device cuda:0 --batch_size 8
```

|                 Tasks                 |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu                                   |      2|none  |      |acc   |↑  |0.6522|±  |0.0038|
| - humanities                          |      2|none  |      |acc   |↑  |0.5734|±  |0.0066|
|  - formal_logic                       |      1|none  |     0|acc   |↑  |0.4603|±  |0.0446|
|  - high_school_european_history       |      1|none  |     0|acc   |↑  |0.7939|±  |0.0316|
|  - high_school_us_history             |      1|none  |     0|acc   |↑  |0.8333|±  |0.0262|
|  - high_school_world_history          |      1|none  |     0|acc   |↑  |0.8397|±  |0.0239|
|  - international_law                  |      1|none  |     0|acc   |↑  |0.7769|±  |0.0380|
|  - jurisprudence                      |      1|none  |     0|acc   |↑  |0.7963|±  |0.0389|
|  - logical_fallacies                  |      1|none  |     0|acc   |↑  |0.7975|±  |0.0316|
|  - moral_disputes                     |      1|none  |     0|acc   |↑  |0.6850|±  |0.0250|
|  - moral_scenarios                    |      1|none  |     0|acc   |↑  |0.2905|±  |0.0152|
|  - philosophy                         |      1|none  |     0|acc   |↑  |0.7106|±  |0.0258|
|  - prehistory                         |      1|none  |     0|acc   |↑  |0.7438|±  |0.0243|
|  - professional_law                   |      1|none  |     0|acc   |↑  |0.4759|±  |0.0128|
|  - world_religions                    |      1|none  |     0|acc   |↑  |0.8246|±  |0.0292|
| - other                               |      2|none  |      |acc   |↑  |0.7087|±  |0.0079|
|  - business_ethics                    |      1|none  |     0|acc   |↑  |0.7300|±  |0.0446|
|  - clinical_knowledge                 |      1|none  |     0|acc   |↑  |0.7321|±  |0.0273|
|  - college_medicine                   |      1|none  |     0|acc   |↑  |0.6705|±  |0.0358|
|  - global_facts                       |      1|none  |     0|acc   |↑  |0.3900|±  |0.0490|
|  - human_aging                        |      1|none  |     0|acc   |↑  |0.7130|±  |0.0304|
|  - management                         |      1|none  |     0|acc   |↑  |0.7961|±  |0.0399|
|  - marketing                          |      1|none  |     0|acc   |↑  |0.8803|±  |0.0213|
|  - medical_genetics                   |      1|none  |     0|acc   |↑  |0.7600|±  |0.0429|
|  - miscellaneous                      |      1|none  |     0|acc   |↑  |0.7957|±  |0.0144|
|  - nutrition                          |      1|none  |     0|acc   |↑  |0.7353|±  |0.0253|
|  - professional_accounting            |      1|none  |     0|acc   |↑  |0.5426|±  |0.0297|
|  - professional_medicine              |      1|none  |     0|acc   |↑  |0.6434|±  |0.0291|
|  - virology                           |      1|none  |     0|acc   |↑  |0.4880|±  |0.0389|
| - social sciences                     |      2|none  |      |acc   |↑  |0.7618|±  |0.0076|
|  - econometrics                       |      1|none  |     0|acc   |↑  |0.5439|±  |0.0469|
|  - high_school_geography              |      1|none  |     0|acc   |↑  |0.7677|±  |0.0301|
|  - high_school_government_and_politics|      1|none  |     0|acc   |↑  |0.8860|±  |0.0229|
|  - high_school_macroeconomics         |      1|none  |     0|acc   |↑  |0.6949|±  |0.0233|
|  - high_school_microeconomics         |      1|none  |     0|acc   |↑  |0.7773|±  |0.0270|
|  - high_school_psychology             |      1|none  |     0|acc   |↑  |0.8477|±  |0.0154|
|  - human_sexuality                    |      1|none  |     0|acc   |↑  |0.7786|±  |0.0364|
|  - professional_psychology            |      1|none  |     0|acc   |↑  |0.7075|±  |0.0184|
|  - public_relations                   |      1|none  |     0|acc   |↑  |0.6818|±  |0.0446|
|  - security_studies                   |      1|none  |     0|acc   |↑  |0.7224|±  |0.0287|
|  - sociology                          |      1|none  |     0|acc   |↑  |0.8458|±  |0.0255|
|  - us_foreign_policy                  |      1|none  |     0|acc   |↑  |0.8400|±  |0.0368|
| - stem                                |      2|none  |      |acc   |↑  |0.6070|±  |0.0085|
|  - abstract_algebra                   |      1|none  |     0|acc   |↑  |0.4700|±  |0.0502|
|  - anatomy                            |      1|none  |     0|acc   |↑  |0.6667|±  |0.0407|
|  - astronomy                          |      1|none  |     0|acc   |↑  |0.6776|±  |0.0380|
|  - college_biology                    |      1|none  |     0|acc   |↑  |0.7222|±  |0.0375|
|  - college_chemistry                  |      1|none  |     0|acc   |↑  |0.5000|±  |0.0503|
|  - college_computer_science           |      1|none  |     0|acc   |↑  |0.6000|±  |0.0492|
|  - college_mathematics                |      1|none  |     0|acc   |↑  |0.3400|±  |0.0476|
|  - college_physics                    |      1|none  |     0|acc   |↑  |0.4902|±  |0.0497|
|  - computer_security                  |      1|none  |     0|acc   |↑  |0.7000|±  |0.0461|
|  - conceptual_physics                 |      1|none  |     0|acc   |↑  |0.6468|±  |0.0312|
|  - electrical_engineering             |      1|none  |     0|acc   |↑  |0.6690|±  |0.0392|
|  - elementary_mathematics             |      1|none  |     0|acc   |↑  |0.5979|±  |0.0253|
|  - high_school_biology                |      1|none  |     0|acc   |↑  |0.8129|±  |0.0222|
|  - high_school_chemistry              |      1|none  |     0|acc   |↑  |0.5813|±  |0.0347|
|  - high_school_computer_science       |      1|none  |     0|acc   |↑  |0.7800|±  |0.0416|
|  - high_school_mathematics            |      1|none  |     0|acc   |↑  |0.5037|±  |0.0305|
|  - high_school_physics                |      1|none  |     0|acc   |↑  |0.4437|±  |0.0406|
|  - high_school_statistics             |      1|none  |     0|acc   |↑  |0.5972|±  |0.0334|
|  - machine_learning                   |      1|none  |     0|acc   |↑  |0.4554|±  |0.0473|

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6522|±  |0.0038|
| - humanities     |      2|none  |      |acc   |↑  |0.5734|±  |0.0066|
| - other          |      2|none  |      |acc   |↑  |0.7087|±  |0.0079|
| - social sciences|      2|none  |      |acc   |↑  |0.7618|±  |0.0076|
| - stem           |      2|none  |      |acc   |↑  |0.6070|±  |0.0085|

### mmlu (5-shot)

```
lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen2.5-3B-Instruct --tasks mmlu --device cuda:0 --batch_size 8 --num_fewshot 5
```

|                 Tasks                 |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu                                   |      2|none  |      |acc   |↑  |0.6629|±  |0.0038|
| - humanities                          |      2|none  |      |acc   |↑  |0.5862|±  |0.0067|
|  - formal_logic                       |      1|none  |     5|acc   |↑  |0.4683|±  |0.0446|
|  - high_school_european_history       |      1|none  |     5|acc   |↑  |0.7818|±  |0.0323|
|  - high_school_us_history             |      1|none  |     5|acc   |↑  |0.8284|±  |0.0265|
|  - high_school_world_history          |      1|none  |     5|acc   |↑  |0.8692|±  |0.0219|
|  - international_law                  |      1|none  |     5|acc   |↑  |0.7769|±  |0.0380|
|  - jurisprudence                      |      1|none  |     5|acc   |↑  |0.7963|±  |0.0389|
|  - logical_fallacies                  |      1|none  |     5|acc   |↑  |0.8098|±  |0.0308|
|  - moral_disputes                     |      1|none  |     5|acc   |↑  |0.7110|±  |0.0244|
|  - moral_scenarios                    |      1|none  |     5|acc   |↑  |0.3464|±  |0.0159|
|  - philosophy                         |      1|none  |     5|acc   |↑  |0.7042|±  |0.0259|
|  - prehistory                         |      1|none  |     5|acc   |↑  |0.7284|±  |0.0247|
|  - professional_law                   |      1|none  |     5|acc   |↑  |0.4759|±  |0.0128|
|  - world_religions                    |      1|none  |     5|acc   |↑  |0.8304|±  |0.0288|
| - other                               |      2|none  |      |acc   |↑  |0.7171|±  |0.0078|
|  - business_ethics                    |      1|none  |     5|acc   |↑  |0.7400|±  |0.0441|
|  - clinical_knowledge                 |      1|none  |     5|acc   |↑  |0.7321|±  |0.0273|
|  - college_medicine                   |      1|none  |     5|acc   |↑  |0.6647|±  |0.0360|
|  - global_facts                       |      1|none  |     5|acc   |↑  |0.4100|±  |0.0494|
|  - human_aging                        |      1|none  |     5|acc   |↑  |0.7220|±  |0.0301|
|  - management                         |      1|none  |     5|acc   |↑  |0.7864|±  |0.0406|
|  - marketing                          |      1|none  |     5|acc   |↑  |0.8889|±  |0.0206|
|  - medical_genetics                   |      1|none  |     5|acc   |↑  |0.7900|±  |0.0409|
|  - miscellaneous                      |      1|none  |     5|acc   |↑  |0.7957|±  |0.0144|
|  - nutrition                          |      1|none  |     5|acc   |↑  |0.7680|±  |0.0242|
|  - professional_accounting            |      1|none  |     5|acc   |↑  |0.5532|±  |0.0297|
|  - professional_medicine              |      1|none  |     5|acc   |↑  |0.6471|±  |0.0290|
|  - virology                           |      1|none  |     5|acc   |↑  |0.5120|±  |0.0389|
| - social sciences                     |      2|none  |      |acc   |↑  |0.7735|±  |0.0075|
|  - econometrics                       |      1|none  |     5|acc   |↑  |0.5877|±  |0.0463|
|  - high_school_geography              |      1|none  |     5|acc   |↑  |0.7828|±  |0.0294|
|  - high_school_government_and_politics|      1|none  |     5|acc   |↑  |0.8756|±  |0.0238|
|  - high_school_macroeconomics         |      1|none  |     5|acc   |↑  |0.7051|±  |0.0231|
|  - high_school_microeconomics         |      1|none  |     5|acc   |↑  |0.7773|±  |0.0270|
|  - high_school_psychology             |      1|none  |     5|acc   |↑  |0.8550|±  |0.0151|
|  - human_sexuality                    |      1|none  |     5|acc   |↑  |0.8092|±  |0.0345|
|  - professional_psychology            |      1|none  |     5|acc   |↑  |0.7288|±  |0.0180|
|  - public_relations                   |      1|none  |     5|acc   |↑  |0.6909|±  |0.0443|
|  - security_studies                   |      1|none  |     5|acc   |↑  |0.7551|±  |0.0275|
|  - sociology                          |      1|none  |     5|acc   |↑  |0.8308|±  |0.0265|
|  - us_foreign_policy                  |      1|none  |     5|acc   |↑  |0.8300|±  |0.0378|
| - stem                                |      2|none  |      |acc   |↑  |0.6159|±  |0.0084|
|  - abstract_algebra                   |      1|none  |     5|acc   |↑  |0.5000|±  |0.0503|
|  - anatomy                            |      1|none  |     5|acc   |↑  |0.6222|±  |0.0419|
|  - astronomy                          |      1|none  |     5|acc   |↑  |0.7500|±  |0.0352|
|  - college_biology                    |      1|none  |     5|acc   |↑  |0.7083|±  |0.0380|
|  - college_chemistry                  |      1|none  |     5|acc   |↑  |0.4700|±  |0.0502|
|  - college_computer_science           |      1|none  |     5|acc   |↑  |0.6200|±  |0.0488|
|  - college_mathematics                |      1|none  |     5|acc   |↑  |0.4000|±  |0.0492|
|  - college_physics                    |      1|none  |     5|acc   |↑  |0.4902|±  |0.0497|
|  - computer_security                  |      1|none  |     5|acc   |↑  |0.8200|±  |0.0386|
|  - conceptual_physics                 |      1|none  |     5|acc   |↑  |0.6383|±  |0.0314|
|  - electrical_engineering             |      1|none  |     5|acc   |↑  |0.6483|±  |0.0398|
|  - elementary_mathematics             |      1|none  |     5|acc   |↑  |0.5820|±  |0.0254|
|  - high_school_biology                |      1|none  |     5|acc   |↑  |0.8161|±  |0.0220|
|  - high_school_chemistry              |      1|none  |     5|acc   |↑  |0.6059|±  |0.0344|
|  - high_school_computer_science       |      1|none  |     5|acc   |↑  |0.7500|±  |0.0435|
|  - high_school_mathematics            |      1|none  |     5|acc   |↑  |0.4926|±  |0.0305|
|  - high_school_physics                |      1|none  |     5|acc   |↑  |0.4702|±  |0.0408|
|  - high_school_statistics             |      1|none  |     5|acc   |↑  |0.6343|±  |0.0328|
|  - machine_learning                   |      1|none  |     5|acc   |↑  |0.4911|±  |0.0475|

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6629|±  |0.0038|
| - humanities     |      2|none  |      |acc   |↑  |0.5862|±  |0.0067|
| - other          |      2|none  |      |acc   |↑  |0.7171|±  |0.0078|
| - social sciences|      2|none  |      |acc   |↑  |0.7735|±  |0.0075|
| - stem           |      2|none  |      |acc   |↑  |0.6159|±  |0.0084|