---
library_name: transformers
license: mit
datasets:
- ethicalabs/Kurtis-E1-SFT
language:
- en
base_model:
- Qwen/Qwen3-4B
pipeline_tag: text-generation
tags:
- text-generation-inference
---

# Model Card for ethicalabs/Kurtis-E1.1-Qwen3-4B


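Kurtis E1.1 is a fine-tune of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) on the `ethicalabs/Kurtis-E1-SFT` dataset, trained with [Flower](https://flower.ai/).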

## Eval Results

Evaluation tasks were performed with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) on a Mac Mini M4 Pro.

### mmlu

```
lm_eval --model hf --model_args pretrained=ethicalabs/Kurtis-E1.1-Qwen3-4B --tasks mmlu --device mps --batch_size 4
```

|                 Tasks                 |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu                                   |      2|none  |      |acc   |↑  |0.6849|±  |0.0037|
| - humanities                          |      2|none  |      |acc   |↑  |0.5951|±  |0.0067|
|  - formal_logic                       |      1|none  |     0|acc   |↑  |0.5952|±  |0.0439|
|  - high_school_european_history       |      1|none  |     0|acc   |↑  |0.7879|±  |0.0319|
|  - high_school_us_history             |      1|none  |     0|acc   |↑  |0.8333|±  |0.0262|
|  - high_school_world_history          |      1|none  |     0|acc   |↑  |0.8439|±  |0.0236|
|  - international_law                  |      1|none  |     0|acc   |↑  |0.7686|±  |0.0385|
|  - jurisprudence                      |      1|none  |     0|acc   |↑  |0.7685|±  |0.0408|
|  - logical_fallacies                  |      1|none  |     0|acc   |↑  |0.8037|±  |0.0312|
|  - moral_disputes                     |      1|none  |     0|acc   |↑  |0.7081|±  |0.0245|
|  - moral_scenarios                    |      1|none  |     0|acc   |↑  |0.3754|±  |0.0162|
|  - philosophy                         |      1|none  |     0|acc   |↑  |0.7170|±  |0.0256|
|  - prehistory                         |      1|none  |     0|acc   |↑  |0.7346|±  |0.0246|
|  - professional_law                   |      1|none  |     0|acc   |↑  |0.4844|±  |0.0128|
|  - world_religions                    |      1|none  |     0|acc   |↑  |0.7778|±  |0.0319|
| - other                               |      2|none  |      |acc   |↑  |0.7161|±  |0.0078|
|  - business_ethics                    |      1|none  |     0|acc   |↑  |0.7300|±  |0.0446|
|  - clinical_knowledge                 |      1|none  |     0|acc   |↑  |0.7396|±  |0.0270|
|  - college_medicine                   |      1|none  |     0|acc   |↑  |0.7168|±  |0.0344|
|  - global_facts                       |      1|none  |     0|acc   |↑  |0.3300|±  |0.0473|
|  - human_aging                        |      1|none  |     0|acc   |↑  |0.6771|±  |0.0314|
|  - management                         |      1|none  |     0|acc   |↑  |0.8155|±  |0.0384|
|  - marketing                          |      1|none  |     0|acc   |↑  |0.8675|±  |0.0222|
|  - medical_genetics                   |      1|none  |     0|acc   |↑  |0.7600|±  |0.0429|
|  - miscellaneous                      |      1|none  |     0|acc   |↑  |0.8008|±  |0.0143|
|  - nutrition                          |      1|none  |     0|acc   |↑  |0.7255|±  |0.0256|
|  - professional_accounting            |      1|none  |     0|acc   |↑  |0.5390|±  |0.0297|
|  - professional_medicine              |      1|none  |     0|acc   |↑  |0.7390|±  |0.0267|
|  - virology                           |      1|none  |     0|acc   |↑  |0.5000|±  |0.0389|
| - social sciences                     |      2|none  |      |acc   |↑  |0.7813|±  |0.0074|
|  - econometrics                       |      1|none  |     0|acc   |↑  |0.6228|±  |0.0456|
|  - high_school_geography              |      1|none  |     0|acc   |↑  |0.8283|±  |0.0269|
|  - high_school_government_and_politics|      1|none  |     0|acc   |↑  |0.8756|±  |0.0238|
|  - high_school_macroeconomics         |      1|none  |     0|acc   |↑  |0.7590|±  |0.0217|
|  - high_school_microeconomics         |      1|none  |     0|acc   |↑  |0.8151|±  |0.0252|
|  - high_school_psychology             |      1|none  |     0|acc   |↑  |0.8679|±  |0.0145|
|  - human_sexuality                    |      1|none  |     0|acc   |↑  |0.7405|±  |0.0384|
|  - professional_psychology            |      1|none  |     0|acc   |↑  |0.7173|±  |0.0182|
|  - public_relations                   |      1|none  |     0|acc   |↑  |0.6818|±  |0.0446|
|  - security_studies                   |      1|none  |     0|acc   |↑  |0.7265|±  |0.0285|
|  - sociology                          |      1|none  |     0|acc   |↑  |0.8308|±  |0.0265|
|  - us_foreign_policy                  |      1|none  |     0|acc   |↑  |0.8100|±  |0.0394|
| - stem                                |      2|none  |      |acc   |↑  |0.6943|±  |0.0079|
|  - abstract_algebra                   |      1|none  |     0|acc   |↑  |0.5700|±  |0.0498|
|  - anatomy                            |      1|none  |     0|acc   |↑  |0.6370|±  |0.0415|
|  - astronomy                          |      1|none  |     0|acc   |↑  |0.8092|±  |0.0320|
|  - college_biology                    |      1|none  |     0|acc   |↑  |0.8333|±  |0.0312|
|  - college_chemistry                  |      1|none  |     0|acc   |↑  |0.5400|±  |0.0501|
|  - college_computer_science           |      1|none  |     0|acc   |↑  |0.6600|±  |0.0476|
|  - college_mathematics                |      1|none  |     0|acc   |↑  |0.5700|±  |0.0498|
|  - college_physics                    |      1|none  |     0|acc   |↑  |0.5784|±  |0.0491|
|  - computer_security                  |      1|none  |     0|acc   |↑  |0.7800|±  |0.0416|
|  - conceptual_physics                 |      1|none  |     0|acc   |↑  |0.7787|±  |0.0271|
|  - electrical_engineering             |      1|none  |     0|acc   |↑  |0.7586|±  |0.0357|
|  - elementary_mathematics             |      1|none  |     0|acc   |↑  |0.6878|±  |0.0239|
|  - high_school_biology                |      1|none  |     0|acc   |↑  |0.8742|±  |0.0189|
|  - high_school_chemistry              |      1|none  |     0|acc   |↑  |0.7192|±  |0.0316|
|  - high_school_computer_science       |      1|none  |     0|acc   |↑  |0.8500|±  |0.0359|
|  - high_school_mathematics            |      1|none  |     0|acc   |↑  |0.4741|±  |0.0304|
|  - high_school_physics                |      1|none  |     0|acc   |↑  |0.6225|±  |0.0396|
|  - high_school_statistics             |      1|none  |     0|acc   |↑  |0.7083|±  |0.0310|
|  - machine_learning                   |      1|none  |     0|acc   |↑  |0.5268|±  |0.0474|
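
The `Stderr` column can be sanity-checked by hand. The sketch below assumes each question is scored 0 or 1 and that the standard error is the sample standard deviation (with an `n - 1` denominator) divided by `sqrt(n)`. The question counts (100 for `abstract_algebra`, 126 for `formal_logic`) are assumptions taken from the usual MMLU test split; they do not appear in the table. The helper name `accuracy_stderr` is hypothetical.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n 0/1 scores, using the n - 1 sample variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# (accuracy from the table, assumed number of questions)
subtasks = {
    "abstract_algebra": (0.5700, 100),
    "formal_logic": (0.5952, 126),
}

for name, (acc, n) in subtasks.items():
    print(f"{name}: acc={acc:.4f} stderr={accuracy_stderr(acc, n):.4f}")
# abstract_algebra: acc=0.5700 stderr=0.0498
# formal_logic: acc=0.5952 stderr=0.0439
```

Under these assumptions the computed values agree with the reported `Stderr` for both rows. It also shows why subtasks with few questions carry wide error bars.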

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|-----:|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.6849|±  |0.0037|
| - humanities     |      2|none  |      |acc   |↑  |0.5951|±  |0.0067|
| - other          |      2|none  |      |acc   |↑  |0.7161|±  |0.0078|
| - social sciences|      2|none  |      |acc   |↑  |0.7813|±  |0.0074|
| - stem           |      2|none  |      |acc   |↑  |0.6943|±  |0.0079|
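
To read the `Value ± Stderr` pairs as ranges, one option is an approximate 95% confidence interval under a normal approximation (`value ± 1.96 × stderr`). This is a minimal sketch; the choice of 1.96 and the helper name `confidence_interval` are assumptions, not something the harness reports.

```python
def confidence_interval(value: float, stderr: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate two-sided interval: value ± z * stderr (normal approximation)."""
    half_width = z * stderr
    return value - half_width, value + half_width


# (accuracy, stderr) copied from the Groups table above
groups = {
    "mmlu": (0.6849, 0.0037),
    "humanities": (0.5951, 0.0067),
    "other": (0.7161, 0.0078),
    "social sciences": (0.7813, 0.0074),
    "stem": (0.6943, 0.0079),
}

for name, (value, stderr) in groups.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {low:.4f} to {high:.4f}")
```

For the overall `mmlu` row this gives roughly 0.6776 to 0.6922. Differences between models smaller than that band are hard to distinguish from sampling noise.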