Update README.md
README.md CHANGED
@@ -71,19 +71,19 @@ In this section, we report the evaluation results of SmolLM2. All evaluations are
 
 ## Instruction model Vs. Humanized model
 
-| Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized |
-|:-----------------------------|:---------------------:|:----------------------:|
-| MMLU | **23.1** | **23.1** |
-| ARC (Easy) | **54.3** | 50.2 |
-| ARC (Challenge) | **26.1** | 25.3 |
-| HellaSwag | **43.0** | 41.6 |
-| PIQA | **67.2** | 66.2 |
-| WinoGrande | **52.5** | 52.2 |
-| TriviaQA | **0.3** | 0.1 |
-| GSM8K | 0.2 | **0.5** |
-| OpenBookQA | **32.6** | 32.0 |
-| CommonSenseQA | **4.8** | 2.2 |
-| QuAC (F1) | **14.1** | 11.0 |
+| Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized | Difference |
+|:-----------------------------|:---------------------:|:----------------------:|:----------:|
+| MMLU | **23.1** | **23.1** | 0.0 |
+| ARC (Easy) | **54.3** | 50.2 | -4.1 |
+| ARC (Challenge) | **26.1** | 25.3 | -0.8 |
+| HellaSwag | **43.0** | 41.6 | -1.4 |
+| PIQA | **67.2** | 66.2 | -1.0 |
+| WinoGrande | **52.5** | 52.2 | -0.3 |
+| TriviaQA | **0.3** | 0.1 | -0.2 |
+| GSM8K | 0.2 | **0.5** | +0.3 |
+| OpenBookQA | **32.6** | 32.0 | -0.6 |
+| CommonSenseQA | **4.8** | 2.2 | -2.6 |
+| QuAC (F1) | **14.1** | 11.0 | -3.1 |
 
 
 ## Limitations
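The new "Difference" column is plain arithmetic: the Humanized score minus the Instruct score, rounded to one decimal. As a quick sanity check of the values in the table (this sketch is not part of the commit; the `scores` mapping is just the table data retyped):

```python
# Recompute the "Difference" column as Humanized minus Instruct,
# rounded to one decimal place. Values copied from the table above.
scores = {
    # metric: (Instruct, Humanized)
    "MMLU": (23.1, 23.1),
    "ARC (Easy)": (54.3, 50.2),
    "ARC (Challenge)": (26.1, 25.3),
    "HellaSwag": (43.0, 41.6),
    "PIQA": (67.2, 66.2),
    "WinoGrande": (52.5, 52.2),
    "TriviaQA": (0.3, 0.1),
    "GSM8K": (0.2, 0.5),
    "OpenBookQA": (32.6, 32.0),
    "CommonSenseQA": (4.8, 2.2),
    "QuAC (F1)": (14.1, 11.0),
}

# round() guards against float artifacts like 50.2 - 54.3 == -4.100000000000001
differences = {m: round(h - i, 1) for m, (i, h) in scores.items()}

for metric, diff in differences.items():
    print(f"{metric:<15} {diff:+.1f}")
```

The only metric where the Humanized model comes out ahead is GSM8K (+0.3); MMLU is a tie, and every other benchmark favors the Instruct model.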