Update README.md
README.md
@@ -69,34 +69,21 @@ trl chat --model_name_or_path HuggingFaceTB/SmolLM2-135M-Instruct --device cpu
 
 In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use [lighteval](https://github.com/huggingface/lighteval) to run them.
 
-##
-
-|
-
-|
-| ARC (
-|
-|
-|
-|
-|
-|
-|
-
-
-## Instruction model
-
-| Metric | SmolLM2-135M-Instruct | SmolLM-135M-Instruct |
-|:-----------------------------|:---------------------:|:--------------------:|
-| IFEval (Average prompt/inst) | **29.9** | 17.2 |
-| MT-Bench | **19.8** | 16.8 |
-| HellaSwag | **40.9** | 38.9 |
-| ARC (Average) | **37.3** | 33.9 |
-| PIQA | **66.3** | 64.0 |
-| MMLU (cloze) | **29.3** | 28.3 |
-| BBH (3-shot) | **28.2** | 25.2 |
-| GSM8K (5-shot) | 1.4 | 1.4 |
-
+## Instruction model Vs. Humanized model
+
+| Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized |
+|:-----------------------------|:---------------------:|:----------------------:|
+| MMLU | **23.1** | 23.1 |
+| ARC (Easy) | **54.3** | 50.2 |
+| ARC (Challenge) | **26.1** | 25.3 |
+| HellaSwag | **43.0** | 41.6 |
+| PIQA | **67.2** | 66.2 |
+| WinoGrande | **52.5** | 52.2 |
+| TriviaQA | **0.3** | 0.1 |
+| GSM8K | 0.2 | **0.5** |
+| OpenBookQA | **32.6** | 32.0 |
+| CommonSenseQA | **4.8** | 2.2 |
+| QuAC (F1) | **14.1** | 11.0 |
 
 
 ## Limitations
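
To make the gap between the two columns of the added "Instruction model Vs. Humanized model" table easier to read, the per-benchmark difference can be computed directly from the markdown rows. The following is a minimal Python sketch, not part of this commit or of lighteval: the table rows are copied by hand from the diff, and the helper names `parse_scores` and `score_deltas` are hypothetical names chosen for illustration.

```python
# Rows copied by hand from the table added in this diff (assumption: no other source).
TABLE = """\
| Metric | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized |
|:-----------------------------|:---------------------:|:----------------------:|
| MMLU | **23.1** | 23.1 |
| ARC (Easy) | **54.3** | 50.2 |
| ARC (Challenge) | **26.1** | 25.3 |
| HellaSwag | **43.0** | 41.6 |
| PIQA | **67.2** | 66.2 |
| WinoGrande | **52.5** | 52.2 |
| TriviaQA | **0.3** | 0.1 |
| GSM8K | 0.2 | **0.5** |
| OpenBookQA | **32.6** | 32.0 |
| CommonSenseQA | **4.8** | 2.2 |
| QuAC (F1) | **14.1** | 11.0 |
"""


def parse_scores(table: str) -> dict[str, tuple[float, float]]:
    """Map each metric to (instruct_score, humanized_score)."""
    scores = {}
    # Skip the header row and the alignment row.
    for line in table.strip().splitlines()[2:]:
        # Drop the outer pipes, split into cells, remove bold markers.
        cells = [c.strip().strip("*") for c in line.strip().strip("|").split("|")]
        metric, instruct, humanized = cells
        scores[metric] = (float(instruct), float(humanized))
    return scores


def score_deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Humanized minus Instruct, rounded to one decimal like the table."""
    return {m: round(h - i, 1) for m, (i, h) in scores.items()}


if __name__ == "__main__":
    for metric, delta in score_deltas(parse_scores(TABLE)).items():
        print(f"{metric:<16} {delta:+.1f}")
```

A negative delta means the Humanized model scores below the Instruct model on that benchmark. With the values above, GSM8K is the only row with a positive delta and MMLU is a tie.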