Michielo committed · Commit 9eec746 · verified · 1 Parent(s): 28cbde0

Update README.md

Files changed (1): README.md (+15 -28)
README.md CHANGED
@@ -69,34 +69,21 @@ trl chat --model_name_or_path HuggingFaceTB/SmolLM2-135M-Instruct --device cpu
 
 In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use [lighteval](https://github.com/huggingface/lighteval) to run them.
 
- ## Base pre-trained model
-
- | Metrics        | SmolLM2-135M-8k | SmolLM-135M |
- |:---------------|:---------------:|:-----------:|
- | HellaSwag      |    **42.1**     |    41.2     |
- | ARC (Average)  |    **43.9**     |    42.4     |
- | PIQA           |      68.4       |    68.4     |
- | MMLU (cloze)   |    **31.5**     |    30.2     |
- | CommonsenseQA  |    **33.9**     |    32.7     |
- | TriviaQA       |      4.1        |   **4.3**   |
- | Winogrande     |      51.3       |    51.3     |
- | OpenBookQA     |    **34.6**     |    34.0     |
- | GSM8K (5-shot) |     **1.4**     |    1.0      |
-
-
- ## Instruction model
-
- | Metric                       | SmolLM2-135M-Instruct | SmolLM-135M-Instruct |
- |:-----------------------------|:---------------------:|:--------------------:|
- | IFEval (Average prompt/inst) |       **29.9**        |         17.2         |
- | MT-Bench                     |       **19.8**        |         16.8         |
- | HellaSwag                    |       **40.9**        |         38.9         |
- | ARC (Average)                |       **37.3**        |         33.9         |
- | PIQA                         |       **66.3**        |         64.0         |
- | MMLU (cloze)                 |       **29.3**        |         28.3         |
- | BBH (3-shot)                 |       **28.2**        |         25.2         |
- | GSM8K (5-shot)               |         1.4           |         1.4          |
-
+ ## Instruction model vs. Humanized model
+
+ | Metric          | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized |
+ |:----------------|:---------------------:|:----------------------:|
+ | MMLU            |         23.1          |          23.1          |
+ | ARC (Easy)      |       **54.3**        |          50.2          |
+ | ARC (Challenge) |       **26.1**        |          25.3          |
+ | HellaSwag       |       **43.0**        |          41.6          |
+ | PIQA            |       **67.2**        |          66.2          |
+ | WinoGrande      |       **52.5**        |          52.2          |
+ | TriviaQA        |        **0.3**        |          0.1           |
+ | GSM8K           |          0.2          |        **0.5**         |
+ | OpenBookQA      |       **32.6**        |          32.0          |
+ | CommonSenseQA   |        **4.8**        |          2.2           |
+ | QuAC (F1)       |       **14.1**        |          11.0          |
 
 
 ## Limitations
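
As noted in the context above, these evaluations are run with lighteval. A minimal sketch of such a run is shown below; it is illustrative only, since the CLI flags and the task-string format vary between lighteval releases, and the commit does not record the exact task names or settings used to produce this table.

```bash
# Sketch only: verify flags and task names against `lighteval --help` for
# your installed release; the CLI has changed across versions.
pip install lighteval

# Zero-shot HellaSwag on the instruct model. The task string is assumed to
# follow the "suite|task|num_fewshot|auto_truncate" pattern of recent releases.
lighteval accelerate \
    "pretrained=HuggingFaceTB/SmolLM2-135M-Instruct" \
    "leaderboard|hellaswag|0|0"
```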