AssistantsLab/SmolLM2-135M-humanized

Text Generation

text-generation-inference

Michielo committed on Jan 29 · Commit 9eec746 (verified) · 1 parent: 28cbde0

Update README.md

Files changed (1)

README.md +15 -28
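The file summary `+15 -28` and the hunk header `@@ -69,34 +69,21 @@` describe the same change from two angles. In the unified diff format, the old-side length counts removed lines plus unchanged context lines, and the new-side length counts added lines plus the same context lines. So 34 - 28 and 21 - 15 should give the same context count. The sketch below checks this. It assumes the single hunk shown accounts for the whole `+15 -28` change, and `parse_hunk_header` and `shared_context` are hypothetical helpers, not part of any library.

```python
import re

# Unified-diff hunk header: "@@ -<old_start>[,<old_len>] +<new_start>[,<new_len>] @@ <context>"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_len, new_start, new_len) for one hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_len, new_start, new_len = m.groups()
    # An omitted length means the hunk covers exactly one line.
    return int(old_start), int(old_len or 1), int(new_start), int(new_len or 1)


def shared_context(old_len: int, new_len: int, removed: int, added: int) -> int:
    """Unchanged (context) lines in a hunk; both sides must agree."""
    old_ctx, new_ctx = old_len - removed, new_len - added
    if old_ctx != new_ctx:
        raise ValueError(f"inconsistent hunk: {old_ctx} vs {new_ctx} context lines")
    return old_ctx


if __name__ == "__main__":
    header = "@@ -69,34 +69,21 @@ trl chat --model_name_or_path HuggingFaceTB/SmolLM2-135M-Instruct --device cpu"
    _, old_len, _, new_len = parse_hunk_header(header)
    # Assumes this single hunk carries the whole "+15 -28" change to README.md.
    print(shared_context(old_len, new_len, removed=28, added=15))  # prints 6
```

Both sides agree on 6 context lines, so the hunk header and the file summary are consistent with each other.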

README.md CHANGED

@@ -69,34 +69,21 @@ trl chat --model_name_or_path HuggingFaceTB/SmolLM2-135M-Instruct --device cpu
 In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use [lighteval](https://github.com/huggingface/lighteval) to run them.
-## Base pre-trained model
-| Metrics            | SmolLM2-135M-8k | SmolLM-135M  |
-|:-------------------|:----------------:|:------------:|
-| HellaSwag         | **42.1**         | 41.2         |
-| ARC (Average)     | **43.9**         | 42.4         |
-| PIQA              | 68.4             | 68.4         |
-| MMLU (cloze)      | **31.5**         | 30.2         |
-| CommonsenseQA     | **33.9**         | 32.7         |
-| TriviaQA          | 4.1              | **4.3**      |
-| Winogrande        | 51.3             | 51.3         |
-| OpenBookQA        | **34.6**         | 34.0         |
-| GSM8K (5-shot)    | **1.4**          | 1.0          |
-## Instruction model
-| Metric                       | SmolLM2-135M-Instruct | SmolLM-135M-Instruct |
-|:-----------------------------|:---------------------:|:--------------------:|
-| IFEval (Average prompt/inst) | **29.9**                 | 17.2                |
-| MT-Bench                     | **19.8**                 | 16.8                |
-| HellaSwag                    | **40.9**                 | 38.9                |
-| ARC (Average)                | **37.3**                 | 33.9                |
-| PIQA                         | **66.3**                 | 64.0                |
-| MMLU (cloze)                 | **29.3**                 | 28.3                |
-| BBH (3-shot)                 | **28.2**                 | 25.2                |
-| GSM8K (5-shot)               | 1.4                  | 1.4                 |
 ## Limitations

 In this section, we report the evaluation results of SmolLM2. All evaluations are zero-shot unless stated otherwise, and we use [lighteval](https://github.com/huggingface/lighteval) to run them.
+## Instruction model Vs. Humanized model
+| Metric                       | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized |
+|:-----------------------------|:---------------------:|:----------------------:|
+| MMLU                         | **23.1**              | 23.1                   |
+| ARC (Easy)                   | **54.3**              | 50.2                   |
+| ARC (Challenge)              | **26.1**              | 25.3                   |
+| HellaSwag                    | **43.0**              | 41.6                   |
+| PIQA                         | **67.2**              | 66.2                   |
+| WinoGrande                   | **52.5**              | 52.2                   |
+| TriviaQA                     | **0.3**               | 0.1                    |
+| GSM8K                        | 0.2                   | **0.5**                |
+| OpenBookQA                   | **32.6**              | 32.0                   |
+| CommonSenseQA                | **4.8**               | 2.2                    |
+| QuAC (F1)                    | **14.1**              | 11.0                   |
 ## Limitations
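The comparison table added in this commit is easier to scan as per-metric differences. The sketch below parses the table rows as added (with the leading `+` diff markers dropped) and computes Humanized minus Instruct for each metric, rounded to the table's one-decimal precision. `parse_scores` and `deltas` are hypothetical helpers written for illustration; they are not part of the repository or of lighteval.

```python
# Hypothetical helpers (not from the repository): compare the two columns of the
# "Instruction model Vs. Humanized model" table added in this commit.
TABLE = """\
| Metric                       | SmolLM2-135M-Instruct | SmolLM2-135M-Humanized |
|:-----------------------------|:---------------------:|:----------------------:|
| MMLU                         | **23.1**              | 23.1                   |
| ARC (Easy)                   | **54.3**              | 50.2                   |
| ARC (Challenge)              | **26.1**              | 25.3                   |
| HellaSwag                    | **43.0**              | 41.6                   |
| PIQA                         | **67.2**              | 66.2                   |
| WinoGrande                   | **52.5**              | 52.2                   |
| TriviaQA                     | **0.3**               | 0.1                    |
| GSM8K                        | 0.2                   | **0.5**                |
| OpenBookQA                   | **32.6**              | 32.0                   |
| CommonSenseQA                | **4.8**               | 2.2                    |
| QuAC (F1)                    | **14.1**              | 11.0                   |
"""


def parse_scores(table: str) -> dict[str, tuple[float, float]]:
    """Map each metric name to its (instruct, humanized) score pair."""
    scores: dict[str, tuple[float, float]] = {}
    # Skip the header row and the |:---| alignment row.
    for line in table.strip().splitlines()[2:]:
        metric, instruct, humanized = (c.strip() for c in line.strip().strip("|").split("|"))
        # Remove the ** bold markers that flag the better score.
        scores[metric] = (float(instruct.strip("*")), float(humanized.strip("*")))
    return scores


def deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Humanized minus Instruct per metric, rounded to the table's precision."""
    return {metric: round(h - i, 1) for metric, (i, h) in scores.items()}


if __name__ == "__main__":
    diff = deltas(parse_scores(TABLE))
    for metric, delta in diff.items():
        print(f"{metric:<16} {delta:+.1f}")
    print(f"mean delta: {sum(diff.values()) / len(diff):+.2f}")
```

On these numbers only GSM8K comes out positive (+0.3), MMLU is unchanged (0.0), and the largest drop is ARC (Easy) at -4.1 points. Ties show up as 0.0 regardless of which column carries the bold marker.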