Update README.md
README.md CHANGED

@@ -184,7 +184,7 @@ Llama-Krikri-8B Instruct exhibits very strong chat capabilities by scoring **hig

 Below, we show the scores for the original Arena-Hard-Auto dataset for various open and closed chat models. We followed the original methodology by using **gpt-4-1106-preview as the judge model** and **gpt-4-0314 as the baseline model**.

-Llama-Krikri-8B Instruct performs very well in the English variant of Arena-Hard-Auto as well, since we can observe that it is **competitive with
+Llama-Krikri-8B Instruct performs very well in the English variant of Arena-Hard-Auto as well, since we can observe that it is **competitive with similarly sized LLMs** and that it **improves upon Llama-3.1-8B Instruct by +24.5% / +16%** (No style control / With style control).

 ***Please note** that judge models are biased towards student models trained on distilled data from them. You can read more [here](https://arxiv.org/pdf/2502.01534?).