ilsp/Llama-Krikri-8B-Instruct

droussis committed about 22 hours ago

Commit 93cd609 · verified · 1 Parent(s): 0c79215

Update README.md

Files changed (1)

README.md +1 -1

@@ -184,7 +184,7 @@ Llama-Krikri-8B Instruct exhibits very strong chat capabilities by scoring **hig
 Below, we show the scores for the original Arena-Hard-Auto dataset for various open and closed chat models. We followed the original methodology by using **gpt-4-1106-preview as the judge model** and **gpt-4-0314 as the baseline model**.
-Llama-Krikri-8B Instruct performs very well in the English variant of Arena-Hard-Auto as well, since we can observe that it is **competitive with significantly larger previous-generation LLMs** (such as Qwen 2 72B Instruct) and that it **improves upon Llama-3.1-8B Instruct by +24.5% / +16%** (No style control / With style control).
+Llama-Krikri-8B Instruct performs very well in the English variant of Arena-Hard-Auto as well, since we can observe that it is **competitive with similarly sized LLMs** and that it **improves upon Llama-3.1-8B Instruct by +24.5% / +16%** (No style control / With style control).
 ![image/png](arena_hard_en.png)
 ***Please note** that judge models are biased towards student models trained on distilled data from them. You can read more [here](https://arxiv.org/pdf/2502.01534?).