atahanuz committed
Commit 097139b · verified · 1 Parent(s): f084f4d

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -39,6 +39,12 @@ The table below summarizes the evaluation results:
  | google/gemma-2-9b-it | 54.13% |
  | ytu-ce-cosmos/Turkish-Llama-8b-DPO-v0.1 | 36.89% |
 
+
+ ### Voting Methodology
+
+ Human judges were shown a question and two answers, each from a different model. Each judge selected the answer they preferred. For example, in the question below, the judge selected the answer on the right:
+ ![Alt text](https://i.imgur.com/AcR9ymM.png)
+
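
As a rough illustration of how pairwise votes like these could be turned into per-model percentages, here is a minimal sketch. It assumes each vote is recorded as a `(model_a, model_b, winner)` tuple and that a model's score is its wins divided by the number of comparisons it appeared in. The README does not state the exact aggregation, so both the record format and the formula are assumptions, and the model names below are hypothetical placeholders.

```python
from collections import Counter


def win_rates(votes):
    """Compute each model's win rate from pairwise human votes.

    `votes` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` must be one of the two models shown. This record format is
    an assumption made for illustration only.
    """
    wins = Counter()
    appearances = Counter()
    for model_a, model_b, winner in votes:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} was not one of the two models shown")
        appearances[model_a] += 1
        appearances[model_b] += 1
        wins[winner] += 1
    # Win rate = comparisons won / comparisons the model took part in.
    return {model: wins[model] / appearances[model] for model in appearances}


# Hypothetical votes; these are not data from the evaluation.
example_votes = [
    ("model-a", "model-b", "model-b"),
    ("model-a", "model-c", "model-a"),
    ("model-b", "model-c", "model-b"),
    ("model-a", "model-b", "model-a"),
]

rates = win_rates(example_votes)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2%}")
```

A plain win rate treats every comparison equally regardless of the opponent's strength. Rating schemes such as Elo or Bradley-Terry are common alternatives when the pairings are unbalanced.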
  ### 📊 Turkish Evaluation Benchmark Results (via `malhajar17/lm-evaluation-harness_turkish`)
 
  | Model Name | Average | MMLU | Truthful_QA | ARC | Hellaswag | Gsm8K | Winogrande |