#### [EN] Guide for Input .jsonl Files
If you have five models to compare, upload five `.jsonl` files.
  * 💥All `.jsonl` files must have the same number of rows.
  * 💥The `model_id` must be different for each file, and each file should carry a single, unique `model_id` value.
  * 💥Each `.jsonl` file must differ from the others in its `generated` and `model_id` values, while `instruction` and `task` must match row-for-row (see the validation sketch below).
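
The checks above are easy to automate before uploading. Here is a minimal validation sketch in plain Python (standard library only; the file names are hypothetical placeholders):

```python
import json

def validate_jsonl_files(paths):
    """Check the cross-file constraints described above."""
    rows_per_file = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            rows_per_file[path] = [json.loads(line) for line in f if line.strip()]

    # 1. All files must have the same number of rows.
    lengths = {path: len(rows) for path, rows in rows_per_file.items()}
    assert len(set(lengths.values())) == 1, f"Row counts differ: {lengths}"

    # 2. Each file carries a single model_id, distinct from the other files.
    model_ids = set()
    for path, rows in rows_per_file.items():
        ids = {row["model_id"] for row in rows}
        assert len(ids) == 1, f"{path}: multiple model_id values in one file: {ids}"
        model_ids |= ids
    assert len(model_ids) == len(rows_per_file), "model_id collides across files"

    # 3. instruction and task must match row-for-row; generated may differ.
    files = list(rows_per_file.values())
    for rows in files[1:]:
        for ref, row in zip(files[0], rows):
            assert ref["instruction"] == row["instruction"], "instruction mismatch"
            assert ref["task"] == row["task"], "task mismatch"

# Hypothetical file names -- replace with your own .jsonl files.
validate_jsonl_files(["model1.jsonl", "model2.jsonl"])
```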

**Required `.jsonl` Fields**
  * Reserved Fields (Mandatory)
    * `model_id`: The name of the model being evaluated. (A short name is recommended.)
    * `instruction`: The instruction given to the model. This corresponds to the test set prompt (not the evaluation prompt).
    * `generated`: The response the model generated for the test set instruction.
    * `task`: Used to group rows into subsets when displaying overall results. Also useful when you want to apply a different evaluation prompt per row.
  * Additional Fields (Optional)
    * Depending on the evaluation prompt you use, other fields can be read from the `.jsonl` as well. You may add them freely, as long as they don't collide with the reserved field names above.
      * Example: the `translation_pair.yaml` and `translation_fortunecookie.yaml` prompts read the `source_lang` and `target_lang` fields from the `.jsonl`.
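
To see how these fields feed the evaluation, the sketch below substitutes one row into a prompt template. It assumes a simple `str.format`-style `{placeholder}` scheme purely for illustration; the actual syntax of the prompt YAML files may differ:

```python
import json

# Illustrative template only -- not the real contents of translation_pair.yaml.
EVAL_PROMPT = (
    "Evaluate this {source_lang} -> {target_lang} translation.\n"
    "Instruction: {instruction}\n"
    "Model response: {generated}"
)

row = json.loads(
    '{"model_id": "m1", "task": "ko-en", "instruction": "1+1?", '
    '"generated": "2", "source_lang": "Korean", "target_lang": "English"}'
)

# Reserved fields and additional fields alike are available to the template.
print(EVAL_PROMPT.format(**row))
```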

For example, when evaluating with the `translation_pair` prompt, each `.jsonl` file looks like this:
```python
# model1.jsonl
{"model_id": "모델1", "task": "영한", "instruction": "어디로 가야하오", "generated": "Where should I go", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델1", "task": "한영", "instruction": "1+1?", "generated": "1+1?", "source_lang": "English", "target_lang": "Korean"}

# model2.jsonl: same `instruction` as model1.jsonl, but `generated` and `model_id` differ!
{"model_id": "모델2", "task": "영한", "instruction": "어디로 가야하오", "generated": "글쎄다", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델2", "task": "한영", "instruction": "1+1?", "generated": "2", "source_lang": "English", "target_lang": "Korean"}
...
```
On the other hand, when evaluating with the `llmbar` prompt, fields like `source_lang` and `target_lang` are not used (unlike with the translation prompts above), so you don't need to add them to your `.jsonl` files.
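
If you build these files programmatically, write one JSON object per line and pass `ensure_ascii=False` so non-ASCII text (like the Korean above) stays readable. A minimal sketch with hypothetical rows:

```python
import json

# Hypothetical rows for one model; instruction/task must mirror the other files.
rows = [
    {"model_id": "모델2", "task": "영한", "instruction": "어디로 가야하오",
     "generated": "글쎄다", "source_lang": "Korean", "target_lang": "English"},
    {"model_id": "모델2", "task": "한영", "instruction": "1+1?",
     "generated": "2", "source_lang": "English", "target_lang": "Korean"},
]

with open("model2.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        # One JSON object per line; ensure_ascii=False preserves Korean text.
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```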