#### [EN] Guide for Input .jsonl Files
If you have five models to compare, upload five `.jsonl` files.
* 💥 All `.jsonl` files must have the same number of rows.
* 💥 The `model_id` field must be identical for every row within a file and different between files.
* 💥 Each `.jsonl` file must differ from the others in `generated` and `model_id`; `instruction` and `task` must be the same across files.
**Required `.jsonl` Fields**
* Reserved Fields (Mandatory)
  * `model_id`: The name of the model being evaluated (a short name is recommended).
  * `instruction`: The instruction given to the model. This corresponds to the test-set prompt (not the evaluation prompt).
  * `generated`: The response the model generated for the test-set instruction.
  * `task`: Used to group the overall results into subsets. It can also be used when you want to apply a different evaluation prompt per row.
* Additional Fields (Optional)
  * Depending on the evaluation prompt you use, you may add extra fields. Add them freely to your `.jsonl` files, as long as they avoid the reserved keywords above.
  * Example: the `translation_pair.yaml` and `translation_fortunecookie.yaml` prompts read the `source_lang` and `target_lang` fields from the `.jsonl` and use them.
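As a sketch, one way to produce such a file programmatically (the file name, model name, and field values below are illustrative, not prescribed by Arena-Lite):

```python
import json

# Illustrative rows for one model; every row in a file shares the same model_id.
# source_lang / target_lang are extra fields read by the translation prompts.
rows = [
    {
        "model_id": "model1",
        "task": "ko-en",
        "instruction": "어디로 가야하오",
        "generated": "Where should I go",
        "source_lang": "Korean",
        "target_lang": "English",
    },
]

# Write one JSON object per line; ensure_ascii=False keeps Korean text readable.
with open("model1.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```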
For example, when evaluating with the `translation_pair` prompt, each `.jsonl` file looks like this:
```python
# model1.jsonl
{"model_id": "모델1", "task": "영한", "instruction": "어디로 가야하오", "generated": "Where should I go", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델1", "task": "한영", "instruction": "1+1?", "generated": "1+1?", "source_lang": "English", "target_lang": "Korean"}
# model2.jsonl: `instruction` is the same as in model1.jsonl, but `generated` and `model_id` differ!
{"model_id": "모델2", "task": "영한", "instruction": "어디로 가야하오", "generated": "글쎄다", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "모델2", "task": "한영", "instruction": "1+1?", "generated": "2", "source_lang": "English", "target_lang": "Korean"}
...
..
```
On the other hand, when evaluating with the `llmbar` prompt, fields like `source_lang` and `target_lang` are not used, so unlike in the translation evaluation above, you don't need to add them to your `.jsonl` files.
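To catch violations of the upload constraints before submitting, a minimal pre-flight check might look like the sketch below (the function names are my own, not part of Arena-Lite):

```python
import json

def load_jsonl(path):
    """Read one .jsonl file into a list of dicts (skipping blank lines)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def check_files(paths):
    """Validate the constraints across several .jsonl files; return the model_ids."""
    rows_list = [load_jsonl(p) for p in paths]
    # 1. All files must have the same number of rows.
    assert len({len(rows) for rows in rows_list}) == 1, "row counts differ"
    # 2. model_id is identical within a file and different between files.
    ids = []
    for rows in rows_list:
        per_file = {r["model_id"] for r in rows}
        assert len(per_file) == 1, "model_id must be identical within a file"
        ids.extend(per_file)
    assert len(set(ids)) == len(ids), "model_id must differ between files"
    # 3. instruction and task must match row-by-row across files.
    for group in zip(*rows_list):
        assert len({(r["instruction"], r["task"]) for r in group}) == 1, \
            "instruction/task mismatch between files"
    return ids
```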