#### [EN] Upload guide (`jsonl`)

**Basic Requirements**

* Upload one `jsonl` file per model (e.g., five files to compare five LLMs)
* ⚠️ Important: All `jsonl` files must have the same number of rows
* ⚠️ Important: Each file must use a single `model_id`, and no two files may share the same `model_id` (a quick validation sketch follows the example format below)

**Required Fields**

* Per-Model Fields
  * `model_id`: Unique identifier for the model (recommendation: keep it short)
  * `generated`: The LLM's response to the test instruction
  * Required only for translation tasks (the `translation_pair` prompt needs these fields; see `streamlit_app_local/user_submit/mt/llama5.jsonl` and the example rows after this list):
    * `source_lang`: Input language (e.g., Korean, KR, kor, ...)
    * `target_lang`: Output language (e.g., English, EN, ...)
* Common Fields (must be identical across all files)
  * `instruction`: The input prompt or test instruction given to the model
  * `task`: Category label used to group results (useful when using different evaluation prompts per task)
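
For translation tasks, each row also carries the language fields above. Below is a minimal illustration mirroring the bundled sample file; the `task` value and all texts here are placeholders, not taken from the actual sample.

```python
# llama5.jsonl  (illustrative translation rows; values are placeholders)
{"model_id": "llama5", "task": "translation", "instruction": "안녕하세요", "generated": "Hello", "source_lang": "Korean", "target_lang": "English"}
{"model_id": "llama5", "task": "translation", "instruction": "고맙습니다", "generated": "Thank you", "source_lang": "Korean", "target_lang": "English"}
```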

**Example Format**

```python
# model1.jsonl
{"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
{"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}

# model2.jsonl
{"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
{"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
...
```
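
Given the requirements above, a quick local check catches most formatting mistakes before upload. This is a minimal sketch, not part of the app; the file names are placeholders, and it assumes each file holds the rows of exactly one model.

```python
import json

# Placeholder file names; replace with the jsonl files you plan to upload.
files = ["model1.jsonl", "model2.jsonl"]

def load_jsonl(path):
    """Read one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

rows_per_file = {path: load_jsonl(path) for path in files}

# 1) Same number of rows in every file.
lengths = {path: len(rows) for path, rows in rows_per_file.items()}
assert len(set(lengths.values())) == 1, f"Row counts differ: {lengths}"

# 2) One model_id per file, and no model_id reused across files.
ids_per_file = [{row["model_id"] for row in rows} for rows in rows_per_file.values()]
assert all(len(ids) == 1 for ids in ids_per_file), "A file mixes several model_id values"
all_ids = [next(iter(ids)) for ids in ids_per_file]
assert len(all_ids) == len(set(all_ids)), "The same model_id appears in more than one file"

# 3) Common fields (instruction, task) must match row-by-row across files.
reference = [(row["instruction"], row["task"]) for row in rows_per_file[files[0]]]
for path, rows in rows_per_file.items():
    current = [(row["instruction"], row["task"]) for row in rows]
    assert current == reference, f"{path}: instruction/task rows do not match {files[0]}"

print("All files pass the basic checks.")
```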

**Use Case Example**

If you want to compare different prompting strategies for the same model (see the sketch after this list):

* Use the same `instruction` values across files (a unified set of test scenarios).
* The `generated` responses will differ from file to file, reflecting each prompting strategy.
* Use descriptive `model_id` values such as "prompt1", "prompt2", etc.
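
The sketch below illustrates this setup by writing two such files from one shared scenario list; the instructions, responses, and file names are placeholders, not real evaluation data.

```python
import json

# Shared test scenarios: identical instruction/task pairs must appear in every file.
scenarios = [
    {"task": "directions", "instruction": "Where should I go?"},
    {"task": "arithmetic", "instruction": "1+1"},
]

# Hypothetical outputs collected with two different prompting strategies.
responses = {
    "prompt1": ["Over there", "2"],
    "prompt2": ["Head north", "3"],
}

for model_id, answers in responses.items():
    # One jsonl file per strategy, named after its model_id.
    with open(f"{model_id}.jsonl", "w", encoding="utf-8") as f:
        for scenario, answer in zip(scenarios, answers):
            row = {"model_id": model_id, **scenario, "generated": answer}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```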