---
language:
- ko
license: apache-2.0
library_name: transformers
tags:
- text-generation-inference
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
---

# EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval


## About the Model
This model is fine-tuned to judge whether the context retrieved for a question in a RAG pipeline contains enough information to answer it. It responds with "예" (yes) or "아니오" (no).

The base model is [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0).

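## Prompt Template
The model expects the Korean prompt below, passed as written. In English it reads roughly: "Given a question and some information, evaluate whether the information is sufficient to answer the question. Answer with "예" (yes) or "아니오" (no)." The section labels 질문, 정보, and 평가 mean question, information, and evaluation.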
```
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘. 

### 질문: 
{question}

### 정보: 
{context}

### 평가: 
```
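
Filling the template by hand is error-prone because the model is sensitive to the exact section labels and line breaks. The following is a minimal sketch of a prompt builder, assuming the newline layout of the `prompt_template` string in the usage section; `PROMPT_TEMPLATE` and `build_grading_prompt` are hypothetical names, not part of the model repository.

```python
# Template text copied from this model card; the names here are hypothetical.
PROMPT_TEMPLATE = (
    "주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n"
    '정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n'
    "### 질문:\n{question}\n\n"
    "### 정보:\n{context}\n\n"
    "### 평가:\n"
)


def build_grading_prompt(question: str, context: str) -> str:
    """Fill the grading template with one question/context pair."""
    # str.format only interprets braces in the template itself, so braces
    # inside the question or context are inserted unchanged.
    return PROMPT_TEMPLATE.format(question=question, context=context)


if __name__ == "__main__":
    print(build_grading_prompt(
        "동아리 종강총회가 언제인가요?",
        "종강총회 날짜는 6월 21일입니다.",
    ))
```

The prompt ends right after the `### 평가:` line, so the first tokens the model generates are its verdict.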

## How to Use It
```python
import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# Load the weights in 4-bit NF4 to reduce GPU memory use.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'
query = {
    "question": "동아리 종강총회가 언제인가요?",
    "context": "종강총회 날짜는 6월 21일입니다."
}

# Tokenize the filled-in prompt and move the tensors to the model's device.
model_inputs = tokenizer(
    prompt_template.format_map(query), return_tensors='pt'
).to(model.device)
# Cap the number of newly generated tokens; the answer itself is one word.
output = model.generate(**model_inputs, max_new_tokens=100)
# Decode the token IDs (prompt plus answer) back into text.
print(tokenizer.decode(output[0]))
```

### Example Output
```
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>
```
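
Downstream code usually needs a boolean rather than the full decoded string. The following is a minimal sketch of a parser, assuming the decoded text has the shape shown above (prompt, then the verdict after the last `### 평가:` marker); `parse_grade` is a hypothetical helper, not part of the model repository.

```python
from typing import Optional


def parse_grade(decoded: str, marker: str = "### 평가:") -> Optional[bool]:
    """Return True for "예", False for "아니오", None if neither is found."""
    # Keep only the text after the last evaluation marker, so the words
    # "예" and "아니오" inside the instruction itself are ignored.
    answer = decoded.rsplit(marker, 1)[-1].strip()
    if answer.startswith("아니오"):
        return False
    if answer.startswith("예"):
        return True
    return None


if __name__ == "__main__":
    sample = (
        '정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n'
        "### 질문:\n동아리 종강총회가 언제인가요?\n\n"
        "### 정보:\n종강총회 날짜는 6월 21일입니다.\n\n"
        "### 평가:\n예<|end_of_text|>"
    )
    print(parse_grade(sample))
```

Returning `None` for anything unexpected lets the caller decide how to treat an unparseable answer, for example by discarding the retrieved context.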

## Training Data
- The dataset was generated with reference to the instruction-generation approach of [stanford_alpaca](https://github.com/tatsu-lab/stanford_alpaca).
- [yanolja/EEVE-Korean-Instruct-10.8B-v1.0](https://huggingface.co/yanolja/EEVE-Korean-Instruct-10.8B-v1.0) was used as the model for question generation.

## Metrics

### Korean LLM Benchmark

| Model                                           | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|:-----------------------------------------------:|:-------:|:------:|:------------:|:-------:|:-------------:|:---------------:|
| EEVE-Korean-Instruct-10.8B-v1.0                 | 56.08   | 55.2   | 66.11        | 56.48   | 49.14         | 53.48           |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1    | 55.55  | 65.95        | 56.24   | 48.66         | 54.07           |

### Generated Dataset

|         Model                                   | Accuracy |  F1   | Precision | Recall |
|:-------------------------------:|:--------:|:-----:|:---------:|:------:|
| EEVE-Korean-Instruct-10.8B-v1.0                 | 0.824    | 0.800 | 0.885     | 0.697  |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892    | 0.875 | 0.903     | 0.848  |
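
The card does not state how these scores were computed. The following is a sketch of the standard binary definitions, assuming "예" is the positive class and that labels and predictions are already parsed into booleans; `binary_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with True ("예") as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # correctly graded "예"
    fp = sum(1 for t, p in pairs if not t and p)      # "예" predicted, "아니오" expected
    fn = sum(1 for t, p in pairs if t and not p)      # "아니오" predicted, "예" expected
    tn = sum(1 for t, p in pairs if not t and not p)  # correctly graded "아니오"
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


if __name__ == "__main__":
    print(binary_metrics([True, True, False, False], [True, False, False, True]))
```

Under these definitions, the fine-tuned model's precision and recall from the table (0.903 and 0.848) give `2 * p * r / (p + r)` of about 0.875, which matches its reported F1.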