---
language:
- en
- ko
license: other
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
- llama-3-ko
pipeline_tag: text-generation
license_name: llama3
license_link: LICENSE
---

# Model Card for Llama-3-Open-Ko-8B




## Model Details

Llama-3-Open-Ko-8B is a continued-pretrained language model based on Llama-3-8B.

The model was trained entirely on publicly available resources, using 60GB+ of deduplicated texts.

With the new Llama-3 tokenizer, pretraining was conducted on 17.7B+ tokens, slightly more than with the previous Korean tokenizer (the Llama-2-Ko tokenizer).
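
To get a rough feel for the tokenizer change, the sketch below counts how many tokens each tokenizer needs for the same Korean sentence. The repository ids (`beomi/Llama-3-Open-Ko-8B` for this model and `beomi/llama-2-ko-7b` for the earlier Llama-2-Ko tokenizer) are assumptions used for illustration and may need to be adjusted.

```python
# Compare token counts for the same Korean text under the two tokenizers.
# The repository ids are assumptions for illustration; point them at the
# checkpoints you actually use.
from transformers import AutoTokenizer

# "When allocating a budget, projects are prioritized and supported differentially."
text = "์˜ˆ์‚ฐ์„ ๋ถ„๋ฐฐํ•  ๋•Œ ์‚ฌ์—…์˜ ์šฐ์„  ์ˆœ์œ„๋ฅผ ์ •ํ•ด์„œ ์ฐจ๋“ฑ ์ง€์›ํ•œ๋‹ค."

llama3_ko_tokenizer = AutoTokenizer.from_pretrained("beomi/Llama-3-Open-Ko-8B")
llama2_ko_tokenizer = AutoTokenizer.from_pretrained("beomi/llama-2-ko-7b")

print("Llama-3 tokenizer:", len(llama3_ko_tokenizer.encode(text)), "tokens")
print("Llama-2-Ko tokenizer:", len(llama2_ko_tokenizer.encode(text)), "tokens")
```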


**Sample usage**

The snippet below first loads the model and tokenizer explicitly; `beomi/Llama-3-Open-Ko-8B` is used as the repository id here and may need to be adjusted to the actual checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Load the model and tokenizer from the Hub.
# The repository id is an assumption; adjust it if your checkpoint lives elsewhere.
model_id = "beomi/Llama-3-Open-Ko-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
)


def extract_response_llama3(question):
    messages = [
        {"role": "system", "content": ""},
        {"role": "user", "content": question},
    ]

    # Build the Llama-3 chat prompt from the messages.
    prompt = pipe.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Stop generation at either the regular EOS token or Llama-3's end-of-turn token.
    terminators = [
        pipe.tokenizer.eos_token_id,
        pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]

    outputs = pipe(
        prompt,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.1,
        top_p=0.9,
        num_return_sequences=1,
    )

    # Keep only the last line of the generated text as the answer.
    return outputs[0]["generated_text"].split("\n")[-1]


# "What is the method of prioritizing projects and providing differentiated
#  support when allocating a budget called?"
question = "์˜ˆ์‚ฐ์„ ๋ถ„๋ฐฐํ•  ๋•Œ ์‚ฌ์—…์˜ ์šฐ์„  ์ˆœ์œ„๋ฅผ ์ •ํ•ด์„œ ์ฐจ๋“ฑ ์ง€์›ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ญ๋ผ๊ณ  ํ•˜์ง€"
response = extract_response_llama3(question)
print(response)

# "Where was the law for reducing and comprehensively managing emissions of
#  fine-dust-forming substances enacted?"
question = "๋ฏธ์„ธ๋จผ์ง€ ์ƒ์„ฑ๋ฌผ์งˆ์˜ ๋ฐฐ์ถœ์„ ์ €๊ฐํ•˜๊ณ  ์ข…ํ•ฉ์ ์œผ๋กœ ๊ด€๋ฆฌํ•˜๊ธฐ ์œ„ํ•œ ๋ฒ•์„ ์–ด๋””์„œ ์ œ์ •ํ–ˆ๋‹ˆ"
response = extract_response_llama3(question)
print(response)

# "For what kind of place was the legal basis of an air-pollution-prevention
#  policy prepared through the enactment of a special act?"
question = "์–ด๋–ค ์žฅ์†Œ์˜ ๋Œ€๊ธฐ์˜ค์—ผ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ ์ •์ฑ…์˜ ๋ฒ•์  ๊ทผ๊ฑฐ๊ฐ€ ํŠน๋ณ„๋ฒ•์˜ ์ œ์ •์œผ๋กœ ์ค€๋น„๋˜์—ˆ์ง€"
response = extract_response_llama3(question)
print(response)
```
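
If managing the model and tokenizer objects separately is not needed, the pipeline can also be built directly from a Hub repository id, letting `transformers` handle the loading. The repository id below is again an assumption and should match the actual checkpoint.

```python
from transformers import pipeline
import torch

# Build the pipeline straight from the Hub; the repository id is assumed here.
pipe = pipeline(
    task="text-generation",
    model="beomi/Llama-3-Open-Ko-8B",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
    truncation=True,
)
```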

**Sample Output**

```
์„ ํƒ๊ณผ ์ง‘์ค‘

ํ™˜๊ฒฝ๋ถ€

ํ•ญ๋งŒ
```