|
--- |
|
language: |
|
- en |
|
- ko |
|
license: other |
|
tags: |
|
- facebook |
|
- meta |
|
- pytorch |
|
- llama |
|
- llama-3 |
|
- llama-3-ko |
|
pipeline_tag: text-generation |
|
license_name: llama3 |
|
license_link: LICENSE |
|
--- |
|
|
|
# Model Card for Llama-3-Open-Ko-8B
|
|
## Model Details |
|
|
|
Llama-3-Open-Ko-8B is a language model continued-pretrained from Llama-3-8B.
|
|
|
This model was trained entirely on publicly available resources, comprising 60GB+ of deduplicated texts.
|
|
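The card does not state how the deduplication was performed; as a minimal sketch under that caveat, exact-match deduplication over a document collection can be done by hashing lightly normalized text (the `dedup_exact` helper below is hypothetical, not part of this repository):

```python
import hashlib

def dedup_exact(docs):
    """Keep the first occurrence of each distinct document (exact match)."""
    seen = set()
    unique = []
    for doc in docs:
        # Normalize whitespace so trivially different copies collapse together.
        key = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = ["한국어 문서입니다.", "한국어  문서입니다.", "다른 문서"]
print(dedup_exact(corpus))  # the first two collapse into one entry
```

Real pretraining pipelines typically also apply near-duplicate detection (e.g., MinHash), which exact hashing does not cover.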
|
With the new Llama-3 tokenizer, pretraining was conducted on 17.7B+ tokens, slightly more than with the previous Korean tokenizer (the Llama-2-Ko tokenizer).
|
|
|
|
|
**Sample usage** |
|
|
|
```python
from transformers import pipeline
import torch

# Repository id for this model (adjust if you host it elsewhere).
model_id = "beomi/Llama-3-Open-Ko-8B"

pipe = pipeline(
    task="text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    truncation=True,
)

def extract_response_llama3(question):
    messages = [
        {"role": "system", "content": ""},
        {"role": "user", "content": question},
    ]

    prompt = pipe.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Stop on either the regular EOS token or Llama-3's end-of-turn token.
    terminators = [
        pipe.tokenizer.eos_token_id,
        pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]

    outputs = pipe(
        prompt,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.1,
        top_p=0.9,
        num_return_sequences=1,
    )

    # Return only the model's answer (the text after the last newline).
    return outputs[0]['generated_text'].split('\n')[-1]

# "What do you call the method of setting priorities among projects
#  and supporting them differentially when allocating a budget?"
question = "예산을 분배할 때 사업의 우선순위를 정해서 차등 지원하는 방법을 뭐라고 하지"
response = extract_response_llama3(question)
print(response)

# "Which body enacts the law for reducing and comprehensively managing
#  emissions of fine-dust-forming substances?"
question = "미세먼지 생성물질의 배출을 저감하고 종합적으로 관리하기 위한 법은 어디에서 제정하나"
response = extract_response_llama3(question)
print(response)

# "For what kind of place was the legal basis of an air-pollution-prevention
#  policy prepared through the enactment of a special act?"
question = "어떤 장소의 대기오염을 방지하기 위한 정책의 법적 근거가 특별법 제정으로 준비되었지"
response = extract_response_llama3(question)
print(response)
```
|
|
|
**Sample Output** |
|
|
|
```
선택과 집중

환경부

항만
```

(The answers translate to "selection and concentration", "the Ministry of Environment", and "ports", respectively.)
|
|