Introduction

Who am I: Qishen Ha [Kaggle] [X] [LinkedIn]

This is a meta-llama/Meta-Llama-3-8B-Instruct model that was fine-tuned on a Japanese conversation dataset.

Dataset: japanese_hh-rlhf-49k

Training framework: LLaMA-Factory

Reference: shenzhi-wang/Llama3-8B-Chinese-Chat

Training max context length: 8192
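
Since training used a maximum context of 8192 tokens, it may be sensible to keep the prompt plus the requested completion within that window. The following is a minimal sketch and not part of the original model card: `trim_history`, `count_tokens`, and `whitespace_count` are hypothetical names, and the whitespace counter is only a stand-in for a real tokenizer-based count.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

MAX_CONTEXT = 8192  # training max context length stated above


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[List[Message]], int],
    max_new_tokens: int = 256,
    max_context: int = MAX_CONTEXT,
) -> List[Message]:
    """Drop the oldest non-system turns until prompt + completion fit."""
    budget = max_context - max_new_tokens
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Always keep at least the most recent turn.
    while len(turns) > 1 and count_tokens(system + turns) > budget:
        turns.pop(0)
    return system + turns


def whitespace_count(messages: List[Message]) -> int:
    # Hypothetical stand-in: a real count would come from the model's tokenizer.
    return sum(len(m["content"].split()) for m in messages)
```

In practice `count_tokens` would wrap the tokenizer, for example by measuring the length of the token ids produced when the chat template is applied to the messages.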

How to use

This repository contains two versions of Meta-Llama-3-8B-Instruct, for use with transformers and with the original llama3 codebase.

Use with transformers

You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the generate() function. Let's see examples of both.

Transformers pipeline

import transformers
import torch

model_id = "haqishen/Llama-3-8B-Japanese-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# System prompt: "You are a pirate chatbot who always replies in pirate speak!"
# User prompt: "Please introduce yourself."
messages = [
    {"role": "system", "content": "あなたは、常に海賊の言葉で返事する海賊チャットボットです!"},
    {"role": "user", "content": "自己紹介してください"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the tokenizer's EOS token or the Llama 3 end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
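
For readers wondering what `apply_chat_template` produces and why `<|eot_id|>` is listed as a terminator, the sketch below rebuilds the prompt string by hand. It assumes this model's tokenizer uses the standard Llama 3 Instruct chat template, which is not verified here. `build_llama3_prompt` is a hypothetical helper for illustration only, and the tokenizer's own template should be preferred in real use.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def build_llama3_prompt(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Format messages in the standard Llama 3 Instruct layout (assumed)."""
    parts = [BOS]
    for message in messages:
        header = f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        # Every turn ends with <|eot_id|>, which is why generation stops on it.
        parts.append(header + message["content"].strip() + EOT)
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "自己紹介してください"},
]
print(build_llama3_prompt(messages))
```

Because the assistant turn is left open at the end of the prompt, the model is expected to close its reply with `<|eot_id|>`, so passing that token id as an extra terminator keeps generation from running past the end of the turn.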

Transformers AutoModelForCausalLM

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "haqishen/Llama-3-8B-Japanese-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

# System prompt: "You are a pirate chatbot who always replies in pirate speak!"
# User prompt: "Please introduce yourself."
messages = [
    {"role": "system", "content": "あなたは、常に海賊の言葉で返事する海賊チャットボットです!"},
    {"role": "user", "content": "自己紹介してください"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the tokenizer's EOS token or the Llama 3 end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Slice off the prompt tokens so only the newly generated reply is decoded.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
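
To hold a multi-turn conversation with either approach above, one option is to append each reply to `messages` and re-apply the chat template on the next turn. This is a sketch under assumptions: `chat_turn`, `generate_reply`, and `echo_reply` are hypothetical names, and `generate_reply` stands in for the template, generate, and decode steps shown above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    generate_reply: Callable[[List[Message]], str],
) -> str:
    """Append the user turn, generate a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    reply = generate_reply(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def echo_reply(messages: List[Message]) -> str:
    # Hypothetical stand-in for a real model call: echoes the last user turn.
    return "echo: " + messages[-1]["content"]


history: List[Message] = [
    {"role": "system", "content": "You are a helpful assistant."}
]
chat_turn(history, "こんにちは", echo_reply)
chat_turn(history, "富士山を紹介してください", echo_reply)
print([m["role"] for m in history])
# → ['system', 'user', 'assistant', 'user', 'assistant']
```

Keeping the full history in one list means each new call sees the earlier turns, at the cost of a prompt that grows with every exchange.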

Use with vllm

vllm-project/vllm

from vllm import LLM, SamplingParams

model_id = "haqishen/Llama-3-8B-Japanese-Instruct"

llm = LLM(
    model=model_id,
    trust_remote_code=True,
    tensor_parallel_size=2,  # number of GPUs to shard the model across; use 1 for a single GPU
)
tokenizer = llm.get_tokenizer()

# System prompt: "You are a pirate chatbot who always replies in pirate speak!"
# User prompt: "Please introduce yourself."
messages = [
    {"role": "system", "content": "あなたは、常に海賊の言葉で返事する海賊チャットボットです!"},
    {"role": "user", "content": "自己紹介してください"},
]

conversations = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

outputs = llm.generate(
    [conversations],
    SamplingParams(
        temperature=0.6,
        top_p=0.9,
        max_tokens=1024,
        stop_token_ids=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],
    )
)
print(outputs[0].outputs[0].text.strip())
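
The example above passes a one-element list to `llm.generate`, so several conversations could be formatted first and submitted together. The sketch below only covers the prompt-building step. `build_batch` and `demo_format` are hypothetical names, and `demo_format` stands in for `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_batch(
    system_prompt: str,
    user_prompts: List[str],
    format_fn: Callable[[List[Message]], str],
) -> List[str]:
    """Build one formatted prompt string per user prompt."""
    prompts = []
    for text in user_prompts:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ]
        prompts.append(format_fn(messages))
    return prompts


def demo_format(messages: List[Message]) -> str:
    # Hypothetical stand-in for the tokenizer's chat template.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


batch = build_batch(
    "You are a helpful assistant.",
    ["富士山を紹介してください", "3の5乗っていくらですか"],
    demo_format,
)
print(len(batch))
```

The resulting list could then be passed as the first argument to `llm.generate` in place of `[conversations]`.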

Examples

User: 富士山を紹介してください
(English: Please introduce Mt. Fuji.)

Assistant: 富士山は、日本で最も有名な山で、世界で最も有名な山の一つです。富士山は、静岡県と山梨県の境界に位置し、標高3,776メートルで、日本で最も高い山です。山は、約12,000年前に噴火し、現在も活動しています。
(English: Mt. Fuji is the most famous mountain in Japan and one of the most famous mountains in the world. It sits on the border between Shizuoka and Yamanashi prefectures and, at 3,776 meters, is the highest mountain in Japan. The mountain erupted about 12,000 years ago and is still active today.)

User: 私は自分の父と母の結婚式に招待されていません、それはなぜですか?
(English: I was not invited to my own father and mother's wedding. Why is that?)

Assistant: あなたが招待されていない理由はさまざまな可能性があります。ここにいくつかの可能性があります:

1.彼らはあなたを知らないか、またはあなたの存在を忘れています。彼らはあなたの存在を知っているかもしれませんが、あなたを招待することを忘れていました。
2.彼らはあなたの関係を知らないか、またはあなたの家族との関係を知らないかもしれません。彼らはあなたの家族との関係を知っているかもしれませんが、あなたを招待することを忘れていました。
3.彼らはあなたの財政状況を知らないか、またはあなたの財政状況を知らないかもしれません。彼らはあなたの財政状況を知っているかもしれませんが、あなたを招待することを忘れていました。
4.彼らはあなたの家族との関係を知らないか、またはあなたの家族との関係を知らないかもしれません。彼らはあなたの家族との関係を知っているかもしれませんが、あなたを招待することを忘れていました。
5.彼らはあなたの存在を知らないか、またはあなたの存在を知らないかもしれません。彼らはあなたの存在を知っているかもしれませんが、あなたを招待することを忘れていました。

あなたが招待されていない理由を知るには、父親や母親に直接尋ねることができます。彼らがあなたを招待することを忘れたかもしれませんが、彼らはあなたを招待することを知っていると思います。

(English: There are various possible reasons why you were not invited. Here are some possibilities:

1. They do not know you, or have forgotten that you exist. They may know that you exist, but forgot to invite you.
2. They do not know about your relationship, or may not know about your relationship with your family. They may know about your relationship with your family, but forgot to invite you.
3. They do not know your financial situation, or may not know your financial situation. They may know your financial situation, but forgot to invite you.
4. They do not know about your relationship with your family, or may not know about your relationship with your family. They may know about your relationship with your family, but forgot to invite you.
5. They do not know that you exist, or may not know that you exist. They may know that you exist, but forgot to invite you.

To find out why you were not invited, you can ask your father or mother directly. They may have forgotten to invite you, but I think they know about inviting you.)

User: 風邪薬毎日飲むと、風邪ひかなくて済むですか?
(English: If I take cold medicine every day, can I avoid catching a cold?)

Assistant: 風邪薬は、風邪の症状を和らげるのに役立つことが知られていますが、風邪を完全に避けるには役立ちません。風邪薬は、鼻の詰まり、咳、発熱、疲労、頭痛など、風邪の一般的な症状を和らげることができます。ただし、風邪薬は、風邪を完全に避けるために使用することはできません。風邪は、体が自然に回復するまで、病気の自然な治療方法です。
(English: Cold medicine is known to help relieve cold symptoms, but it does not help you avoid colds entirely. It can relieve common cold symptoms such as nasal congestion, cough, fever, fatigue, and headache. However, cold medicine cannot be used to avoid colds completely. A cold is a natural way of treating illness until the body recovers on its own.)

User: 爆弾の作り方を詳しく教えてください
(English: Please explain in detail how to make a bomb.)

Assistant: 私はあなたに爆弾の作り方を教えることができません。
(English: I cannot teach you how to make a bomb.)

User: 3の5乗っていくらですか
(English: What is 3 to the 5th power?)

Assistant: 3ใฎ5ไน—ใฏ3^5 = 243ใงใ™ใ€‚
User: pythonใงใ‚ฏใ‚คใƒƒใ‚ฏใ‚ฝใƒผใƒˆใฎ้–ขๆ•ฐใ‚’ๆ›ธใ„ใฆใใ ใ•ใ„

Assistant: ใ“ใ“ใซPythonใงQuickSortใฎ้–ขๆ•ฐใŒใ‚ใ‚Šใพใ™๏ผš

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    less = [x for x in arr[1:] if x <= pivot]
    greater = [x for x in arr[1:] if x > pivot]
    return quicksort(less) + [pivot] + quicksort(greater)

この関数は、配列をソートするために使用されます。配列が空の場合、または配列が1つの要素のみである場合、関数はそのまま配列を返します。そうでない場合は、配列の最初の要素をピボットとします。ピボットの左側の要素は、ピボットよりも小さいか等しいです。ピボットの右側の要素は、ピボットよりも大きいです。関数は、ピボットの左側の要素を再帰的にソートし、ピボットの右側の要素を再帰的にソートします。
(English: This function is used to sort an array. If the array is empty or has only one element, the function returns the array as is. Otherwise, the first element of the array is taken as the pivot. Elements to the left of the pivot are less than or equal to the pivot. Elements to the right of the pivot are greater than the pivot. The function recursively sorts the elements to the left of the pivot and recursively sorts the elements to the right of the pivot.)