KARAKURI LM 8x7B Instruct v0.1
The model uses the same prompt template as Command R+, except that the template also includes attribute values. To build a chat prompt, load the tokenizer and apply the chat template to a list of messages:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("karakuri-ai/karakuri-lm-8x7b-instruct-v0.1")
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hello! How can I help you today?"},
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
A tool-use prompt can be built by passing tool definitions via the tools argument and selecting the tool_use chat template:
messages = [
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tools = [
{
"name": "internet_search",
"description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Query to search the internet with"
}
},
"required": ["query"]
}
},
{
"name": "directly_answer",
"description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
"parameters": {
"type": "object",
"properties": {}
}
}
]
tokenizer.apply_chat_template(
messages,
chat_template="tool_use",
tools=tools,
add_generation_prompt=True,
tokenize=False,
)
Similarly, a retrieval-augmented generation (RAG) prompt can be built by passing source documents via the documents argument and selecting the rag chat template:
messages = [
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
documents = [
{
"title": "Tsukiji Outer Market",
"text": "While the inner wholesale market has moved to Toyosu, Tsukiji Outer Market remains a bustling hub for fresh seafood and street food. Enjoy sushi, sashimi, and other delicacies while exploring the vibrant market streets.",
},
{
"title": "Meiji Shrine",
"text": "Nestled in a lush forest in the heart of the city, Meiji Shrine offers a peaceful retreat from the urban hustle. Dedicated to Emperor Meiji and Empress Shoken, the shrine is a popular site for traditional Japanese weddings. Stroll along the serene paths and experience a moment of tranquility."
}
]
tokenizer.apply_chat_template(
messages,
chat_template="rag",
documents=documents,
add_generation_prompt=True,
tokenize=False,
)
The prompt template contains nine attributes: the first five (helpfulness, correctness, coherence, complexity, verbosity) are derived from HelpSteer, and the remaining four (quality, toxicity, humor, creativity) are derived from OASST2. Each attribute takes an integer value from 0 (lowest) to 4 (highest).
If you want to change the attribute values from the default values specified in the template, you can pass them as arguments to the apply_chat_template method as follows:
messages = [
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=False,
helpfulness=0,
correctness=0,
coherence=2,
complexity=0,
verbosity=3,
quality=0,
toxicity=4,
humor=1,
creativity=1,
)
To run the model, load it and generate a response from a formatted chat prompt:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
"karakuri-ai/karakuri-lm-8x7b-instruct-v0.1",
torch_dtype="auto",
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
input_ids = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
tokenizer.decode(outputs[0][input_ids.shape[-1]:])
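The tool-use and RAG templates plug into generation in the same way. Below is a minimal sketch, assuming the tokenizer, model, messages, and documents objects defined in the earlier examples and illustrative generation settings:

```python
# Sketch: retrieval-augmented generation with the "rag" chat template.
# Assumes `tokenizer`, `model`, `messages`, and `documents` are defined as in the examples above.
input_ids = tokenizer.apply_chat_template(
    messages,
    chat_template="rag",
    documents=documents,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```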
The model was trained on approximately 1 billion tokens of fine-tuning data. The details are as follows:
Dataset | # Tokens / Epoch | # Epochs | # Tokens | Percent |
---|---|---|---|---|
databricks/databricks-dolly-15k | 3M | 5 | 16M | 1.5% |
glaiveai/glaive-code-assistant-v3 | 520M | 0.3 | 156M | 14.6% |
glaiveai/glaive-function-calling-v2 | 52M | 3 | 157M | 14.7% |
gretelai/synthetic_text_to_sql | 19M | 3 | 57M | 5.3% |
meta-math/MetaMathQA | 81M | 1 | 81M | 7.6% |
microsoft/orca-math-word-problems-200k | 67M | 1 | 67M | 6.3% |
neural-bridge/rag-dataset-12000 | 12M | 5 | 61M | 5.7% |
neural-bridge/rag-hallucination-dataset-1000 | 1M | 5 | 5M | 0.5% |
nvidia/HelpSteer | 24M | 5 | 118M | 11.0% |
OpenAssistant/oasst2 | 27M | 5 | 133M | 12.4% |
KARAKURI Instruction Dataset | 1M | 5 | 6M | 0.6% |
KARAKURI Corpus | 214M | 1 | 214M | 20.0% |
The model sometimes attempts to call tools that were not provided. You should implement a post-processing step to exclude such tool calls.
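A minimal post-processing sketch is shown below. It assumes the tool calls have already been extracted from the completion as a JSON list of objects with a tool_name field (a hypothetical format; verify it against the model's actual outputs), and it drops any call whose name does not appear in the provided tools list:

```python
import json

def filter_tool_calls(raw_calls_json, tools):
    """Keep only tool calls that refer to tools actually provided in the prompt.

    `raw_calls_json` is assumed to be a JSON string such as
    '[{"tool_name": "internet_search", "parameters": {"query": "Tokyo day trip"}}]'
    (hypothetical format); `tools` is the same list passed to apply_chat_template.
    """
    allowed = {tool["name"] for tool in tools}
    calls = json.loads(raw_calls_json)
    return [call for call in calls if call.get("tool_name") in allowed]
```

With the tools defined earlier, a call to an unprovided tool (say, a hypothetical get_weather) would be dropped, while internet_search and directly_answer calls would be kept.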
If you use this model, please cite:
@misc{karakuri_lm_8x7b_instruct_v01,
author = { {KARAKURI} {I}nc. },
title = { {KARAKURI} {LM} 8x7{B} {I}nstruct v0.1 },
year = { 2024 },
url = { https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-instruct-v0.1 },
publisher = { Hugging Face },
journal = { Hugging Face repository }
}
Base model: tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1