---
library_name: transformers
license: apache-2.0
datasets:
- databricks/databricks-dolly-15k
- glaiveai/glaive-code-assistant-v3
- glaiveai/glaive-function-calling-v2
- gretelai/synthetic_text_to_sql
- meta-math/MetaMathQA
- microsoft/orca-math-word-problems-200k
- neural-bridge/rag-dataset-12000
- neural-bridge/rag-hallucination-dataset-1000
- nvidia/HelpSteer
- OpenAssistant/oasst2
language:
- en
- ja
tags:
- mixtral
- steerlm
base_model: tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1
---

# KARAKURI LM 8x7B Instruct v0.1



## Model Details

### Model Description

- **Developed by:** [KARAKURI Inc.](https://about.karakuri.ai/)
- **Model type:** Mixture of Experts (MoE)
- **Languages:** Primarily English and Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1](https://huggingface.co/tokyotech-llm/Swallow-MX-8x7b-NVE-v0.1)
- **Contact:** For questions and comments about the model, please email `[email protected]`
- **Demo:** https://lm.karakuri.cc/

## Usage

### Prompt Template

The model uses the same prompt template as [Command R+](https://huggingface.co/CohereForAI/c4ai-command-r-plus), except that it contains [attribute values](#attribute-values).

#### Chat

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("karakuri-ai/karakuri-lm-8x7b-instruct-v0.1")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I help you today?"},
    {"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"},
]
tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
```

#### Tool Use

```python
messages = [
    {"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Query to search the internet with"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameters": {
            "type": "object",
            "properties": {}
        }
    }
]
tokenizer.apply_chat_template(
    messages,
    chat_template="tool_use",
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
```

#### RAG

```python
messages = [
    {"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
documents = [
    {
        "title": "Tsukiji Outer Market",
        "text": "While the inner wholesale market has moved to Toyosu, Tsukiji Outer Market remains a bustling hub for fresh seafood and street food. Enjoy sushi, sashimi, and other delicacies while exploring the vibrant market streets.",
    },
    {
        "title": "Meiji Shrine",
        "text": "Nestled in a lush forest in the heart of the city, Meiji Shrine offers a peaceful retreat from the urban hustle. Dedicated to Emperor Meiji and Empress Shoken, the shrine is a popular site for traditional Japanese weddings. Stroll along the serene paths and experience a moment of tranquility.",
    },
]
tokenizer.apply_chat_template(
    messages,
    chat_template="rag",
    documents=documents,
    add_generation_prompt=True,
    tokenize=False,
)
```

### Attribute Values

The prompt template contains nine attributes.
The first five are derived from HelpSteer, while the remaining four are derived from OASST2.
The values are represented by integers ranging from 0 to 4, with 0 being the lowest and 4 being the highest.

- helpfulness (default: 4): Overall helpfulness of the response to the prompt.
- correctness (default: 4): Inclusion of all pertinent facts without errors.
- coherence (default: 4): Consistency and clarity of expression.
- complexity (default: 4): Intellectual depth required to write the response (i.e., whether the response can be written by anyone with basic language competency or requires deep domain expertise).
- verbosity (default: 4): Amount of detail included in the response, relative to what is asked for in the prompt.
- quality (default: 4): Perceived goodness of the response.
- toxicity (default: 0): Undesirable elements such as vulgar, harmful, or potentially biased content in the response.
- humor (default: 0): Sense of humor within the response.
- creativity (default: 0): Willingness to generate a non-conventional response.
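
These nine attributes and their documented defaults can be mirrored in plain Python. The helper below is purely illustrative (it is not part of the model's API); it merges user overrides with the defaults and enforces the documented 0-4 range:

```python
# Documented defaults for the nine steering attributes (0 = lowest, 4 = highest).
DEFAULT_ATTRIBUTES = {
    "helpfulness": 4, "correctness": 4, "coherence": 4,
    "complexity": 4, "verbosity": 4, "quality": 4,
    "toxicity": 0, "humor": 0, "creativity": 0,
}

def merge_attributes(**overrides):
    """Hypothetical helper: merge overrides with defaults, checking the 0-4 range."""
    unknown = set(overrides) - set(DEFAULT_ATTRIBUTES)
    if unknown:
        raise KeyError(f"unknown attributes: {sorted(unknown)}")
    attrs = {**DEFAULT_ATTRIBUTES, **overrides}
    for name, value in attrs.items():
        if not 0 <= value <= 4:
            raise ValueError(f"{name} must be in [0, 4], got {value}")
    return attrs
```

The resulting dictionary can then be unpacked into `apply_chat_template`, e.g. `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False, **merge_attributes(creativity=2))`.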

If you want to change the attribute values from the default values specified in the template, you can pass them as arguments to the `apply_chat_template` method as follows:

```python
messages = [
    {"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]
tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
    helpfulness=0,
    correctness=0,
    coherence=2,
    complexity=0,
    verbosity=3,
    quality=0,
    toxicity=4,
    humor=1,
    creativity=1,
)
```

### Run the model

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "karakuri-ai/karakuri-lm-8x7b-instruct-v0.1",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I'm planning a day trip to Tokyo this weekend. Can you recommend a quick sightseeing plan?"}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
tokenizer.decode(outputs[0][input_ids.shape[-1]:])
```

## Training Details

### Training Data

The model was trained on approximately 1 billion tokens of fine-tuning data.
The details are as follows:

| Dataset | # Tokens / Epoch | # Epochs | # Tokens | Percent |
| :--- | ---: | ---: | ---: | ---: |
| [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | 3M | 5 | 16M | 1.5% |
| [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3) | 520M | 0.3 | 156M | 14.6% |
| [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 52M | 3 | 157M | 14.7% |
| [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) | 19M | 3 | 57M | 5.3% |
| [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 81M | 1 | 81M | 7.6% |
| [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 67M | 1 | 67M | 6.3% |
| [neural-bridge/rag-dataset-12000](https://huggingface.co/datasets/neural-bridge/rag-dataset-12000) | 12M | 5 | 61M | 5.7% |
| [neural-bridge/rag-hallucination-dataset-1000](https://huggingface.co/datasets/neural-bridge/rag-hallucination-dataset-1000) | 1M | 5 | 5M | 0.5% |
| [nvidia/HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) | 24M | 5 | 118M | 11.0% |
| [OpenAssistant/oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) | 27M | 5 | 133M | 12.4% |
| KARAKURI Instruction Dataset | 1M | 5 | 6M | 0.6% |
| KARAKURI Corpus | 214M | 1 | 214M | 20.0% |
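
The per-dataset token counts above are rounded, but as a quick arithmetic check the rounded values sum to roughly 1 billion tokens and reproduce the listed percentages:

```python
# Token counts in millions, copied from the "# Tokens" column above (rounded values).
tokens_m = {
    "databricks-dolly-15k": 16,
    "glaive-code-assistant-v3": 156,
    "glaive-function-calling-v2": 157,
    "synthetic_text_to_sql": 57,
    "MetaMathQA": 81,
    "orca-math-word-problems-200k": 67,
    "rag-dataset-12000": 61,
    "rag-hallucination-dataset-1000": 5,
    "HelpSteer": 118,
    "oasst2": 133,
    "KARAKURI Instruction Dataset": 6,
    "KARAKURI Corpus": 214,
}
total_m = sum(tokens_m.values())  # 1071M, i.e. roughly 1B tokens
percent = {name: round(100 * count / total_m, 1) for name, count in tokens_m.items()}
```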

### Training Infrastructure

- **Hardware:** The model was trained on 8 nodes of Amazon EC2 trn1.32xlarge instances.
- **Software:** We used code based on [neuronx-nemo-megatron](https://github.com/aws-neuron/neuronx-nemo-megatron).

## Citation

```bibtex
@misc{karakuri_lm_8x7b_instruct_v01,
    author       = { {KARAKURI} {I}nc. },
    title        = { {KARAKURI} {LM} 8x7{B} {I}nstruct v0.1 },
    year         = { 2024 },
    url          = { https://huggingface.co/karakuri-ai/karakuri-lm-8x7b-instruct-v0.1 },
    publisher    = { Hugging Face },
    journal      = { Hugging Face repository }
}
```