danielhanchen committed · Commit 09ac730 (verified) · Parent(s): 90c9869

Upload folder using huggingface_hub

Files changed (1): README.md (+277 −3)

---
tags:
- unsloth
license: mit
library_name: transformers
base_model:
- deepcogito/cogito-v2-preview-deepseek-671B-MoE
---
> [!NOTE]
> Includes Unsloth **chat template fixes**! <br> For `llama.cpp`, use `--jinja`
>

<div>
<p style="margin-top: 0;margin-bottom: 0;">
    <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
</p>
<div style="display: flex; gap: 5px; align-items: center; ">
  <a href="https://github.com/unslothai/unsloth/">
    <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
  </a>
  <a href="https://discord.gg/unsloth">
    <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
  </a>
  <a href="https://docs.unsloth.ai/">
    <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
  </a>
</div>
</div>

<p align="center">
  <img src="images/deep-cogito-logo.png" alt="Logo" width="40%">
</p>

# Cogito v2 preview - 671B MoE

[Blog Post](https://www.deepcogito.com/research/cogito-v2-preview)

The Cogito v2 LLMs are instruction-tuned generative models. All models are released under an open license for commercial use.

- Cogito v2 models are hybrid reasoning models. Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
- The LLMs are trained using **Iterated Distillation and Amplification (IDA)** - a scalable and efficient alignment strategy for superintelligence using iterative self-improvement.
- The models have been optimized for coding, STEM, instruction following and general helpfulness, and have significantly higher multilingual, coding and tool calling capabilities than similarly sized counterparts.
- In both standard and reasoning modes, Cogito v2-preview models outperform their similarly sized counterparts on common industry benchmarks.
- This model is trained in over 30 languages and supports a context length of 128k.

# Evaluations
For detailed evaluations, please refer to the [Blog Post](https://www.deepcogito.com/research/cogito-v2-preview).


# Usage
Here is a snippet for usage with Transformers:

```python
import transformers
import torch

model_id = "deepcogito/cogito-v2-preview-deepseek-671B-MoE"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
]

outputs = pipeline(
    messages,
    max_new_tokens=512,
)

print(outputs[0]["generated_text"][-1])
```
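With chat-style input like this, the pipeline returns the whole conversation as a list of message dicts, so the assistant's reply text itself is available as `outputs[0]["generated_text"][-1]["content"]`.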


## Implementing extended thinking
- By default, the model will answer in the standard mode.
- To enable thinking, you can use either of the following two methods:
  - Set `enable_thinking=True` while applying the chat template.
  - Add a specific system prompt, along with prefilling the response with "\<think\>\n".

**NOTE: Unlike Cogito v1 models, we initiate the response with "\<think\>\n" at the beginning of every output when reasoning is enabled. This is because hybrid models can be brittle at times, and adding a "\<think\>\n" ensures that the model does indeed respect thinking.**

### Method 1 - Set enable_thinking=True in the tokenizer
If you are using Huggingface tokenizers, you can simply add the argument `enable_thinking=True` to the tokenization (this option is added to the chat template).

Here is an example -
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v2-preview-deepseek-671B-MoE"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to LLMs."
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
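
When thinking is enabled, the generated `response` contains the reasoning trace terminated by a `</think>` tag, followed by the final answer. The original card does not show how to separate the two; here is a minimal sketch, assuming the `</think>` marker survives decoding:

```python
# Split the reasoning trace from the final answer.
# Assumes `response` from the example above looks like "...</think>final answer";
# if no </think> tag is present, the whole response is treated as the answer.
reasoning, separator, answer = response.partition("</think>")
if not separator:
    reasoning, answer = "", response

reasoning = reasoning.replace("<think>", "", 1).strip()
answer = answer.strip()

print("Reasoning:\n", reasoning)
print("Answer:\n", answer)
```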

### Method 2 - Add a specific system prompt, along with prefilling the response with "\<think\>\n"
To enable thinking using this method, you need to do two things -

Step 1 - Simply use this in the system prompt `system_instruction = 'Enable deep thinking subroutine.'`

If you already have a system_instruction, then use `system_instruction = 'Enable deep thinking subroutine.' + '\n\n' + system_instruction`.

Step 2 - Prefill the response with the tokens `"<think>\n"`.

Here is an example -

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepcogito/cogito-v2-preview-deepseek-671B-MoE"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Step 1 - Add deep thinking instruction.
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION},
    {"role": "user", "content": "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Step 2 - Prefill response with "<think>\n".
text += "<think>\n"

# Now, continue as usual.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```


Similarly, if you have a system prompt, you can prepend the `DEEP_THINKING_INSTRUCTION` to it like this -

```python
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

system_prompt = "Reply to each prompt with only the actual code - no explanations."
prompt = "Write a bash script that takes a matrix represented as a string with format '[1,2],[3,4],[5,6]' and prints the transpose in the same format."

messages = [
    {"role": "system", "content": DEEP_THINKING_INSTRUCTION + '\n\n' + system_prompt},
    {"role": "user", "content": prompt}
]
```
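
For convenience, both steps of Method 2 can be wrapped in one small helper. This is only an illustrative sketch (the `build_thinking_prompt` function is not part of the original card); it relies on the same `apply_chat_template` call shown above:

```python
DEEP_THINKING_INSTRUCTION = "Enable deep thinking subroutine."

def build_thinking_prompt(tokenizer, messages):
    """Return a prompt string with extended thinking enabled via Method 2.

    Step 1: prepend the deep thinking instruction to the system prompt
    (adding a system message if the chat has none).
    Step 2: prefill the response with "<think>\n".
    """
    messages = [dict(m) for m in messages]  # copy so the caller's list is untouched
    if messages and messages[0]["role"] == "system":
        messages[0]["content"] = DEEP_THINKING_INSTRUCTION + "\n\n" + messages[0]["content"]
    else:
        messages.insert(0, {"role": "system", "content": DEEP_THINKING_INSTRUCTION})

    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return text + "<think>\n"
```

The returned string can then be tokenized and passed to `model.generate` exactly as in the example above.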


# Tool Calling
Cogito models support tool calling (single, parallel, multiple and parallel_multiple) in both standard and extended thinking modes.

Here is a snippet -

```python
# First, define a tool. (This reuses the `model` and `tokenizer` loaded in the examples above.)
def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
    Returns:
        The current temperature at the specified location, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

# Next, create a chat and apply the chat template
messages = [
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

text = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
output_text = tokenizer.batch_decode(outputs)[0][len(text):]
print(output_text)
```

This will result in the output -
````
<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>get_current_temperature
```json
{"location":"Paris, France"}
```<|tool▁call▁end|><|tool▁calls▁end|><|end▁of▁sentence|>
````

If the model generates a tool call, you should add it to the chat like so:

```python
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
```
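
As an alternative to writing the dict by hand, you can recover the function name and arguments from the generated text. A minimal sketch, assuming the exact tool-call markup shown in the output above (the `parse_tool_call` helper and its regex are illustrative, not part of the original card):

````python
import json
import re

# Matches: <|tool▁call▁begin|>function<|tool▁sep|>NAME ```json {...} ```
TOOL_CALL_PATTERN = re.compile(
    r"<\|tool▁call▁begin\|>function<\|tool▁sep\|>(?P<name>[\w.]+)\s*"
    r"```json\s*(?P<args>\{.*?\})\s*```",
    re.DOTALL,
)

def parse_tool_call(output_text):
    """Extract the first tool call from the generated markup, or None if there is none."""
    match = TOOL_CALL_PATTERN.search(output_text)
    if match is None:
        return None  # the model answered directly without calling a tool
    return {"name": match.group("name"), "arguments": json.loads(match.group("args"))}

tool_call = parse_tool_call(output_text)
# `tool_call` can now be appended to `messages` exactly as shown above.
````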

Then call the tool and append the result, with the `tool` role, like so:

```python
messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})
```

After that, you can `generate()` again to let the model use the tool result in the chat:

```python
text = tokenizer.apply_chat_template(messages, tools=[get_current_temperature], add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
output_text = tokenizer.batch_decode(outputs)[0][len(text):]
```

This should result in the string -
```
'The current temperature in Paris is 22.0 degrees.<|end▁of▁sentence|>'
```
273
+ ## License
274
+ This repository and the model weights are licensed under **MIT License**.
275
+
276
+ ## Contact
277
+ If you would like to reach out to our team, send an email to [[email protected]]([email protected]).