## Tool Calling

To enable tool calling, you may need to set tool-call parser options when starting the inference service. See [deploy_guidance](./deploy_guidance.md) for details.

A tool-calling round trip with Kimi-K2 works as follows:

- You pass function (tool) descriptions to Kimi-K2 along with the conversation.
- Kimi-K2 decides whether to call a function and, if so, returns the information needed to make the call.
- You execute the function, collect the results, and pass them back to Kimi-K2.
- Kimi-K2 continues generating based on those results, calling more tools if needed, until it judges that it has enough information to answer the user.
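
To make these steps concrete, the sketch below shows the chat history after one round trip, in the OpenAI-compatible message format used throughout this guide. The weather values and the final answer are illustrative placeholders, not real model output. The ID `functions.get_weather:0` follows the format described in the FAQ below.

```python
import json

# Illustrative chat history after one tool-calling round trip (values are placeholders).
messages = [
    # 1. The user's request; tool descriptions are sent separately via `tools=`.
    {"role": "user", "content": "What's the weather like in Beijing today?"},
    # 2. Kimi-K2 requests a tool call instead of answering directly.
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [{
            "id": "functions.get_weather:0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": json.dumps({"city": "Beijing"})},
        }],
    },
    # 3. The caller runs the tool and reports the result, linked by tool_call_id.
    {
        "role": "tool",
        "tool_call_id": "functions.get_weather:0",
        "name": "get_weather",
        "content": json.dumps({"weather": "Sunny"}),
    },
    # 4. Kimi-K2 answers using the tool result.
    {"role": "assistant", "content": "It is sunny in Beijing today."},
]

# Every tool message must answer a tool call issued by an earlier assistant message.
issued = {tc["id"] for m in messages if m["role"] == "assistant" for tc in m.get("tool_calls", [])}
answered = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
print(answered <= issued)  # True
```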
### Preparing Tools

Suppose we have a function `get_weather` that queries real-time weather conditions. It takes a city name as its parameter and returns the weather for that city. So that Kimi-K2 understands what the function does and how to call it, we describe it with a structured schema in the `tools` list:

```python
def get_weather(city):
    # Stub for illustration: a real implementation would query a weather service.
    return {"weather": "Sunny"}

# Collect the tool descriptions in `tools` (OpenAI-style function schema)
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information. Call this tool when the user needs to get weather information",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name",
                }
            }
        }
    }
}]

# Map tool names to Python callables so the model's calls can be dispatched later
tool_map = {
    "get_weather": get_weather
}
```
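
The model returns tool arguments as a JSON string. A small dispatcher can decode that string and pass its contents as keyword arguments, which keeps `tools` and `tool_map` in sync. `call_tool` below is a hypothetical helper, not part of any library. It repeats the stub and schema from above so it runs on its own, and it only checks the schema's `required` list rather than performing full JSON Schema validation.

```python
import json

def get_weather(city):
    # Stub tool: always reports sunny weather.
    return {"weather": "Sunny"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {"city": {"type": "string", "description": "City name"}},
        },
    },
}]
tool_map = {"get_weather": get_weather}

# Look up each tool's parameter schema by name.
schemas = {t["function"]["name"]: t["function"]["parameters"] for t in tools}

def call_tool(name, arguments_json):
    """Decode model-generated arguments, check required keys, and call the tool."""
    if name not in tool_map:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON arguments: {exc}"}
    missing = [k for k in schemas[name].get("required", []) if k not in args]
    if missing:
        return {"error": f"missing required arguments: {missing}"}
    # Unpack as keyword arguments so `city` receives a string, not the whole dict.
    return tool_map[name](**args)

print(call_tool("get_weather", '{"city": "Beijing"}'))  # {'weather': 'Sunny'}
print(call_tool("get_weather", "{}"))  # {'error': "missing required arguments: ['city']"}
```

In this sketch, problems are returned as an error dict rather than raised. That lets the message be sent back to the model as the tool's `content`. Whether that suits your application is a design choice.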
### Chat with Tools

We use `openai.OpenAI` to send messages to Kimi-K2 together with the tool descriptions. Kimi-K2 decides on its own whether and how to use the provided tools.
If Kimi-K2 decides that a tool call is needed, it returns a response with `finish_reason='tool_calls'`, and the returned message carries the tool call information in `tool_calls`.
After running the tools with the provided arguments, append the assistant message and the tool results to the chat history, then call Kimi-K2 again.
Kimi-K2 may call tools several times before it has enough information to answer the user's question, so keep looping until `finish_reason` is no longer `tool_calls`.
Each tool result is added to `messages` with `role='tool'` and the `tool_call_id` of the call it answers.

```python
import json
from openai import OpenAI

endpoint = "http://localhost:8000/v1"  # placeholder: base URL of your OpenAI-compatible server
model_name = 'moonshotai/Kimi-K2-Instruct'  # must match the model name your server exposes
client = OpenAI(base_url=endpoint, api_key='xxx')

messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
finish_reason = None
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.3,
        tools=tools,
        tool_choice="auto",
    )
    choice = completion.choices[0]
    finish_reason = choice.finish_reason
    # Note: some engines report a different finish_reason when the model emits tool calls; adjust this check for your engine
    if finish_reason == "tool_calls":
        # Keep the assistant message (with its tool_calls) in the history before the tool results
        messages.append(choice.message)
        for tool_call in choice.message.tool_calls:
            tool_call_name = tool_call.function.name
            tool_call_arguments = json.loads(tool_call.function.arguments)
            tool_function = tool_map[tool_call_name]
            # Pass the decoded arguments as keyword arguments, e.g. get_weather(city="Beijing")
            tool_result = tool_function(**tool_call_arguments)
            print("tool_result", tool_result)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call_name,
                "content": json.dumps(tool_result),
            })
print('-' * 100)
print(choice.message.content)
```
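
The comment in the loop notes that engines do not all report `finish_reason == "tool_calls"` when the model emits tool calls. One hedged workaround is to key the loop on whether tool calls are actually present, rather than on `finish_reason`. This assumes your server still populates `message.tool_calls`. `needs_tool_round` is a hypothetical helper name, and the `SimpleNamespace` objects below stand in for SDK choices.

```python
from types import SimpleNamespace

def needs_tool_round(choice):
    """True if the assistant message carries at least one tool call."""
    # Some engines may report a finish_reason other than "tool_calls" (for example
    # "stop") even when tool calls are present, so inspect the parsed calls instead.
    return bool(getattr(choice.message, "tool_calls", None))

# Stand-ins for openai ChatCompletion choices (only the attributes used above).
with_calls = SimpleNamespace(finish_reason="stop",
                             message=SimpleNamespace(tool_calls=[object()]))
plain = SimpleNamespace(finish_reason="stop", message=SimpleNamespace(tool_calls=None))
print(needs_tool_round(with_calls), needs_tool_round(plain))  # True False
```

With a helper like this, the loop can be written as `while True:` with a `break` when `needs_tool_round(choice)` is false.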
### Tool Calling in Streaming Mode

Tool calling also works in streaming mode. In this case, tool call information arrives in fragments, so we accumulate the fragments for each tool call (keyed by `index`) until the stream ends and every tool call is complete:

```python
messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
finish_reason = None
msg = ''
while finish_reason is None or finish_reason == "tool_calls":
    completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=0.3,
        tools=tools,
        tool_choice="auto",
        stream=True
    )
    tool_calls = []
    for chunk in completion:
        if not chunk.choices:
            # Some servers send chunks without choices (e.g. usage statistics); skip them
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            msg += delta.content
        if delta.tool_calls:
            for tool_call_chunk in delta.tool_calls:
                if tool_call_chunk.index is not None:
                    # Extend the tool_calls list so it has an entry for this index
                    while len(tool_calls) <= tool_call_chunk.index:
                        tool_calls.append({
                            "id": "",
                            "type": "function",
                            "function": {
                                "name": "",
                                "arguments": ""
                            }
                        })
                    tc = tool_calls[tool_call_chunk.index]
                    if tool_call_chunk.id:
                        tc["id"] += tool_call_chunk.id
                    fn = tool_call_chunk.function  # may be None in some fragments
                    if fn and fn.name:
                        tc["function"]["name"] += fn.name
                    if fn and fn.arguments:
                        tc["function"]["arguments"] += fn.arguments
        finish_reason = chunk.choices[0].finish_reason
    # Note: some engines report a different finish_reason when the model emits tool calls; adjust this check for your engine
    if finish_reason == "tool_calls":
        # Record the assistant turn that requested the tools before appending their results
        messages.append({"role": "assistant", "content": msg, "tool_calls": tool_calls})
        for tool_call in tool_calls:
            tool_call_name = tool_call['function']['name']
            tool_call_arguments = json.loads(tool_call['function']['arguments'] or "{}")
            tool_function = tool_map[tool_call_name]
            tool_result = tool_function(**tool_call_arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call['id'],
                "name": tool_call_name,
                "content": json.dumps(tool_result),
            })
        # Text streamed before a tool call is not the final answer, so reset msg
        msg = ''
print(msg)
```
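
The accumulation step is the subtle part of streaming. The ID and name typically arrive in the first fragment for a given `index`, while `arguments` is split across many fragments, and parallel calls can interleave. The sketch below isolates that logic in a hypothetical `merge_tool_call_deltas` function that works on plain dicts, so it can be unit-tested without a server. The fragment layout is an assumption that mirrors the OpenAI-style `delta.tool_calls` entries used above.

```python
import json

def merge_tool_call_deltas(deltas):
    """Merge streamed tool-call fragments into complete tool calls.

    Each fragment is a dict with an "index" and optional "id", "name", and
    "arguments" strings; string fields are concatenated per index.
    """
    calls = []
    for d in deltas:
        idx = d["index"]
        while len(calls) <= idx:  # grow the list to cover this index
            calls.append({"id": "", "type": "function",
                          "function": {"name": "", "arguments": ""}})
        call = calls[idx]
        call["id"] += d.get("id") or ""
        call["function"]["name"] += d.get("name") or ""
        call["function"]["arguments"] += d.get("arguments") or ""
    return calls

# Synthetic fragments for two parallel calls, interleaved as a server might send them.
fragments = [
    {"index": 0, "id": "functions.get_weather:0", "name": "get_weather", "arguments": ""},
    {"index": 0, "arguments": '{"city": '},
    {"index": 1, "id": "functions.get_weather:1", "name": "get_weather", "arguments": '{"city"'},
    {"index": 0, "arguments": '"Beijing"}'},
    {"index": 1, "arguments": ': "Shanghai"}'},
]
calls = merge_tool_call_deltas(fragments)
print([json.loads(c["function"]["arguments"]) for c in calls])
# [{'city': 'Beijing'}, {'city': 'Shanghai'}]
```

The streaming loop above performs the same concatenation inline on the SDK's delta objects. A pure function like this is simply easier to test in isolation.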
### Manually Parsing Tool Calls

You can also parse Kimi-K2's tool call requests manually. This is especially useful when the service you are using does not provide a tool-call parser.
Kimi-K2 wraps all tool calls in a turn between `<|tool_calls_section_begin|>` and `<|tool_calls_section_end|>`,
and wraps each individual tool call between `<|tool_call_begin|>` and `<|tool_call_end|>`. Within a call, the tool ID and the arguments are separated by `<|tool_call_argument_begin|>`.
The tool ID has the format `functions.{func_name}:{idx}`, from which we can extract the function name.
Based on these rules, we can render the prompt ourselves with the chat template, send it directly to the completions endpoint, and parse the tool calls from the raw output.

```python
import json
import requests
from transformers import AutoTokenizer

messages = [
    {"role": "user", "content": "What's the weather like in Beijing today? Let's check using the tool."}
]
msg = ''
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
while True:
    # Render the conversation and tool descriptions into a raw prompt string
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        tools=tools,
        add_generation_prompt=True,
    )
    payload = {
        "model": model_name,
        "prompt": text,
        "max_tokens": 512,
        # If the tool-call markers never appear in the output, your engine may be
        # stripping special tokens; check its options (e.g. vLLM's skip_special_tokens).
    }
    response = requests.post(
        f"{endpoint}/completions",
        headers={"Content-Type": "application/json"},
        json=payload,
    )
    response.raise_for_status()
    raw_output = response.json()["choices"][0]["text"]
    tool_calls = extract_tool_call_info(raw_output)
    if len(tool_calls) == 0:
        # No tool calls: this is the final answer
        msg = raw_output
        break
    # Record the assistant turn (text before the tool-call section plus the parsed calls)
    messages.append({
        "role": "assistant",
        "content": raw_output.split('<|tool_calls_section_begin|>')[0],
        "tool_calls": tool_calls,
    })
    for tool_call in tool_calls:
        tool_call_name = tool_call['function']['name']
        tool_call_arguments = json.loads(tool_call['function']['arguments'])
        tool_function = tool_map[tool_call_name]
        tool_result = tool_function(**tool_call_arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call['id'],
            "name": tool_call_name,
            "content": json.dumps(tool_result),
        })
print('-' * 100)
print(msg)
```
Here, `extract_tool_call_info` parses the raw model output and returns the tool calls it contains, in the same structure as `tool_calls` in the OpenAI-compatible API. Define it before running the loop above. A simple implementation:

```python
import re

def extract_tool_call_info(tool_call_rsp: str):
    if '<|tool_calls_section_begin|>' not in tool_call_rsp:
        # No tool calls
        return []
    pattern = r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>"
    tool_calls_sections = re.findall(pattern, tool_call_rsp, re.DOTALL)
    if not tool_calls_sections:
        # Section was opened but never closed (e.g. output truncated by max_tokens)
        return []
    # Extract each tool call: "<id> <|tool_call_argument_begin|> <JSON arguments>"
    func_call_pattern = r"<\|tool_call_begin\|>\s*(?P<tool_call_id>[\w\.]+:\d+)\s*<\|tool_call_argument_begin\|>\s*(?P<function_arguments>.*?)\s*<\|tool_call_end\|>"
    tool_calls = []
    for match in re.findall(func_call_pattern, tool_calls_sections[0], re.DOTALL):
        function_id, function_args = match
        # function_id looks like "functions.get_weather:0"; strip the prefix and the index
        function_name = function_id.split(':')[0].split('.', 1)[1]
        tool_calls.append(
            {
                "id": function_id,
                "type": "function",
                "function": {
                    "name": function_name,
                    "arguments": function_args
                }
            }
        )
    return tool_calls
```
## FAQ

#### Q1: I receive special tokens such as `<|tool_call_begin|>` in the `content` field instead of a normal tool call.

This indicates a tool-call crash, in which the tool-call markup ends up as plain text instead of being parsed into `tool_calls`. It most often occurs in multi-turn tool calling because of incorrect tool-call IDs. K2 expects each ID to follow the format `functions.func_name:idx`, where `functions` is a fixed string, `func_name` is the actual function name (for example `get_weather`), and `idx` is a global counter that starts at 0 and increments with each function invocation.
Please check all tool-call IDs in the message list.
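
Checking IDs by hand is error-prone in long conversations. Below is a hypothetical helper, `find_bad_tool_call_ids`, that scans an OpenAI-style message list and reports assistant tool-call IDs that break the rules above. It assumes tool calls are plain dicts, as in the earlier examples rather than SDK objects. It also assumes the counter increments once per call, in message order.

```python
import re

ID_PATTERN = re.compile(r"^functions\.(?P<name>.+):(?P<idx>\d+)$")

def find_bad_tool_call_ids(messages):
    """Return (id, reason) pairs for assistant tool calls with malformed IDs."""
    problems = []
    expected_idx = 0  # global counter across the whole conversation
    for msg in messages:
        if msg.get("role") != "assistant":
            continue
        for call in msg.get("tool_calls") or []:
            call_id = call.get("id", "")
            m = ID_PATTERN.match(call_id)
            if m is None:
                problems.append((call_id, "does not match functions.func_name:idx"))
            elif m["name"] != call["function"]["name"]:
                problems.append((call_id, "func_name differs from function.name"))
            elif int(m["idx"]) != expected_idx:
                problems.append((call_id, f"expected idx {expected_idx}"))
            expected_idx += 1
    return problems

history = [
    {"role": "user", "content": "Weather in Beijing and Shanghai?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "functions.get_weather:0", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}},
        {"id": "call_abc123", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Shanghai"}'}},
    ]},
]
print(find_bad_tool_call_ids(history))
# [('call_abc123', 'does not match functions.func_name:idx')]
```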
#### Q2: My tool-call IDs are incorrect. How can I fix them?

First, make sure your code and chat template are up to date with the latest version from the Hugging Face repo.
If you are using vLLM or SGLang and they generate random tool-call IDs, upgrade to the latest release. For other frameworks, you must do one of two things. Either parse the tool-call ID from the model output and set it correctly in the server-side response, or rewrite every tool-call ID on the client side according to the rules above before sending the messages to Kimi-K2.
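
For the client-side option, here is a minimal sketch of rewriting every tool-call ID before the messages are sent. `normalize_tool_call_ids` is a hypothetical helper. It assumes dict-based messages and numbers calls in message order starting at 0. It also updates each `role='tool'` message's `tool_call_id` through an old-to-new mapping, so results stay linked to their calls. It returns a new list and leaves the input untouched.

```python
import copy

def normalize_tool_call_ids(messages):
    """Rewrite tool-call IDs to functions.{name}:{idx} with a global counter."""
    out = copy.deepcopy(messages)
    counter = 0
    id_map = {}  # old ID -> new ID; a later reuse of an old ID overwrites the entry
    for msg in out:
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                new_id = f"functions.{call['function']['name']}:{counter}"
                id_map[call.get("id", "")] = new_id
                call["id"] = new_id
                counter += 1
        elif msg.get("role") == "tool" and msg.get("tool_call_id") in id_map:
            msg["tool_call_id"] = id_map[msg["tool_call_id"]]
    return out

history = [
    {"role": "user", "content": "Weather in Beijing?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_x1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Beijing"}'}}]},
    {"role": "tool", "tool_call_id": "call_x1", "name": "get_weather",
     "content": '{"weather": "Sunny"}'},
    {"role": "user", "content": "And Shanghai?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "call_x1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Shanghai"}'}}]},
    {"role": "tool", "tool_call_id": "call_x1", "name": "get_weather",
     "content": '{"weather": "Cloudy"}'},
]
fixed = normalize_tool_call_ids(history)
print([m.get("tool_call_id") for m in fixed if m["role"] == "tool"])
# ['functions.get_weather:0', 'functions.get_weather:1']
```

The mapping is updated as the list is scanned. So an engine that reuses the same ID in different turns, as in `history` above, is still handled, provided each tool message follows the assistant message that issued its call.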
#### Q3: My tool-call IDs are correct, but multi-turn tool calls still crash.

Please describe your situation in the [discussions](https://huggingface.co/moonshotai/Kimi-K2-Instruct-0905/discussions).