Fixed πŸ† GLM Tool calling support in llama.cpp, raised PR

#8
by xbruce22 - opened

PR link

Bonus tip: MCP and tools now work great with the Cherry Studio Windows app.

One thing to note here: OpenAI keeps messages as JSON on the backend, while llama.cpp passes Python dicts to its Jinja parser.
That's why the traditional OpenAI tool-calling SDK flow may fail. The fix is to always append the assistant's response to the message history as a plain dict, as shown below.
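
In isolation, the difference looks like this (a minimal sketch; response_message is completion.choices[0].message, exactly as in the full script under "Example" below):

# This can fail: the OpenAI SDK message object is not a plain dict,
# so llama.cpp's Jinja template may not handle it.
messages.append(response_message)

# This works: re-serialize the assistant turn as a plain dict before appending.
messages.append({
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": tc.id, "type": tc.type, "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
        for tc in response_message.tool_calls
    ]
})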

Example

import json
import os
from openai import OpenAI
from datetime import datetime

openai_api_key = "EMPTY"

client = OpenAI(
    api_key=openai_api_key,
    base_url=ENDPOINT,  # assuming it's defined elsewhere, e.g. your llama.cpp server URL
)

def list_files(directory_path: str = "."):
    """List all files in a given directory."""
    print(f"TOOL CALLED: list_files(directory_path='{directory_path}')")
    try:
        files = [f for f in os.listdir(directory_path) if os.path.isfile(os.path.join(directory_path, f))]
        if not files:
            return f"No files found in directory '{directory_path}'."
        return "Files in directory '{}':\n{}".format(directory_path, "\n".join(f"- {file}" for file in files))
    except Exception as e:
        return f"Error listing files: {str(e)}"

def read_file(file_name: str, start_line, end_line):
    """Read a specified range of lines from a file."""
    print(f"TOOL CALLED: read_file(file_name='{file_name}', start_line={start_line}, end_line={end_line})")
    try:
        start = int(start_line)
        end = int(end_line)
        
        with open(file_name, "r") as f:
            lines = f.readlines()
        
        # Slice the lines (adjusting for 0-based index)
        content_lines = lines[start-1:end]
        return "".join(content_lines)
    except FileNotFoundError:
        return f"Error: File '{file_name}' not found."
    except Exception as e:
        return f"Error reading file: {str(e)}"

available_functions = {
    "list_files": list_files,
    "read_file": read_file,
}

tools = [
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List all the files in a specified directory.",
            "parameters": {"type": "object", "properties": {"directory_path": {"type": "string", "description": "The path to the directory."}}},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read content from a file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "file_name": {"type": "string", "description": "The name of the file to read."},
                    "start_line": {"type": "integer", "description": "The starting line number."},
                    "end_line": {"type": "integer", "description": "The ending line number."}
                },
                "required": ["file_name", "start_line", "end_line"],
            },
        },
    }
]

with open("code.py", "w") as f:
    f.write("import os\n\n")
    f.write("def hello_world():\n")
    f.write("    print('Hello from code.py!')\n")

messages = [
    {"role": "system", "content": "You are a helpful file-system assistant. Think step-by-step. First, list files. Then, if the file exists, read it."},
    {"role": "user", "content": "Do I have a code.py file? If yes, what does it contain?"},
]

print("--- Initial user prompt ---")
print(messages[-1]['content'])

step = 1
max_steps = 5
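# Agent loop: call the model, execute any requested tool calls,
# append the tool results to the history, and repeat until a final answer (or max_steps).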
while step <= max_steps:
    print(f"\n----------- STEP {step} -----------")
    
    print("--- Calling model... ---")
    completion = client.chat.completions.create(
        model="zai-org/GLM-4.5",
        messages=messages,
        tools=tools,
        temperature=0.0,
    )
    response_message = completion.choices[0].message

    if not response_message.tool_calls:
        print("\n===== FINAL RESPONSE =====")
        print(response_message.content)
        break

    print("\n===== REASONING TRACE =====")
    print(response_message.content)
    print("===========================")
    
    print("\n--- Tool call(s) received, executing... ---")

    # Append the assistant's tool-call turn as a plain dict (not the SDK message
    # object), so llama.cpp's Jinja template can parse it correctly.
    messages.append({
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": tc.id, "type": tc.type, "function": {"name": tc.function.name, "arguments": tc.function.arguments}}
            for tc in response_message.tool_calls
        ]
    })
    
    for tool_call in response_message.tool_calls:
        function_name = tool_call.function.name
        function_to_call = available_functions[function_name]
        function_args = json.loads(tool_call.function.arguments)
        
        function_response = function_to_call(**function_args)
        
        print(f"TOOL RESPONSE ({function_name}): {function_response}")

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(function_response),
        })
    
    step += 1

if step > max_steps:
    print("\nReached maximum steps. Exiting.")

Output:

--- Initial user prompt ---
Do I have a code.py file? If yes, what does it contain?

----------- STEP 1 -----------
--- Calling model... ---

===== REASONING TRACE =====

<think>The user is asking if they have a code.py file and what it contains. To answer this, I need to:

1. First, list the files in the current directory to see if there's a code.py file
2. If it exists, read its contents

Let me start by listing the files in the current directory.</think>
I'll help you check if you have a code.py file and show you its contents. Let me first list the files in the current directory to see what's available.

===========================

--- Tool call(s) received, executing... ---
TOOL CALLED: list_files(directory_path='.')
TOOL RESPONSE (list_files): Files in directory '.':
- agent_backup.py
- tool_calling.ipynb
- agent.py
- status.md
- code.py
- agent_sessions.db
- summary.md
- agent_functions.py

----------- STEP 2 -----------
--- Calling model... ---

===== REASONING TRACE =====

<think>The user asked if they have a code.py file and what it contains. From the tool response, I can see that there is indeed a code.py file in the directory. Now I need to read the contents of this file to show the user what it contains.</think>
Yes, I can see you have a code.py file in your directory. Let me read its contents for you.

===========================

--- Tool call(s) received, executing... ---
TOOL CALLED: read_file(file_name='code.py', start_line=1, end_line=100)
TOOL RESPONSE (read_file): import os

def hello_world():
    print('Hello from code.py!')


----------- STEP 3 -----------
--- Calling model... ---

===== FINAL RESPONSE =====

<think>The user asked if they have a code.py file and what it contains. From the function results, I can see:

1. The list_files function showed that there is indeed a "code.py" file in the directory.
2. The read_file function showed the contents of code.py, which contains:
   ```python
   import os

   def hello_world():
       print('Hello from code.py!')

So I can answer the user's question directly.
Yes, you do have a code.py file! It contains the following Python code:

import os

def hello_world():
    print('Hello from code.py!')

The file imports the os module and defines a simple function called hello_world() that prints a greeting message.
```

Nice! Looking forward to the PR being approved. For now I'm building your branch to test.

Please do, thank you. Let me know how it works out.

I confirmed that I was able to use the above Python script to verify tool calling with the shared template and @xbruce22's PR. It did not work with llama.cpp master.

I'm still having some issues with tool calling in open-webui and Kilo code and left a note about it here: https://huggingface.co/unsloth/GLM-4.5-Air-GGUF/discussions/9#689bf8674e780735fd6286f1

Thanks for the awesome contribution btw @xbruce22 !

Kilo Code is working well for me, even after re-pulling and rebuilding from scratch. I hope you used the common-glm45-tool-calls branch to build the llama.cpp server executable.

I used the following command to run my llama.cpp server:

./llama.cpp/build/bin/llama-server -hf unsloth/GLM-4.5-Air-GGUF:IQ2_M --alias GLM-4.5-Air-GPUs -c 60000 --host 0.0.0.0 -np 1 -ngl 999 -ts 72,28 -b 1024 -ub 256 --jinja --chat-template-file template/chat_template.jinja
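
For reference, this is how the ENDPOINT in the Python script above would point at that server. A minimal sketch, assuming the default llama-server port 8080 (no --port flag is passed in the command) and a client running on the same machine:

from openai import OpenAI

ENDPOINT = "http://localhost:8080/v1"  # llama-server exposes an OpenAI-compatible API under /v1; 8080 is the default port

client = OpenAI(
    api_key="EMPTY",    # llama-server does not require a key unless started with --api-key
    base_url=ENDPOINT,
)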

