I do use an imatrix see file GLM-Z1-32B-0414.imatrix . Its is created with the text data that bartowski uses. I wonder if the fact that it is mostly english affects the imatrix accuracy.

I found this on llama.cpp github : https://github.com/ggml-org/llama.cpp/issues/12946 ... Related? it does mention a 64 token boundery
I am going to redo all the quants using the latest build. Lets see if the bugs are fixed

The chat template. I asked Deepseek is this chat template broken. It replied:

'''
Your template is very close to correct — but yes, it is slightly broken in subtle ways.
I'll walk you through it carefully.

First, your template:

[gMASK]
{%- if tools -%}
<|system|>你是一个名为 ChatGLM 的人工智能助手。你是基于智谱 AI 公司训练的语言模型 GLM-4 模型开发的，你的任务是针对用户的问题和要求提供适当的答复和支持。

可用工具

{% for tool in tools %}
{%- set function = tool.function if tool.get("function") else tool %}
## {{ function.name }}
{{ function | tojson(indent=4, ensure_ascii=False) }}
在调用上述函数时，请使用 Json 格式表示调用的参数。
{%- endfor %}
{%- endif -%}
{%- for msg in messages %}
{%- if msg.role == 'system' %}
<|system|> {{ msg.content }}
{%- endif %}
{%- endfor %}
{%- for message in messages if message.role != 'system' %}
{%- set role = message['role'] %}
{%- set content = message['content'] %}
{%- set visible = content.split('')[-1].strip() %}
{%- set meta = message.get("metadata", "") %}
{%- if role == 'user' %}
<|user|> {{ visible }}
{%- elif role == 'assistant' and not meta %}
<|assistant|> {{ visible }}
{%- elif role == 'assistant' and meta %}
<|assistant|>{{ meta }} {{ visible }}
{%- elif role == 'observation' %}
<|observation|> {{ visible }}
{%- endif %}
{%- endfor %}
{% if add_generation_prompt %}
<|assistant|>
{% endif %}

Problems:

Extra <|system|> duplication

    You already print <|system|> when tools exist, but then again you loop over messages and if a system message is there, you print another <|system|>.

    If your initial tool description is considered a system message, it could double-print.

混在工具描述和消息历史

    The "工具描述" (tools descriptions) and the "聊天消息" (messages) are mixed but not separated clearly.

    Some LLMs (especially GLM series) are very sensitive to input format consistency. Mixing meta sections with chat history without clear breaks can confuse them.

visible extraction is naive

    You're doing .split('</think>')[-1].strip(), assuming every message might have <think>...</think>, and you just want the part after thinking.

    But what if there's no <think> at all? Or what if multiple </think> exist?

    This should be more robust.

Assistant metadata (meta) handling is a bit messy

    If meta exists, you're simply writing {{ meta }} {{ visible }} without structure. Depending on your meta, it could look weird.

A better, corrected version

Here’s a cleaner, more correct version of your template:

[gMASK]
{%- if tools %}
<|system|>你是一个名为 ChatGLM 的人工智能助手。你是基于智谱 AI 公司训练的语言模型 GLM-4 模型开发的，你的任务是针对用户的问题和要求提供适当的答复和支持。

可用工具

{% for tool in tools %}
{%- set function = tool.function if tool.get("function") else tool %}

{{ function | tojson(indent=4, ensure_ascii=False) }}
{% endfor %}
在调用上述函数时，请使用 Json 格式表示调用的参数。
{%- endif %}

{%- for message in messages %}
{%- set role = message['role'] %}
{%- set content = message['content'] %}
{%- set visible = content.split('')[-1].strip() if '' in content else content %}
{%- set meta = message.get("metadata") %}

{%- if role == 'system' %}

<|system|> {{ visible }}
{%- elif role == 'user' %}
<|user|> {{ visible }}
{%- elif role == 'assistant' %}
{%- if meta %}
<|assistant|>{{ meta | tojson(indent=4, ensure_ascii=False) }} {{ visible }}
{%- else %}
<|assistant|> {{ visible }}
{%- endif %}
{%- elif role == 'observation' %}
<|observation|> {{ visible }}
{%- endif %}
{%- endfor %}

{%- if add_generation_prompt %}
<|assistant|>
{%- endif %}

Key Improvements:

Only one <|system|> per system message.

Correct and safe splitting for </think>.

meta is dumped cleanly as JSON if exists.

No mixing tools and history incorrectly.

Bonus

If you want to make it even cleaner, you might want to separate the tool section into a fake "system" message inside the messages array too.
That way your template is truly linear and simple.
'''

....If you are not testing using tools then the issues in the template won't create big problems

Mungert
/

GLM-Z1-32B-0414-GGUF

iMatrix/even more bugs in llama.cpp/ollama?

可用工具

可用工具

{{ function.name }}