Improved Jinja Chat Template for Apriel-Nemotron-15b-Thinker GGUF (e.g., for LM Studio)
Hi everyone,
I'm debeast6 on Hugging Face, and I've been working on getting the ServiceNow-AI/Apriel-Nemotron-15b-Thinker model (specifically bartowski's GGUF quantizations) to run smoothly in LM Studio. I wanted to share a Jinja chat template that has proven to be quite robust.
This template was developed with the assistance of Google's Gemini AI. It aims to:
Resolve Jinja Parsing Errors: It includes defensive checks for message types and context variables (like tools and messages) to prevent common parsing errors (e.g., "Expected iterable type," "Unknown statement type") that can occur with less guarded templates in environments like LM Studio.
Correctly Prompt for Native Reasoning: It prompts the model to use its intended output format, starting with "Here are my reasoning steps:" followed by the detailed reasoning, and then the final answer enclosed in [BEGIN FINAL RESPONSE]...[END FINAL RESPONSE] tags.
Support Tool Usage (Placeholders): The template structure includes logic for defining and using tools, compatible with the model's expected format for tool descriptions and calls.
I found this template necessary because default or simpler templates sometimes led to issues. This one has been working well for me, allowing the model to demonstrate its reasoning capabilities effectively.
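Concretely, a well-formed raw completion then looks like this (the content is a made-up illustration; only the reasoning prefix, the tags, and the end token come from the model's format):
Here are my reasoning steps:
1. The user asked for X, so I first consider...
2. ...which gives the result.
[BEGIN FINAL RESPONSE]
The answer the user actually sees.
[END FINAL RESPONSE]<|end|>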
Here is the Jinja template:
{%- set reasoning_prompt_instruction = "You are a thoughtful and systematic AI assistant built by ServiceNow Language Models (SLAM) lab. Before providing an answer, analyze the problem carefully and present your reasoning step by step. After explaining your thought process, provide the final solution in the following format: [BEGIN FINAL RESPONSE] ... [END FINAL RESPONSE]." -%}
{%- set assistant_starts_reasoning_with = "Here are my reasoning steps:\n" -%}
{%- set available_tools_text_block = "" -%}
{%- set tools_list_string = "" -%}
{# --- Safely prepare tools string --- #}
{%- set tools_input_iterable = [] -%}
{%- if tools is defined and tools is not none and tools is iterable and not tools is string and not tools is mapping -%}
{%- set tools_input_iterable = tools -%}
{%- endif -%}
{%- if tools_input_iterable|length > 0 -%}
{%- for tool_item in tools_input_iterable -%}
{%- set tools_list_string = tools_list_string + (tool_item|tojson if tool_item is defined and tool_item is not none else '{}') -%}
{%- if not loop.last %}{%- set tools_list_string = tools_list_string + ', ' -%}{%- endif -%}
{%- endfor -%}
{%- set available_tools_text_block = "You are provided with function signatures within <tools></tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about the arguments. You should infer the argument values from previous user responses and the system message. Here are the available tools: <tools>" + tools_list_string + '</tools>\n\nReturn all function calls as a list of json objects within <tool_calls></tool_calls> XML tags. Each json object should contain a function name and arguments as follows: [{"name": <function-name>, "arguments": <args-dict>}, {"name": <function-name>, "arguments": <args-dict>},...]' -%}
{%- endif -%}
{# --- Safely prepare messages for iteration --- #}
{%- set messages_input_iterable = [] -%}
{%- if messages is defined and messages is not none and messages is iterable and not messages is string and not messages is mapping -%}
{%- set messages_input_iterable = messages -%}
{%- endif -%}
{# --- System Prompt Construction --- #}
{{- '<|system|>\n' + reasoning_prompt_instruction -}}
{%- set final_messages_for_main_loop = messages_input_iterable -%}
{%- if messages_input_iterable and messages_input_iterable[0] is defined and messages_input_iterable[0]['role'] is defined and messages_input_iterable[0]['role'] == 'system' -%}
{%- if messages_input_iterable[0]['content'] is defined and messages_input_iterable[0]['content'] is not none -%}
{{- '\n' + messages_input_iterable[0]['content'] -}} {# This is the user-defined system message from chat settings #}
{%- endif -%}
{%- set final_messages_for_main_loop = messages_input_iterable[1:] if messages_input_iterable|length > 1 else [] -%}
{%- endif -%}
{%- if available_tools_text_block != "" -%}
{{- '\n\n' + available_tools_text_block -}}
{%- endif -%}
{{- '\n<|end|>\n' -}}
{# --- Main Message Loop --- #}
{%- for message_item in final_messages_for_main_loop -%}
{%- if message_item is defined and message_item['role'] is defined -%}
{%- if message_item['role'] == 'user' -%}
{{- '<|user|>\n' + (message_item['content'] if message_item['content'] is defined and message_item['content'] is not none else '') + '\n<|end|>\n' -}}
{%- elif message_item['role'] == 'assistant' -%}
{{- '<|assistant|>\n' -}}
{%- set assistant_text_content = "" -%}
{%- if message_item['content'] is defined and message_item['content'] is not none and message_item['content']|trim != "" -%}
{%- set assistant_text_content = message_item['content'] -%}
{%- endif -%}
{{- assistant_text_content -}}
{%- set tool_calls_input_iterable = [] -%}
{%- if message_item['tool_calls'] is defined and message_item['tool_calls'] is not none and message_item['tool_calls'] is iterable and not message_item['tool_calls'] is string and not message_item['tool_calls'] is mapping -%}
{%- set tool_calls_input_iterable = message_item['tool_calls'] -%}
{%- endif -%}
{%- if tool_calls_input_iterable|length > 0 -%}
{%- if assistant_text_content|trim != "" and not assistant_text_content.endswith('\n') -%}
{{- '\n' -}}
{%- endif -%}
{{- '<tool_calls>[' -}}
{%- for tool_call_item in tool_calls_input_iterable -%}
{%- set current_func_args = '{}' -%}
{%- if tool_call_item is defined and tool_call_item['function'] is defined and tool_call_item['function']['arguments'] is defined and tool_call_item['function']['arguments'] is not none -%}
{%- set current_func_args = tool_call_item['function']['arguments']|tojson -%}
{%- endif -%}
{%- set current_func_name = "" -%}
{%- if tool_call_item is defined and tool_call_item['function'] is defined and tool_call_item['function']['name'] is defined and tool_call_item['function']['name'] is not none -%}
{%- set current_func_name = tool_call_item['function']['name'] -%}
{%- endif -%}
{{- '{"name": "' + current_func_name + '", "arguments": ' + current_func_args -}}
{%- if tool_call_item is defined and tool_call_item['id'] is defined and tool_call_item['id'] is not none -%}
{{- ', "id": "' + tool_call_item['id'] + '"' -}}
{%- endif -%}
{{- '}' -}}
{%- if not loop.last %}{{ ', ' }}{% endif -%}
{%- endfor -%}
{{- ']</tool_calls>' -}}
{%- endif -%}
{{- '\n<|end|>\n' -}}
{%- elif message_item['role'] == 'tool' -%}
{{- '<|tool_result|>\n' + (message_item['content']|string if message_item['content'] is defined and message_item['content'] is not none else '') + '\n<|end|>\n' -}}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{# --- Final Assistant Generation Prompt --- #}
{%- set trigger_assistant_prompt = true -%}
{%- if final_messages_for_main_loop and final_messages_for_main_loop[-1] is defined -%}
{%- set last_message_in_history = final_messages_for_main_loop[-1] -%}
{%- if last_message_in_history['role'] is defined and last_message_in_history['role'] == 'assistant' -%}
{%- set has_tool_calls_in_last = false -%}
{%- if last_message_in_history['tool_calls'] is defined and last_message_in_history['tool_calls'] is not none and last_message_in_history['tool_calls'] is iterable and not last_message_in_history['tool_calls'] is string and not last_message_in_history['tool_calls'] is mapping and last_message_in_history['tool_calls']|length > 0 -%}
{%- set has_tool_calls_in_last = true -%}
{%- endif -%}
{%- if not has_tool_calls_in_last -%} {# If last was assistant text response (no tool calls) #}
{%- set trigger_assistant_prompt = false -%}
{%- endif -%}
{%- elif last_message_in_history['role'] is defined and last_message_in_history['role'] == 'tool' -%}
{%- set trigger_assistant_prompt = true -%}
{%- elif last_message_in_history['role'] is defined and last_message_in_history['role'] == 'user' -%}
{%- set trigger_assistant_prompt = true -%}
{%- endif -%}
{%- endif -%}
{%- if trigger_assistant_prompt -%}
{{- '<|assistant|>\n' + assistant_starts_reasoning_with -}} {# Model is prompted to start its reasoning #}
{%- endif -%}
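By the way, if you want to sanity-check that the template parses and renders outside LM Studio, here is a minimal sketch using the jinja2 Python package. The filename and the two-turn message list are placeholders of mine, and no tools are passed, so the tool block is skipped:
from jinja2 import Environment

# Assumes the template above is saved as "apriel_template.jinja"
# next to this script (hypothetical filename).
env = Environment()
with open("apriel_template.jinja") as f:
    template = env.from_string(f.read())

# A made-up two-turn chat; `tools` is deliberately left undefined so the
# template's defensive checks have to handle the missing variable.
messages = [
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "What is 2 + 2?"},
]

print(template.render(messages=messages))
# The output should end with: <|assistant|>\nHere are my reasoning steps:\n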
How to use in LM Studio:
Go to the model settings for Apriel-Nemotron-15b-Thinker.
Select "Jinja Template" as the prompt format type.
Copy the entire template above and paste it into the text box.
Save/apply.
A quick note on the process: This discussion post text was generated by Google's Gemini AI based on our conversation.
Hopefully, this helps other users! It might also be useful for @bartowski (the GGUF repo maintainer) if they are looking to provide a recommended chat template.
Thanks,
debeast6
Hello. Thank you. But it still doesn't work :(
Failed to parse Jinja template: Parser Error: Expected closing statement token. Identifier !== CloseStatement.
Can you please check it?
I have LM Studio 0.3.15, with the update channel set to Stable. I copied the Jinja template you provided, but LM Studio still generates an error. Can you please suggest how to fix this error? I really like your LLM model and would love to utilize it fully.
Apologies for the formatting issues in my previous post with the Jinja template. I'm not very technically savvy and didn't realize copy-pasting would break it.
To fix this, I've put the correctly formatted template in a GitHub Gist here:
https://gist.github.com/Ragnar-D/cad30721cbdf9e196b79c325b4f2129d
Please use the link above to get the working version. Once you're on the Gist page, just select all the template text you see there and copy-paste it directly into LM Studio. Thanks for your understanding!
Thank you very much! Now everything works! Could you please provide recommendations on the settings for temperature and other parameters (Top K, Min P, Repeat Penalty)?
Hi! Regarding settings for bartowski's Nemotron-15B-Thinker GGUF in LM Studio (specifically, I'm using the Q6_K_L quant), here's what I'm using and finding works quite well:
Temperature: 0.6 (I saw this recommended on the model card).
Top K: 40 (This is the LM Studio default, I believe).
Top P: 0.95 (Also LM Studio default).
Repeat Penalty: 1.1 (LM Studio default).
Min P: 0.05 (LM Studio default).
Context Overflow: I use "Truncate Middle".
I've mostly stuck with the LM Studio defaults apart from the temperature noted on the model card. With the Q6_K_L version, I've been really happy with the results; personally, I think the model gives great quality answers and seems quite efficient compared to some other 'thinking' models.
These settings should be a good starting point. While my positive experience with speed and quality is tied to this Q6_K_L quant, I believe these specific generation settings (Temperature, Top K, etc.) should generally apply and work well as a baseline for other quantizations of this particular Nemotron-15B-Thinker model too. Feel free to experiment based on your specific needs!
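For anyone running the same GGUF outside LM Studio, these sampling values carry over directly. Here's a minimal sketch with llama-cpp-python; the model path, context size, and prompt are placeholders of mine, and the sampling numbers are the ones listed above:
from llama_cpp import Llama

# Placeholder path; point this at your local GGUF file.
llm = Llama(model_path="Apriel-Nemotron-15b-Thinker-Q6_K_L.gguf", n_ctx=8192)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain recursion."}],
    temperature=0.6,     # recommended on the model card
    top_k=40,            # LM Studio default
    top_p=0.95,          # LM Studio default
    min_p=0.05,          # LM Studio default
    repeat_penalty=1.1,  # LM Studio default
    stop=["<|end|>"],    # the model's end-of-turn token
)
print(response["choices"][0]["message"]["content"])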
Currently, I've noticed that tags are still being output in the responses: [BEGIN FINAL RESPONSE] [END FINAL RESPONSE] <|end|>
Meanwhile, I also really liked the model; I'm not a programmer, but it thinks and writes surprisingly well.
I use IQ4_NL; I have a MacBook Pro M1 with 16 GB.
I think you're right about those tags ([BEGIN FINAL RESPONSE], etc.). From what I've seen while working with this model and adapting its prompt template, that seems to be the model's standard output format. It likely stems from its original training data, or from how its creators fine-tuned it to clearly delimit the main response.
The <|end|> token is generally the designated "stop token," and it's common for models to generate it visibly just before stopping.
I realize that's different from some other models. I have to admit, I'm not strong on the technical side and leaned heavily on AI assistance just to get the Jinja template's content and structure working for this particular GGUF model. Since getting the template itself working took a fair bit of effort, I hesitated to tinker with it further just to hide those tags, especially since the model was performing well.
It seems this specific GGUF version of the model just includes them visibly. And I definitely agree with you β the quality of the writing itself is excellent!
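That said, if the visible tags bother anyone, one option that doesn't involve touching the template at all is to filter the output after generation. A minimal Python sketch (again assembled with AI help; it assumes you have the raw completion in a string):
import re

def extract_final_response(raw_output: str) -> str:
    """Return only the text between the final-response tags, if present."""
    match = re.search(
        r"\[BEGIN FINAL RESPONSE\](.*?)\[END FINAL RESPONSE\]",
        raw_output,
        re.DOTALL,
    )
    # Fall back to the whole text (minus the stop token) if the tags are
    # missing, e.g. when generation was cut off early.
    text = match.group(1) if match else raw_output
    return text.replace("<|end|>", "").strip()

print(extract_final_response(
    "Here are my reasoning steps:\nStep 1: 2 + 2 = 4.\n"
    "[BEGIN FINAL RESPONSE]4[END FINAL RESPONSE]<|end|>"
))
# Prints: 4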
Thank you again for your quick help. I'll try reaching out to the chat where the model is hosted; maybe the developers can assist us. I'm even less experienced with settings, so I'm really glad you helped! Have a good day.
Thank you, you as well.