Chimera not separating reasoning from response
It seems as though Chimera has stopped wrapping its reasoning process in think tags on both ends, making it harder to separate its reasoning from its actual answer - unlike DeepSeek R1, for example. This shows up as both Chutes and OpenRouter failing to separate the reasoning from the response in the API response, and not separating out reasoning in their built-in chat functionality. Is this intentional?
The model doesn't even reason for me anymore haha.
Hello there,
thank you for pointing this out. Maybe OpenRouter changed their chat template? If you look at the statistics, Chimera usually had a ratio between completion and reasoning tokens of 1:1 to 3:1.
Since May 13th, this has changed significantly: the completion-to-reasoning ratio is now something like 50:1, which suggests that reasoning has become very rare in comparison.
Maybe ask OpenRouter?
PS: We did not change the model.
Update: We wrote this to OpenRouter on X.
"To us it seems that you no longer use the chat-template provided with in the tokenizer_config.json file."
"We followed the suggested way from DeepSeek for using the original R1 version: prefix the Assistant message with "think" (in angle brackets) to ensure reasoning. This got added to the tokenizer, as can be seen by this change from DeepSeek in HF: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f
We provide the exact same tokenizer_config.json file as DeepSeek does. [...]
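For illustration, here is a minimal sketch of how to verify this locally (assuming the repo's tokenizer_config.json carries DeepSeek's updated chat template; transformers is the only dependency):

from transformers import AutoTokenizer

# Render a generation prompt and check whether the opening <think> tag is
# already part of it. If it is, the model will not emit the tag itself, and
# serving stacks must treat everything before </think> as reasoning.
tok = AutoTokenizer.from_pretrained("tngtech/DeepSeek-R1T-Chimera")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Hello."}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt.endswith("<think>\n"))  # True if the template injects the tag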
Have you got any updates on this? The problem still appears to be happening. I reached out on the OpenRouter Discord server but have not heard anything back from them.
I just got an answer from the person at Chutes who is in charge of the model on their side, and I think they fixed it (thanks, JD!). I also just checked using the OpenRouter chat and asked simple questions, e.g.
"Can you give me the sum of the squares of the integers from 1 to 10, please?"
The Chimera started thinking properly:
"Okay, so I need to find the sum of the squares of the integers from 1 to 10. Let me think about how to approach this.
First, I recall that the squares of integers from 1 to 10 are each number multiplied by itself. So, I can list them out:
1² = 1
2² = 4
..."
Can you try again on your side?
Probably it was SGLang 0.4.6.post4, which was released on May 13th, the same day the reasoning change appeared in the OpenRouter statistics.
They upgraded from SGLang 0.4.5 on May 13th. This seems to be the cause.
We're looking into whether we can reproduce it on our side, or will otherwise advise.
The chat template defined in this repo includes the opening think tag, meaning that, since the tag is now part of the chat template itself, the model will likely never emit one in its output.
Therefore, all tokens before the closing tag are reasoning tokens.
To revert this, simply remove the opening think tag from the chat template and let the model generate it.
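To make the consequence concrete, here is a minimal client-side parsing sketch (the helper name is hypothetical): when the opening tag comes from the template, the raw output contains only the closing tag, so a client has to split on that.

def split_reasoning(output: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer), assuming the
    opening <think> tag was injected by the chat template and therefore
    never appears in the output itself."""
    reasoning, sep, answer = output.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", output
    return reasoning.strip(), answer.strip()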
SGLang update was on the 14th, but does not change chat template to my knowledge.
The issue described has not been resolved. Using your example, the reasoning is still being written with the output - compare the output in OpenRouter Chat to something like DeepSeek R1 and you can clearly see that R1 separates out reasoning while Chimera does not. This also happens in Chutes Chat and in the Chutes Playground.
Same, it still doesn't do reasoning tokens correctly /:
Is there any update on this? OpenRouter and Chutes both continue not to separate out reasoning from response.
Hello,
thanks for asking. We have also tried some TNG-local adaptations to deal with it, but that work is not finished yet. Do you have some example prompts, preferably of a type that is relevant to you, that show the undesired behaviour? We could then test right away whether the newest local version handles those prompts correctly.
You can also email us these or send them via LinkedIn etc.
Cheers,
Henrik
Hello,
The example prompt used earlier works just fine: "Can you give me the sum of the squares of the integers from 1 to 10, please?" - although any prompt that triggers Chimera's reasoning works to illustrate the issue. They could be maths questions, questions about characters in popular media, interesting facts about the world. Things like that.
Is there any update on this? OpenRouter and Chutes both continue not to separate out reasoning from response.
https://huggingface.co/tngtech/DeepSeek-R1T-Chimera/discussions/3#682cab7ac2f5e0c9b99bb2cb
As mentioned, the opening think tag is now baked into the chat template, so the model never produces one; it's inherent.
The maintainers of this model can remove the forced think tag from the chat template and it will work as before.
We can try to update our front end to display it nicely given this change, but we can't/won't change the API, for example. If this is the chat template that the model maintainer wants to use, we won't override it.
Whoever maintains the chat template determines whether a think tag is produced as part of the model output or as part of the template itself (and therefore not included in the output).
Hello, our new R1T2 version should be very well behaved with the think tokens. Mistakes should be rare. Cheers!
Great to hear! Is this new version going to be made available on OpenRouter as well?
We've just added the new variant to chutes: https://chutes.ai/app/chute/4fa0c7f5-82f7-59d1-8996-661bb778893d
For this version, the responses explicitly separate the reasoning and output contents via SGLang's reasoning-parser flag. We're still updating our front end to parse this properly, and I suspect OpenRouter will need to do the same; the last time we discussed it, they did not have a way to do so.
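For reference, a sketch of the relevant server-side setting (the launch command and parser name here are assumptions based on SGLang's documented flags, not necessarily Chutes' actual deployment):

$ python -m sglang.launch_server --model-path tngtech/DeepSeek-TNG-R1T2-Chimera --reasoning-parser deepseek-r1

With the flag enabled, the server splits generations into reasoning_content and content in its OpenAI-compatible responses instead of returning a single blob.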
For example, via the API, here is a non-streamed example:
$ time curl -s -XPOST https://llm.chutes.ai/v1/chat/completions -H 'content-type: application/json' -d '{"model": "tngtech/DeepSeek-TNG-R1T2-Chimera", "messages": [{"role": "user", "content": "Hello."}]}' -H "authorization: $CHUTES_API_KEY" | jq .
{
  "id": "c438a70a1b0943d6b0062da587c42b3b",
  "object": "chat.completion",
  "created": 1751565959,
  "model": "tngtech/DeepSeek-TNG-R1T2-Chimera",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I assist you today? 😊",
        "reasoning_content": "Okay, the user just said \"Hello.\" That's a straightforward greeting. I should respond warmly to make them feel welcome. Maybe add a friendly emoji to keep it light and approachable. Since they haven't specified what they need help with yet, I'll offer assistance in a general way. Let me keep it simple and open-ended so they can guide the conversation from here. Something like, \"Hello! How can I assist you today?\" with a smiley face should work. I don't want to overwhelm them with too much information right away. Just set a positive tone and let them lead.\n",
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 1
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 143,
    "completion_tokens": 138,
    "prompt_tokens_details": null
  }
}
As you can see, there is a separation between reasoning content and output.
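For completeness, here is a client-side sketch using the OpenAI Python SDK against the same endpoint (base URL and model name taken from the curl call above; reasoning_content is a non-standard extension field, so it is read defensively):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.chutes.ai/v1",
    api_key=os.environ["CHUTES_API_KEY"],
)
resp = client.chat.completions.create(
    model="tngtech/DeepSeek-TNG-R1T2-Chimera",
    messages=[{"role": "user", "content": "Hello."}],
)
msg = resp.choices[0].message
print("answer:   ", msg.content)
# reasoning_content is an extension field; fall back to None if absent.
print("reasoning:", getattr(msg, "reasoning_content", None))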
Here's an example of streaming reasoning chunks:
$ time curl -s -XPOST https://llm.chutes.ai/v1/chat/completions -H 'content-type: application/json' -d '{"model": "tngtech/DeepSeek-TNG-R1T2-Chimera", "messages": [{"role": "user", "content": "Hello."}], "stream": true}' -H "authorization: $CHUTES_API_KEY"
data: {"id":"e7751f51533441e0aeb65cf0b62a07d6","object":"chat.completion.chunk","created":1751565980,"model":"tngtech/DeepSeek-TNG-R1T2-Chimera","choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"e7751f51533441e0aeb65cf0b62a07d6","object":"chat.completion.chunk","created":1751565980,"model":"tngtech/DeepSeek-TNG-R1T2-Chimera","choices":[{"index":0,"delta":{"role":null,"content":null,"reasoning_content":"\n","tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
data: {"id":"e7751f51533441e0aeb65cf0b62a07d6","object":"chat.completion.chunk","created":1751565980,"model":"tngtech/DeepSeek-TNG-R1T2-Chimera","choices":[{"index":0,"delta":{"role":null,"content":null,"reasoning_content":"Okay","tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
Then, in the actual output (after the reasoning), the chunks look like this:
data: {"id":"e7751f51533441e0aeb65cf0b62a07d6","object":"chat.completion.chunk","created":1751565982,"model":"tngtech/DeepSeek-TNG-R1T2-Chimera","choices":[{"index":0,"delta":{"role":null,"content":"😊","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":null,"matched_stop":null}],"usage":null}
Thanks to @jondurbin and team, the new model R1T2 is now available on chutes.ai.
OpenRouter does not seem to have added the new model to its portfolio yet.