Jan-nano Local Deployment Issues - Lack of Reasoning and Poor MCP Performance
Hello everyone! I recently deployed the Jan-nano model locally, but I’ve encountered some issues during testing. I’d greatly appreciate your insights and guidance. Below are the specific problems I’m facing, along with my observations and questions.
Problem Description
Discrepancy Between Online and Local Inference
- When using the online API, the model shows its reasoning steps (e.g., step-by-step analysis, logical deduction) before answering, which matches what I expected.
- When deploying Jan-nano locally, however, the model skips reasoning and generates responses directly, leading to suboptimal performance on tasks requiring logical inference.
- Question: Is there a missing configuration or parameter in the local deployment? Do I need to explicitly enable a "reasoning mode" or adjust the inference pipeline? (A sketch of what I tried is right after this list.)
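For reference, this is roughly what I attempted. It is a minimal sketch: the checkpoint path is a placeholder from my setup, and I am assuming Jan-nano keeps a Qwen3-style chat template that honors an `enable_thinking` flag, which may be exactly the wrong assumption.

```python
# Minimal sketch of my attempt to turn reasoning on locally, assuming Jan-nano
# keeps a Qwen3-style chat template that accepts an enable_thinking flag
# (unverified assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/jan-nano"  # placeholder for my local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "If A implies B and B implies C, what follows from A?"}]

# Extra kwargs to apply_chat_template are forwarded to the template itself;
# Qwen3-style templates use enable_thinking to switch the <think> block on or off.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```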
Poor MCP Performance
- Performance on my MCP (Model Context Protocol) tool-calling task is significantly worse with the local Jan-nano deployment than with Qwen3-8b running in its "reasoning mode."
- Question: Could this be due to model architecture differences, training data, or parameter settings? Are there specific adjustments I can make to the MCP configuration?
Steps I’ve Already Taken
- Verified that the local deployment version of Jan-nano matches the online API version.
- Checked the model’s configuration files and found no obvious discrepancies.
- Experimented with inference parameters (e.g., temperature, top_p) but saw no significant improvement; the sweep I ran is sketched after this list.
- Local deployment environment: Python 3.10 + CUDA 11.8, with hardware matching the online service.
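For completeness, this is roughly the parameter sweep I ran. The checkpoint path and prompt are placeholders from my setup, and the values are simply the ranges I tried, not recommended settings for Jan-nano.

```python
# Sampling sweep I tried locally (values are just the ranges I tested, not
# official recommendations); the model is loaded as in the sketch above.
import itertools
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/jan-nano"  # placeholder for my local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

prompt = "A train leaves at 09:00 and covers 120 km at 60 km/h. When does it arrive?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# None of these combinations produced visible step-by-step reasoning locally.
for temperature, top_p in itertools.product([0.2, 0.7, 1.0], [0.8, 0.95]):
    with torch.no_grad():
        output = model.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            max_new_tokens=256,
        )
    answer = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
    print(f"temperature={temperature}, top_p={top_p}\n{answer}\n")
```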
What I’m Looking For
- Insights from others who have deployed Jan-nano locally and encountered similar issues.
- Guidance on enabling "reasoning mode" or adjusting inference parameters.
- Analysis of potential causes for the MCP performance gap and strategies to address it.
Thank you for your time and expertise!
If you have examples of configurations, parameter explanations, or relevant documentation, I’d be incredibly grateful. Looking forward to your responses! 😊
Hi, Jan-nano is a 4b (not 8b) non-reasoning model, so the offline behavior is correct.
I think the online API supports both, but at the end of the day we trained the model to not think.
Hi @alandao,
Thank you so much for your clear reply! That definitely clears up why I was seeing different behaviors between the online and local versions.
Just to clarify, my mention of an "8b model" in the original post was referring to Qwen3-8b, which I was using as a benchmark for comparison.
I understand now that Jan-nano is a 4b non-reasoning model and its behavior in my local deployment is correct. What I'm still trying to understand is the extent of the performance difference on our MCP task. The drop in accuracy compared to a reasoning model like Qwen3-8b was larger than I had anticipated.
Is such a significant performance gap expected when a non-reasoning model is applied to tasks that might implicitly benefit from the underlying capabilities of a reasoning model?
Thanks again for your help!