I'd like the model to produce a concise answer in JSON, but inference takes too long, and tuning via prompt engineering seems to have little effect on the output. This makes the model hard to use in practice (e.g., acting as an agent or making API calls).