[MODELS] Discussion
What are the limits on using these? How many API calls can I send per month?
How can I tell which model I am using?
Out of all these models, Gemma, which was recently released, has the newest information about .NET. However, I don't know which one has the most accurate answers regarding coding
Gemma seems really biased. With web search on, it says that it doesn't have access to recent information when I ask it almost anything about recent events. But when I ask Google about the same events, I get responses that cover them.
apparently gemma cannot code?
Gemma is just like Google's Gemini series models: it has very strong moral restrictions built in. Anything that may be related to file operations, or to access that might go deep into the system, gets censored and it refuses to reply.
So even if there are solutions for such things in its training data, they will just be filtered out and ignored.
But I still haven't tested its coding accuracy on tasks unrelated to these kinds of "dangerous" operations.
Still seeing <|im_end|> at the end of some responses, which sometimes causes the AI to respond as the user, in Model: meta-llama/Llama-3.3-70B-Instruct.
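For anyone hitting the same stray-token problem while calling a model from their own code, here is a minimal client-side workaround sketch. It simply cuts the response at the first end-of-turn marker, so anything the model generates "as the user" afterwards is dropped. The helper name `truncate_at_stop` and the marker list are assumptions for illustration, not HuggingChat's actual configuration or fix.

```python
# Hypothetical helper: the marker list below is an assumption.
# <|im_end|> is the ChatML end-of-turn token; <|eot_id|> is Llama 3's.
STOP_MARKERS = ("<|im_end|>", "<|eot_id|>")


def truncate_at_stop(text: str, markers=STOP_MARKERS) -> str:
    """Return text up to the earliest stop marker, trailing whitespace removed."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            # Keep the smallest index so the earliest marker wins.
            cut = min(cut, idx)
    return text[:cut].rstrip()


raw = "Paris is the capital of France.<|im_end|>\nuser\nWhat about Spain?"
print(truncate_at_stop(raw))  # prints: Paris is the capital of France.
```

This only hides the symptom on the client side. The underlying cause is more likely a mismatch between the chat template and the stop sequences configured for the model on the server, which only the maintainers can fix.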
If you simply want to try out Llama 4 in a chat UI right away, you can sign up for OpenRouter and, by allowing your input data to be used for model improvement, you can use both Maverick and Scout for free. I tried it this way, and personally, I felt that the model's performance wasn't quite up to expectations. (This doesn't mean I'm against the idea of adding it to HuggingChat.)
LLaMA-4 when?
unfortunately, this reddit post confirmed that all llama-4 models fell short of, and even underperformed, many of the current models on huggingchat, so why bother adding it? plus, llama4 is not truly open source (contrary to zuckerberg's claims). maybe we should add a better model for this month (such as openthinker2-32b or command a).
https://www.reddit.com/r/LocalLLaMA/comments/1jt0bx3/qwq32b_outperforms_llama4_by_a_lot/
Although Llama 4 was somewhat disappointing relative to expectations, I believe it's still worth featuring in HuggingChat. We come here to be able to try the latest advancements in open-source models, and Llama 4 is at least noteworthy. Even the Scout version, which is on par with Gemma 3 27B, or potentially the Maverick version, which claims to be comparable to GPT-4o, Gemini 2 Flash, and DeepSeek V3, would be a valuable addition. Of course, in the end it all depends on whether the team determines it has the capacity to serve those models, which are not so small compared to the others.
yeah, but we think that out of all the current models available on huggingchat, deepseek-r1-distill-32b and qwq-32b are the most viable LLMs to use (despite their random hallucinations, such as inserting random chinese or other-language characters into a response). so we need a more capable open-source LLM to be added here, such as command a or openthinker2-32b.
If you're really curious about LLaMA 4, I recommend trying it out on the free and instantly accessible OpenRouter.
Seeing how it performs might give you a different perspective on whether it should be added to HuggingChat.
if you look at this image, llama 4 maverick and command r+ are falling short in all 7 tests.
that's why we need to add another reasoning model instead of llama 4, such as openthinker2-32b, a revamped version based on qwen2.5-32b instruct which completely outpaces the current deepseek-r1-distill-32b (the latter hallucinates when handling a prompt written in other languages; see this problem here: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B/discussions/45 ). therefore it is best to add neither command a (as this chart shows) nor command r+, but openthinker2 instead.
QwQ-32B and DeepSeek-R1-32B often generate strange, unreadable output when responding in Japanese, with mixed languages or garbled characters. This suggests limited multilingual support. At least for Japanese, their performance is far behind models like Gemma or Phi, and in some languages they become nearly unusable. These models likely require fine-tuning for each language. I'm not sure how strong OpenThinker2 is in this regard, but given that it's based on DeepSeek-R1 and also 32B scale, I don't expect much.
High-benchmark models like QwQ and OpenThinker are certainly worth supporting, but I don't think model selection should be driven by benchmark results alone. Cohere models (Command R+, Command A) offer better language coverage, fewer ethical restrictions, and strong performance on creative tasks. Personally, Command R+ is a key reason I keep coming back to HuggingChat, and I hope it remains part of the lineup.
Today the mistralai/Mistral-Nemo-Instruct-2407 model is giving me the same text over and over again, or it's not working at all and I'm getting an error. I thought that, since it was an old chat, I would try a new one, but I'm not kidding: it gave me the same text it had just given me in the old one, word for word.
since the flaws of deepseek-r1-distill-32b and qwq-32b are basically inherent, we need either openthinker2 or a newly released model called kimi-vl-a3b-thinking, which could be added alongside deepseek-r1-32b or qwq-32b.
More info on kimi: https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking
Time to add more reasoning models to the current hugging chat arsenal.
(p/s: in rare cases, when clicking "try again" with deepseek-r1-32b, it refused to try again and just showed the answer it had already given. in my experience that previous answer was mostly incorrect. we are sure this is a bug.)