Any success stories with parameters?
Hello everyone,
This looks like a decent and capable model, at least its reasoning part. However, I'm struggling with very lengthy reasoning output that seems to never end. The model seems pretty smart, but it gets distracted so easily that it runs out of the context window and never even gets to writing the actual answer after the reasoning phase. Naturally, at that point it starts generating nonsense.
I believe it should be possible to fix this with different parameters, but so far I've had no luck finding settings that do.
Trying the classic ones mentioned in the inference code snippets, like temperature 0.6 and top-p 0.95, did not give output much different from my usual starting settings.
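For reference, this is roughly how I'm passing those settings (a minimal sketch assuming a local OpenAI-compatible server such as LM Studio's; the endpoint, API key, and model name are placeholders for whatever your own setup exposes):

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible server.
# LM Studio's default endpoint is http://localhost:1234/v1; adjust as needed.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="exaone-deep-7.8b",  # placeholder; use the name your server exposes
    messages=[{"role": "user", "content": "What is 37 * 43? Think step by step."}],
    temperature=0.6,   # temperature from the inference code snippets
    top_p=0.95,        # top-p from the inference code snippets
    max_tokens=8192,   # leave generous room for the long reasoning phase
)
print(response.choices[0].message.content)
```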
Has anyone figured out actually good parameters and successfully generated good output? Please share your parameters and output examples, thanks!
This model has great capabilities for text-based reasoning, among the best of its size, which is quite nice to see.
Unfortunately, it fails to follow system instructions and basic tool use.
It cannot produce syntax-error-free code, even basic Python, JavaScript, Bash, or HTML, although it definitely has the logic for it.
Haven't found any sweet spots either; looking forward to that, as this model clearly has great reasoning.
(7.8B ollama Q4KM)
Thank you for writing this.
You know, I've read somewhere on Reddit that we're all using it wrong, that we shouldn't use LM Studio for this because LM Studio supposedly breaks this model and Ollama supposedly doesn't. But your experience matches mine, and yet I tested it in LM Studio, so I think we can scratch the client off the list of possible reasons for this model's underperformance.
There was also another opinion that specifically the Q4KM quants of this model are broken, but that theory felt even harder to believe than the problematic-client one, so I didn't bother testing different quants.
Well, about the reasoning: the Q4KM quantization is certainly destructive, but the model does keep strong reasoning. To illustrate that, I posted a math prompt where the Q4KM quant solves the problem really well:
3276775*7372793
https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-7.8B/discussions/2
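For anyone who wants to double-check that result, the exact product is easy to verify in plain Python (exact integer arithmetic, so this is the ground truth the model's answer should match):

```python
# Reference value for the test prompt above.
print(3276775 * 7372793)  # 24158983782575
```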
So, in my opinion, it's not so much the tools we use; the model itself might require further training, especially for coding tasks, and/or better tooling.
We are looking forward to it.
In my testing at Q4KM with the KV cache at Q4 and the recommended temperature and top-p settings, I've noticed the model occasionally hallucinating, getting caught in repetitive greeting loops, or falling into repeated "alternatively xyz abc" constructions.
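For context, this is roughly how I'm sending those sampling settings to a local Ollama instance (a sketch against Ollama's REST chat API; the model tag is a placeholder for whatever you pulled, and the Q4KM weights and Q4 KV cache come from the model build and server configuration, not from this request):

```python
import requests

# Sketch: query a local Ollama server with the recommended sampling settings.
payload = {
    "model": "exaone-deep:7.8b",  # placeholder tag; use the one you actually pulled
    "messages": [{"role": "user", "content": "What is 3276775 * 7372793? Reason carefully."}],
    "stream": False,
    "options": {
        "temperature": 0.6,  # recommended temperature
        "top_p": 0.95,       # recommended top-p
        "num_ctx": 32768,    # give the long reasoning phase the full context window
    },
}
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=600)
print(resp.json()["message"]["content"])
```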
However, the model truly shines with mathematical problems. Impressively, it correctly solved "3276775*7372793" - a calculation that stumped SOTA models in my tests. Though it required nearly 10,000 tokens (about 1/3 of its context window), the result was accurate.
My overall assessment is that this model would benefit from additional fine-tuning. The mathematical capabilities demonstrate significant potential, and while it likely won't outperform leading models in coding tasks, with proper refinement it could outpace current 7-8B models by a long shot. The main issues I faced were repetitive loops, extreme hallucinations, and incorrect syntax in basic Python code.
I see tremendous potential here and believe this model could be developed much further. The model is good, and I hope future iterations make it even better!
Great work, LG!
Yes, I used Q4KM with the KV cache at Q4, and yes, the "true" model at full precision may or may not show the issues I faced... but I just reported what I observed.