Different than Unsloth?
Hey, how come these files are different from the Unsloth files here? unsloth/QwQ-32B-GGUF
I downloaded Q2_K of both, they are different in size so they must be entirely different files!
Are you sure? they look the same size to me, both 12.3GB
Yes, I'm sure. I was comparing them side by side in the file manager, and their sizes were different. Very little difference, but still. Their content is also different, but that's to be expected since metadata such as the Hugging Face URLs etc. is different. Still, the actual difference in file size is noticeable when you compare their sizes in bytes side by side locally.
Oh, I mean, yeah, there'll be a bit of noise I suspect, but unless it's tens of megabytes I'd assume they're the same
Plus mine uses imatrix, which may change the file size slightly (not sure how, but not discounting the possibility)
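If you're curious where the small difference actually comes from, a rough sketch like the one below (using the `gguf` Python package from the llama.cpp repo; the filenames are just placeholders for the two downloads) will print the byte sizes and which metadata keys differ:

```python
# Compare two local GGUF downloads: byte size and metadata key names.
# Assumes `pip install gguf` and that the two paths point at the Q2_K files.
import os
from gguf import GGUFReader

path_a = "QwQ-32B-Q2_K.bartowski.gguf"  # placeholder filenames
path_b = "QwQ-32B-Q2_K.unsloth.gguf"

size_a, size_b = os.path.getsize(path_a), os.path.getsize(path_b)
print(f"byte sizes: {size_a} vs {size_b} (delta: {abs(size_a - size_b)} bytes)")

keys_a = set(GGUFReader(path_a).fields.keys())
keys_b = set(GGUFReader(path_b).fields.keys())
print("metadata keys only in A:", sorted(keys_a - keys_b))
print("metadata keys only in B:", sorted(keys_b - keys_a))
```

Differences in string-valued metadata (URLs, chat template tweaks, imatrix-related fields) alone can account for a small byte-level delta without the tensors themselves being different.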
unsloth's page says "Qwen-QwQ-32B with our bug fixes", did they change something?
I think they only fixed a tokenizer issue that affects fine-tuning, so it isn't relevant here
They also suggest some sampler params
https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively
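For what it's worth, if you run the GGUF through llama-cpp-python, those sampler settings are just passed per request, roughly like the sketch below. The numbers are placeholders rather than Unsloth's actual recommendations (those are on the page linked above), and the model path is hypothetical:

```python
# Minimal sketch of applying sampler settings with llama-cpp-python
# (pip install llama-cpp-python). Values and paths are illustrative only.
from llama_cpp import Llama

llm = Llama(model_path="QwQ-32B-Q4_K_M.gguf", n_ctx=8192)  # placeholder path

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Bonjour, explique le cache KV."}],
    temperature=0.6,  # placeholder sampler values -- check the linked doc
    top_p=0.95,
    min_p=0.05,
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```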
From my experience:
On the same quant level (Q4_K_M), the quant obtained directly with llama.cpp (as @bartowski does) is the model I use every day now, nothing to say, really happy with it!
But when I tried unsloth's "dynamic quant", I got different and strange results:
When I asked the same question in a language other than English, llama.cpp's one thought in English, but gave its answer respecting the user's language. Unsloth's one also thought in English, but then disregarded the user's language and answered in English as well.
I'm sorry in advance because I lost the chat history in question... After generating 5-6 times in a row for each model, I didn't push it further. Since the desired behavior was to respect the original language, I told myself I would try Unsloth's one again later and see over the longer term, but since "benchmarking" (if we can call it that) reasoning models takes a while, I never did; I ended up deleting Unsloth's one, keeping the other, and moving on. I didn't want to spend more time on what I thought was a buggy model...
I can't give more info than that for now, that was just my short experience with their special quants...
However, I recognize I should have experimented more to make the stats more robust.
Ok, now I feel like I'll actually have to try again to illustrate my points and be fair to unsloth's team...
I'll post my updated feedback here
The AI reasoning is contagious.
Oh no! I said something wrong here, the quants I use daily are actually the ones directly from Qwen's repo!! Damn, I'd have to compare 3 models now 🤦
@MrDevolver haha, I don't know exactly what you meant, but if it was about passion around it, yes!
Aha, now I know the answer @MrDevolver! Just noticed the patterns in my post! But wait,
Yes, indeed I was referring to the patterns in your post. But I don't blame you, I guess we've all spent our fair share of time reading the long walls of reasoning text from these models, and sometimes it just gets so stuck in our heads that we unconsciously start writing just like them...
These models bring that "But wait, there's more" meme to a whole new level... 🤣
Ok, so, I did the test on 3 different prompts in 3 different languages (RU, DE, FR), 5 passes per model per language:
- Forget what I said, the results are not conclusive (lol); it would require so many more samples that it was silly to engage in such a test, and sharing the results here would be pointless and potentially misleading
- I won't do more, as it's so time-consuming to benchmark manually (though a small script like the sketch below could at least automate the language check)
- System prompt was EN
- The only things I can say are general trends, and they're not as interesting as I hoped the outcome would be:
- When source is RU, all 3 quants tend to keep RU to reason, and thus to answer too
- When source is DE, they tend to switch to EN for reasoning then might eventually return to DE to answer
- When source is FR, they also tend to switch to EN for reasoning, but this time almost always revert back to the source language to answer
Sorry for cluttering the discussion; again, it was probably dumb to try this experiment in the first place...
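Still, if I (or anyone) ever redo this properly, a small script could at least automate the "which language did it think in vs. answer in" check. A rough sketch, assuming the raw completion wraps its reasoning in <think>...</think> tags (as QwQ does) and using the langdetect package for a crude language guess:

```python
# Rough sketch: classify which language a model reasons in vs. answers in.
# Assumes the raw completion wraps reasoning in <think>...</think> tags and
# uses langdetect (pip install langdetect) for a best-effort language guess.
import re
from langdetect import detect

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer)."""
    m = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    reasoning = m.group(1).strip() if m else ""
    answer = raw[m.end():].strip() if m else raw.strip()
    return reasoning, answer

def language_report(raw: str) -> dict:
    reasoning, answer = split_reasoning(raw)
    return {
        "reasoning_lang": detect(reasoning) if reasoning else None,
        "answer_lang": detect(answer) if answer else None,
    }

# Example with a French prompt whose reasoning drifted into English:
sample = "<think>The user asked in French, so I should answer in French.</think> Voici la réponse en français, comme demandé."
print(language_report(sample))  # e.g. {'reasoning_lang': 'en', 'answer_lang': 'fr'}
```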