Different than Unsloth?
Hey, how come these files are different from the Unsloth files here? unsloth/QwQ-32B-GGUF
I downloaded Q2_K of both, they are different in size so they must be entirely different files!
Are you sure? they look the same size to me, both 12.3GB
Yes, I'm sure. I was comparing them side by side in the file manager, and their sizes were different. Very little difference, but still. Their content is also different, but that's to be expected since metadata such as the Hugging Face URLs etc. is different. Still, the actual difference in file size is noticeable when you compare their sizes in bytes side by side locally.
Oh, I mean, yeah, there'll be a bit of noise I suspect, but unless it's tens of megabytes I'd assume they're the same
Plus mine uses imatrix, which may change the file size slightly (not sure how, but not discounting the possibility)
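If you're curious where the small difference actually comes from, a rough sketch like the one below (using the `gguf` Python package from the llama.cpp repo; the filenames are just placeholders for the two downloads) will print the byte sizes and which metadata keys differ:

```python
# Compare two local GGUF downloads: byte size and metadata key names.
# Assumes `pip install gguf` and that the two paths point at the Q2_K files.
import os
from gguf import GGUFReader

path_a = "QwQ-32B-Q2_K.bartowski.gguf"  # placeholder filenames
path_b = "QwQ-32B-Q2_K.unsloth.gguf"

size_a, size_b = os.path.getsize(path_a), os.path.getsize(path_b)
print(f"byte sizes: {size_a} vs {size_b} (delta: {abs(size_a - size_b)} bytes)")

keys_a = set(GGUFReader(path_a).fields.keys())
keys_b = set(GGUFReader(path_b).fields.keys())
print("metadata keys only in A:", sorted(keys_a - keys_b))
print("metadata keys only in B:", sorted(keys_b - keys_a))
```

Differences in string-valued metadata (URLs, chat template tweaks, imatrix-related fields) alone can account for a small byte-level delta without the tensors themselves being different.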
unsloth's page says "Qwen-QwQ-32B with our bug fixes", did they change something?
I think they only fixed a tokenizer issue that affects fine-tuning, so it isn't relevant here
They also suggest some sampler params
https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively
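For what it's worth, if you run the GGUF through llama-cpp-python, those sampler settings are just passed per request, roughly like the sketch below. The numbers are placeholders rather than Unsloth's actual recommendations (those are on the page linked above), and the model path is hypothetical:

```python
# Minimal sketch of applying sampler settings with llama-cpp-python
# (pip install llama-cpp-python). Values and paths are illustrative only.
from llama_cpp import Llama

llm = Llama(model_path="QwQ-32B-Q4_K_M.gguf", n_ctx=8192)  # placeholder path

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Bonjour, explique le cache KV."}],
    temperature=0.6,  # placeholder sampler values -- check the linked doc
    top_p=0.95,
    min_p=0.05,
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```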
From my experience:
On the same quant level (Q4_K_M), the quant obtained directly with llama.cpp (as @bartowski does) is the model I use every day now, nothing to say, really happy with it!
But when I tried unsloth's "dynamic quant", I got different and strange results:
When I asked the same question in a language other than English, llama.cpp's one thought in English, but gave its answer respecting the user's language. Unsloth's one also thought in English, but then disregarded the user's language and answered in English as well.
I'm sorry in advance because I lost the chat history in question... After generating 5-6 times in a row for each model, I didn't push it further. Since the desired behavior was to respect the original language, I told myself I would try Unsloth's one again later and see over the longer term, but since "benchmarking" (if we can call it that) reasoning models takes a while, I never did; I ended up deleting Unsloth's one, keeping the other, and moving on. I didn't want to spend more time on what I thought was a buggy model...
I can't give more info than that for now, that was just my short experience with their special quants...
However, I recognize I should have experimented more to make the stats more robust.
Ok, now I feel like I'll actually have to try again to illustrate my points and be fair to unsloth's team...
I'll post my updated feedback here
The AI reasoning is contagious.
Oh no! I said something wrong here, the quants I use daily are actually the ones directly from Qwen's repo!! Damn, I'd have to compare 3 models now 🤦
@MrDevolver haha, I don't know exactly what you meant, but if it was about passion around it, yes!
Aha, now I know the answer @MrDevolver! Just noticed the patterns in my post! But wait,
Yes, indeed I was referring to the patterns in your post. But I don't blame you, I guess we've all spent our fair share of time reading the long walls of reasoning text from these models, and sometimes it just gets so stuck in our heads that we unconsciously start writing just like them...
These models bring that "But wait, there's more" meme to a whole new level... 🤣
Ok, so, I did the test on 3 different prompts in 3 different languages (RU, DE, FR), 5 passes per model per language:
- Forget what I said, the results are not conclusive (lol); it would require so many more samples that it was silly to engage in such a test, and sharing the results here would be pointless and potentially misleading
- I won't do more, as it's so time-consuming to benchmark manually (though a small script like the sketch below could at least automate the language check)
- System prompt was EN
- The only things I can say are general trends, and they're not as interesting as I hoped the outcome would be:
- When source is RU, all 3 quants tend to keep RU to reason, and thus to answer too
- When source is DE, they tend to switch to EN for reasoning then might eventually return to DE to answer
- When source is FR, they also tend to switch to EN for reasoning, but this time almost always revert back to the source language to answer
Sorry for cluttering the discussion; again, it was probably dumb to try this experiment in the first place...
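Still, if I (or anyone) ever redo this properly, a small script could at least automate the "which language did it think in vs. answer in" check. A rough sketch, assuming the raw completion wraps its reasoning in <think>...</think> tags (as QwQ does) and using the langdetect package for a crude language guess:

```python
# Rough sketch: classify which language a model reasons in vs. answers in.
# Assumes the raw completion wraps reasoning in <think>...</think> tags and
# uses langdetect (pip install langdetect) for a best-effort language guess.
import re
from langdetect import detect

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer)."""
    m = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    reasoning = m.group(1).strip() if m else ""
    answer = raw[m.end():].strip() if m else raw.strip()
    return reasoning, answer

def language_report(raw: str) -> dict:
    reasoning, answer = split_reasoning(raw)
    return {
        "reasoning_lang": detect(reasoning) if reasoning else None,
        "answer_lang": detect(answer) if answer else None,
    }

# Example with a French prompt whose reasoning drifted into English:
sample = "<think>The user asked in French, so I should answer in French.</think> Voici la réponse en français, comme demandé."
print(language_report(sample))  # e.g. {'reasoning_lang': 'en', 'answer_lang': 'fr'}
```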