Latest updates?

#10
by Dampfinchen - opened

Hello, I've noticed you have updated the model quite a few times, but I'm not sure why.

The first time was 1 day ago, and the latest update happened two hours ago.

What were the reasons for these updates? Perhaps an update history would be useful here. Thanks for your great work!

I would be very curious to know as well. Do I need to redownload the model?

Unsloth AI org
edited May 2

No update is required. The weights only changed ever so slightly, for minuscule improvements in accuracy, so you can redownload if you want.
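If you do want to grab the refreshed files anyway, re-running a download only fetches what changed on the Hub. A minimal sketch with huggingface-cli (the repo id and quant pattern below are placeholders for whichever model and size you use):

```
# Re-download only the GGUF files that changed upstream;
# files already present and unchanged are skipped
huggingface-cli download unsloth/SOME-MODEL-GGUF \
  --include "*Q4_K_M*.gguf" --local-dir ./models
```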

We basically improved how we calculate the imatrix (the importance matrix used to calibrate the quantization). It was already accounted for before; the calculation just changed slightly.
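For anyone curious, a rough sketch of the imatrix workflow in llama.cpp: run the full-precision model over a calibration dataset to build an importance matrix, then pass that to the quantizer so the most influential weights get more precision. File names below are illustrative:

```
# Build the importance matrix from a calibration corpus
./llama-imatrix -m model-F16.gguf -f calibration.txt -o imatrix.dat

# Quantize with the imatrix guiding which weights keep more precision
./llama-quantize --imatrix imatrix.dat model-F16.gguf model-Q4_K_M.gguf Q4_K_M
```

So a better calibration dataset, or a tweak to how the imatrix is computed, shifts the quantized weights slightly, which is why the files change even though nothing else did.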

CC: @Dampfinchen @Mushoz @fakezeta @RachidAR

And here I am, religiously updating my llama.cpp and the 30B from Unsloth almost every morning now. I share your OCD for squeezing out whatever performance improvements are possible, because why not?

I wish model providers would do more frequent updates as well. Every release has its weaknesses and areas for improvement, which a 3.1 or 3.2 could help mitigate before the next big version is trained! I think Llama 3 did this well last year, culminating in Llama 3.3 70B, which was a chef's kiss.

But anyway, thanks for being slightly obsessed with optimization, guys; your new dynamic quants 2.0 are just what the community needed after having no updates since the K_M/K_S quants came online.

Unsloth AI org

Thank you for the praise, we really appreciate it! 🤗

@shimmyshimmer
Hey, just bumping this since you uploaded the GGUFs again in the last 12 hours or so, and I dutifully downloaded them all and updated my llama.cpp. Any reason I should re-test the models with the new GGUFs (I run a bunch of private evaluations), or is it a minor update? Just curious what's changed :)

Also, just FYI: llama.cpp build b5328 (the one that introduces flash attention for DeepSeek on Ampere, i.e. RTX 3000 series and above) broke flash attention on my mobile RTX 2070, so hopefully that gets fixed soon! From that build onwards, all models output gibberish for me with flash attention enabled.

There is a PR to fix flash attention on Turing cards - https://github.com/ggml-org/llama.cpp/pull/13415

Should be fixed when merged to main!
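Until that's merged, a simple workaround on affected cards is to leave flash attention off, since it's opt-in in llama.cpp (the model path below is just an example):

```
# Flash attention is enabled with -fa; omitting it falls back to the
# standard attention path, avoiding the gibberish output on affected GPUs
./llama-cli -m model-Q4_K_M.gguf -ngl 99 -p "Hello"
```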

@shimmyshimmer I also noticed an update some 13 hours ago. Could you guys give us some feedback, no matter how dry or minor?

Unsloth AI org

@YearZero @sidran

A new, highly improved calibration dataset, plus new Q5, Q6, etc. XL quants.
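If you want to sanity-check which files you have before re-running your evals, the gguf Python package includes a metadata dumper (the file name below is a placeholder):

```
pip install gguf

# Print just the header metadata (general.name, quant type, etc.),
# skipping the per-tensor listing
gguf-dump --no-tensors model-Q5_K_XL.gguf
```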
