Please share feedback here!

#6
by shimmyshimmer - opened
Unsloth AI org

If you’ve tested any of the initial GGUFs, we’d really appreciate your feedback! Let us know if you encountered any issues, what went wrong, or how things could be improved. Also, feel free to share your inference speed results!

shimmyshimmer pinned discussion

Is it working for you?

Q8_0, Llama.cpp:
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1, 1
llama_model_load_from_file_impl: failed to load model

Unsloth AI org

Is it working for you?

Q8_0, Llama.cpp:
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1, 1
llama_model_load_from_file_impl: failed to load model

Could you try updating llama.cpp to the latest version?

Yes, resolved, thank you!

The system prompt is added between the BOS token and the user role token, right? It seems to work really well!

I suggest you state where the system prompt should be inserted in the prompt template, so that it's clear for text-completion users / users not using an autotokenizer.
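For text-completion users, here is a minimal sketch of building the prompt by hand, with the system prompt placed between BOS and the user role as described above. The token strings are assumptions based on DeepSeek's published chat-template style (the real tokens use fullwidth bars, e.g. `<｜User｜>`); verify them against the model's tokenizer_config.json before use.

```python
# Build a DeepSeek-style prompt by hand (no autotokenizer).
# Token strings below are illustrative stand-ins; check the model's
# tokenizer_config.json for the exact special tokens.
BOS = "<|begin_of_sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"

def build_prompt(system: str, user: str) -> str:
    # The system prompt sits between the BOS token and the user role token.
    return f"{BOS}{system}{USER}{user}{ASSISTANT}"

prompt = build_prompt("You are a helpful assistant.", "Hello!")
```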

I've tested the UD-Q3_K_XL in llama.cpp (Ubuntu), and it works great. I'm testing with a context size of around 14000.

Add a Q1 quant, i.e. 1-bit, as well.

Yo, DeepSeek-V2-Lite 16B needs to be GGUF'ed!

Unsloth AI org

Add a Q1 quant, i.e. 1-bit, as well.

It's uploading.

Unsloth AI org

Add a Q1 quant, i.e. 1-bit, as well.

They're up now!

Ran the original Unsloth DeepSeek-R1 quant on 2x 3090s with 128 GB of RAM and didn't get much in terms of speed, 2-3 tokens/s. Interested to see how the new Unsloth Dynamic 2.0 GGUFs stack up with smart layering.

Ran the original Unsloth DeepSeek-R1 quant on 2x 3090s with 128 GB of RAM and didn't get much in terms of speed, 2-3 tokens/s. Interested to see how the new Unsloth Dynamic 2.0 GGUFs stack up with smart layering.

If you're not on the ik_llama.cpp fork, you're missing out.
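For partial GPU offloading setups like the 2x 3090 one above, a back-of-envelope sketch can estimate how many layers fit in VRAM (the value you'd pass to llama.cpp's `-ngl` flag). The model size and layer count below are purely illustrative, not measured from any specific quant:

```python
# Back-of-envelope: how many transformer layers fit in VRAM when
# offloading the rest to system RAM. Numbers are illustrative only.
def layers_on_gpu(model_gb: float, n_layers: int, vram_gb: float) -> int:
    per_layer_gb = model_gb / n_layers          # rough average per-layer size
    return min(n_layers, int(vram_gb // per_layer_gb))

# e.g. a hypothetical ~210 GB quant with 61 layers on 2x 3090 (~48 GB VRAM)
n = layers_on_gpu(210.0, 61, 48.0)  # rough candidate for -ngl
```

In practice, per-layer sizes vary (especially with dynamic quants that keep some layers at higher precision), so treat this as a starting point and tune by watching actual VRAM usage.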

Why are these sizes substantially larger than the previous ones? For example, for UD-Q3_K_XL, the original was 273 GB vs. 350 GB for this one.
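For reference, the two figures quoted above work out to roughly a 28% size increase; a quick check:

```python
# Relative size increase between the two UD-Q3_K_XL uploads
# mentioned above (273 GB vs. 350 GB).
old_gb, new_gb = 273, 350
increase_pct = (new_gb - old_gb) / old_gb * 100
print(f"{increase_pct:.1f}% larger")  # → 28.2% larger
```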
