model loading error

#2
by austinsr - opened

clip_init: failed to load model '/root/.cache/llama.cpp/unsloth_cogito-v2-preview-llama-109B-MoE-GGUF_mmproj-F16.gguf': operator(): unable to find tensor v.blk.0.attn_k.weight

mtmd_init_from_file: error: Failed to load CLIP model from /root/.cache/llama.cpp/unsloth_cogito-v2-preview-llama-109B-MoE-GGUF_mmproj-F16.gguf

srv load_model: failed to load multimodal model, '/root/.cache/llama.cpp/unsloth_cogito-v2-preview-llama-109B-MoE-GGUF_mmproj-F16.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
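Since several of the problems in this thread turned out to be corrupted downloads, a quick first check is whether the cached GGUF file is at least structurally intact. This is a minimal stdlib-only sketch (the helper name is mine, not from llama.cpp) that parses the fixed-size GGUF header: the "GGUF" magic, format version, tensor count, and metadata key/value count. A truncated or garbled download will usually fail right here; a file that parses but is missing individual tensors (like the v.blk.0.attn_k.weight error above) needs a full tensor listing, e.g. via the gguf-dump script shipped with llama.cpp's gguf-py package.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version,
    tensor count, and metadata key/value count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # All header integers are little-endian.
        version, = struct.unpack("<I", f.read(4))
        n_tensors, = struct.unpack("<Q", f.read(8))
        n_kv, = struct.unpack("<Q", f.read(8))
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}
```

For an mmproj file, a suspiciously low tensor count is a red flag: the vision projector should contain the v.blk.* attention tensors that clip_init is looking for.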

./llama-server \
  -hf unsloth/cogito-v2-preview-llama-109B-MoE-GGUF:Q6_K_XL \
  --n-gpu-layers 99 \
  --jinja \
  --threads 36 \
  --threads-batch 24 \
  -sm row \
  --temp 0.6 \
  --min-p 0.01 \
  --top-p 0.9 \
  --ctx-size 16384 \
  --no-context-shift \
  --port 8080 \
  --host 0.0.0.0 \
  --metrics

Also got the same error.

If you're like me and don't care about multimodal inputs, add --no-mmproj to the args and the projector will be skipped. Otherwise, download/copy the mmproj file from another GGUF repo where it is present.

Nonetheless, the model seems to output only ":" when called with the args below:

-hf unsloth/cogito-v2-preview-llama-109B-MoE-GGUF:Q3_K_XL --cache-type-k q4_0 --n-gpu-layers 99 --ctx-size 8192 -ot \".ffn_.*_exps.=CPU\" --no-mmproj -a Cogito2-Scout

Edit: sample HTTP log https://pastebin.com/Xmfeyb27

Thank you. Yes, I don't need images as of now.

Tried removing KV quantisation, adjusting the ctx size, and adding --jinja; the model still only outputs "::::::". I also tried comparing the GGUFs with Unsloth's Llama 4 Scout, but couldn't spot any obvious difference that would lead to such behavior.

--jinja --n-gpu-layers 99 --ctx-size 16384 -ot \".ffn_.*_exps.=CPU\" --no-mmproj -a Cogito2-Scout

Same output for me too.

Update: I re-downloaded the model and it's now working.

Ok, it was most likely an issue with the Q3_K_XL download.
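Since re-downloading fixed it for more than one of us here, the original files were probably corrupted in transit. Before re-downloading a ~50 GB quant, it can be worth hashing the local file and comparing against the SHA256 that Hugging Face shows for LFS files on the repo's "Files and versions" page. A small sketch (the helper name is mine; the expected hash has to come from the repo page):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so multi-GB GGUFs
    don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

If the digests differ, the download is bad and re-fetching is the fix; if they match and the model still emits ":::::" or "GGGG...", the problem is elsewhere (sampling args, chat template, etc.).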

Just re-downloaded the Q4_K_S and ran it with the following args:

-hf unsloth/cogito-v2-preview-llama-109B-MoE-GGUF:Q4_K_S
--parallel 1 -ngl 12 --ctx-size 4096 --no-mmproj -a Cogito2-Scout

It produces outputs as expected.

My output after adding --no-mmproj is "GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG". I will try to re-download the model.
Edit: Re-downloading fixed the repetition error.
