Issues converting to GGUF
I've downloaded the model and am trying to quantize it:
ls -la .
total 8812344
drwxr-xr-x 13 macdev wheel 416B Oct 6 10:45 .
drwxr-xr-x 3 macdev wheel 96B Oct 6 10:44 ..
drwxr-xr-x 13 macdev wheel 416B Oct 6 10:45 .git
-rw-r--r-- 1 macdev wheel 1.5K Oct 6 10:44 .gitattributes
-rw-r--r-- 1 macdev wheel 65K Oct 6 10:44 README.md
-rwxr-xr-x 1 macdev wheel 811B Oct 6 10:44 config.json
-rwxr-xr-x 1 macdev wheel 200B Oct 6 10:44 generation_config.json
drwxr-xr-x 4 macdev wheel 128B Oct 6 10:44 images
-rwxr-xr-x 1 macdev wheel 4.2G Oct 6 10:45 model.safetensors
-rwxr-xr-x 1 macdev wheel 513B Oct 6 10:44 special_tokens_map.json
-rwxr-xr-x 1 macdev wheel 133B Oct 6 10:44 tokenizer.json
-rwxr-xr-x 1 macdev wheel 4.6M Oct 6 10:44 tokenizer.model
-rwxr-xr-x 1 macdev wheel 2.3K Oct 6 10:44 tokenizer_config.json
but I get this error:
/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py . --outfile ./salamandra-2b-instruct_fp16.gguf
INFO:hf-to-gguf:Loading model:
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.0.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.0.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.1.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.1.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.11.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.11.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.13.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.13.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.15.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.15.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.16.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.16.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.16.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.17.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.17.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.17.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.18.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.18.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.18.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.19.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.19.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.19.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.20.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.20.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.20.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.21.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.21.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.21.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.22.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.22.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.22.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.23.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.23.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.23.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.3.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.3.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.4.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.4.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.6.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.6.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.7.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.7.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 5440
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
main()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
model_instance.write()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 434, in write
self.prepare_metadata(vocab_only=False)
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 427, in prepare_metadata
self.set_vocab()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 1521, in set_vocab
self._set_vocab_sentencepiece()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 752, in _set_vocab_sentencepiece
special_vocab = gguf.SpecialVocab(self.dir_model, n_vocab=len(tokens))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/vocab.py", line 40, in __init__
self._load(Path(path))
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/vocab.py", line 76, in _load
self._try_load_from_tokenizer_json(path)
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/vocab.py", line 122, in _try_load_from_tokenizer_json
tokenizer = json.load(f)
^^^^^^^^^^^^
File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^^^^^^^^
File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/macdev/.pyenv/versions/3.12.0/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
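"Expecting value: line 1 column 1 (char 0)" means the very first character of tokenizer.json is not valid JSON. The listing above shows tokenizer.json at only 133 bytes. As a sketch, assuming that stub has the same Git LFS pointer content that is shown later in this thread, feeding it to Python's json module reproduces the exact message. LFS_POINTER and try_parse_tokenizer are illustrative names, not part of llama.cpp.

```python
import json

# Assumption: the stub tokenizer.json is this Git LFS pointer (copied from later in the thread).
LFS_POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:139de51e6bbe12b772a255e157829f43bd67b63a4d55f1fe0e3abce37b2d8c9a\n"
    "size 19066993\n"
)


def try_parse_tokenizer(text: str):
    """Parse tokenizer.json text; return (data, None) on success or (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # 'v' (from "version") cannot start a JSON value, so decoding fails at char 0.
        return None, str(err)


data, error = try_parse_tokenizer(LFS_POINTER)
print(len(LFS_POINTER.encode("utf-8")), error)
# prints: 133 Expecting value: line 1 column 1 (char 0)
```

The pointer is exactly 133 bytes, matching the size in the directory listing, which is a strong hint that the real file was never fetched.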
This error (minus all the safetensors INFO lines) is common when the model download is broken somehow. Can you provide or verify the MD5 checksums, please?
find . -type f -maxdepth 1 -exec md5 {} \;
MD5 (./model.safetensors) = 72d45e5e9c4a738380e198e1929a7dd4
MD5 (./tokenizer_config.json) = 705ddfcd92cdd94f3855796ec4b01e3b
MD5 (./special_tokens_map.json) = af0d0fcd1cc480681d633730ae47ad96
MD5 (./config.json) = 96ef31cfcf5d41fd1b19c1262d07d299
MD5 (./tokenizer.json) = 61f26337646dce58e6aa773483b967d2
MD5 (./generation_config.json) = 91b221a0fe4f974912a35a74119ed69f
MD5 (./tokenizer.model) = 2378d392cfb6f4b30aa9d38f54975b0a
MD5 (./README.md) = 7b9ca1d9157f3b80b83381011b5def44
MD5 (./.gitattributes) = 0249edc2dad72f81bb6c65142fbeee42
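For anyone not on macOS, a rough Python equivalent of the find/md5 one-liner above might look like this. It is a sketch: md5_of_file and md5_listing are hypothetical helper names, and only regular files directly inside the directory are hashed, like -maxdepth 1. Since the LFS pointer later in this thread records a SHA-256 oid rather than an MD5, swapping hashlib.md5 for hashlib.sha256 may make the output easier to compare with what the repository stores.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks so multi-GB files are fine."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_listing(directory: str = ".") -> Dict[str, str]:
    """Hash every regular file directly inside `directory` (no recursion into subdirectories)."""
    return {
        p.name: md5_of_file(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }
```

Calling md5_listing(".") from the model directory returns a name-to-digest mapping that can be compared file by file with someone else's listing.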
I see the problem: tokenizer.json did not download; it's just 133 bytes.
I downloaded it manually, and now I get a different error :)
/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py . --outfile ./salamandra-2b-instruct_fp16.gguf
INFO:hf-to-gguf:Loading model:
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 256000}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
...
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 2048
INFO:hf-to-gguf:gguf: feed forward length = 5440
INFO:hf-to-gguf:gguf: head count = 16
INFO:hf-to-gguf:gguf: key-value head count = 16
INFO:hf-to-gguf:gguf: rope theta = 10000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 0
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_eos_token to False
INFO:gguf.vocab:Setting chat_template to {%- if not date_string is defined %}{%- set date_string = "2024-09-30" %}{%- endif %}{{ "<|im_start|>assistant
I am Salamandra, an AI language model developed at the Barcelona Supercomputing Centre (BSC) by the Language Technologies Unit. My knowledge base was last updated on August 2023. Today Date: "+ date_string +"
Soy Salamandra, un modelo lingüístico de IA desarrollado en el Barcelona Supercomputing Centre (BSC) por la Language Technologies Unit. Mi base de conocimientos se actualizó por última vez en agosto de 2023.
Soc Salamandra, un model de llenguatge d'IA desenvolupat al Barcelona Supercomputing Centre (BSC) per la Language Technologies Unit. La meva base de coneixement es va actualitzar per última vegada l'agost de 2023.<|im_end|>
" }}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:salamandra-2b-instruct_fp16.gguf: n_tensors = 219, total_size = 4.5G
Traceback (most recent call last):
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
main()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
model_instance.write()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 436, in write
self.gguf_writer.write_kv_data_to_file()
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 240, in write_kv_data_to_file
kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 893, in _pack_val
raise ValueError("All items in a GGUF array should be of the same type")
ValueError: All items in a GGUF array should be of the same type
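The message says a metadata array mixes value types. A simplified, hypothetical version of that rule (not gguf-py's actual implementation) makes it concrete: the array is rejected as soon as one item's type differs from the first item's. Exact type() comparison is deliberate here, because in Python bool is a subclass of int, while GGUF has a separate BOOL value type.

```python
from typing import Any, List, Optional


def array_item_type(values: List[Any]) -> Optional[type]:
    """Return the common type of all items (None for an empty list), or raise
    the same ValueError message seen above if the types are mixed."""
    if not values:
        return None
    first = type(values[0])
    for item in values[1:]:
        if type(item) is not first:  # exact match: True/False do not count as int or str
            raise ValueError("All items in a GGUF array should be of the same type")
    return first


print(array_item_type(["en", "es", "ca"]))  # <class 'str'>
try:
    array_item_type(["en", "es", False])  # e.g. a list of strings where one entry became a boolean
except ValueError as err:
    print(err)
```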
Could it be because of the inverted shape of the ffn_down weights?
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {5440, 2048}
INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 5440}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}
INFO:hf-to-gguf:blk.9.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
INFO:hf-to-gguf:blk.9.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 2048}
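The flipped ffn_down shape is most likely expected rather than a bug. The down projection maps the 5440-wide feed-forward space back to 2048, while up and gate map 2048 to 5440, so its weight is the transpose of theirs. The log also appears to print shapes in reversed (GGUF) order: token_embd shows {2048, 256000} although the embedding matrix is 256000 rows of width 2048. A small sketch of that reasoning, assuming PyTorch-style nn.Linear weights stored as [out_features, in_features]; the helper names are hypothetical.

```python
HIDDEN = 2048   # embedding length from the log
FFN = 5440      # feed forward length from the log
VOCAB = 256000  # vocabulary size implied by token_embd


def torch_linear_weight_shape(in_features: int, out_features: int) -> tuple:
    """nn.Linear keeps its weight as [out_features, in_features]."""
    return (out_features, in_features)


def gguf_order(shape: tuple) -> tuple:
    """Reverse a row-major (PyTorch) shape into the order the converter log prints."""
    return tuple(reversed(shape))


layers = {
    "ffn_up": torch_linear_weight_shape(HIDDEN, FFN),    # 2048 -> 5440
    "ffn_gate": torch_linear_weight_shape(HIDDEN, FFN),  # 2048 -> 5440
    "ffn_down": torch_linear_weight_shape(FFN, HIDDEN),  # 5440 -> 2048
    "token_embd": (VOCAB, HIDDEN),                       # one row of width 2048 per token
}
for name, shape in layers.items():
    print(f"{name}: torch {shape} -> log {gguf_order(shape)}")
# ffn_down: torch (2048, 5440) -> log (5440, 2048), matching {5440, 2048} above
```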
Hey @robbiemu, I am also working on this. I discovered that tokenizer.json was missing in the base models, but they updated the repos a few minutes ago: https://huggingface.co/BSC-LT/salamandra-7b/discussions/1#6707a5f2f9b028c1857f2df0
I had pulled the instruct model. tokenizer.json was a stub when I first downloaded the repo, but I thought it might be an LFS error, so I downloaded it manually, as you can see:
ls -la
total 8849584
drwxr-xr-x 13 macdev wheel 416B Oct 6 13:14 .
drwxr-xr-x 4 macdev wheel 128B Oct 6 15:10 ..
drwxr-xr-x 13 macdev wheel 416B Oct 6 10:45 .git
-rw-r--r-- 1 macdev wheel 1.5K Oct 6 10:44 .gitattributes
-rw-r--r-- 1 macdev wheel 65K Oct 6 10:44 README.md
-rwxr-xr-x 1 macdev wheel 811B Oct 6 10:44 config.json
-rwxr-xr-x 1 macdev wheel 200B Oct 6 10:44 generation_config.json
drwxr-xr-x 4 macdev wheel 128B Oct 6 10:44 images
-rwxr-xr-x 1 macdev wheel 4.2G Oct 6 10:45 model.safetensors
-rwxr-xr-x 1 macdev wheel 513B Oct 6 10:44 special_tokens_map.json
-rw-r--r--@ 1 macdev staff 18M Oct 6 12:06 tokenizer.json
-rwxr-xr-x 1 macdev wheel 4.6M Oct 6 10:44 tokenizer.model
-rwxr-xr-x 1 macdev wheel 2.3K Oct 6 10:44 tokenizer_config.json
Edit: huh, I see what you mean, the new one is a different size. The wires were crossed somewhere, I guess. I will retry.
Edit 2: actually, it is the same size. Hugging Face shows 19 MB (decimal megabytes), while my ls shows 18M (binary mebibytes): 19,066,993 bytes is about 19.1 MB or 18.2 MiB. About to try anyway. BTW, I had the same problem as last time when I cloned the repo. At first I get this file:
cat tokenizer.json
version https://git-lfs.github.com/spec/v1
oid sha256:139de51e6bbe12b772a255e157829f43bd67b63a4d55f1fe0e3abce37b2d8c9a
size 19066993
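This stub is a Git LFS pointer: three "key value" lines giving the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch (hypothetical helper names) that parses such a pointer, shows its size in both decimal MB and binary MiB (which explains 19 MB on Hugging Face versus 18M locally), and checks that a manually downloaded file really matches the recorded size and SHA-256:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split each non-empty 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def describe_size(n_bytes: int) -> str:
    """Format a byte count in decimal MB and binary MiB."""
    return f"{n_bytes / 1e6:.1f} MB = {n_bytes / 2**20:.1f} MiB"


def matches_pointer(path: Path, pointer: dict) -> bool:
    """True if the file on disk has the size and SHA-256 digest recorded in the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256" or path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:139de51e6bbe12b772a255e157829f43bd67b63a4d55f1fe0e3abce37b2d8c9a\n"
    "size 19066993\n"
)
print(describe_size(int(pointer["size"])))  # 19.1 MB = 18.2 MiB
```

Keeping the pointer text before replacing the file makes it possible to run matches_pointer(Path("tokenizer.json"), pointer) on the downloaded copy afterwards.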
I have to download that one file manually. This happens during clone:
git clone https://huggingface.co/BSC-LT/salamandra-2b-instruct
Cloning into 'salamandra-2b-instruct'...
remote: Enumerating objects: 71, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 71 (delta 29), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (71/71), 90.00 KiB | 1.70 MiB/s, done.
fatal: active `post-checkout` hook found during `git clone`:
/Users/Shared/Public/huggingface/salamandra-2b-instruct/.git/hooks/post-checkout
For security reasons, this is disallowed by default.
If this is intentional and the hook should actually be run, please
run the command again with `GIT_CLONE_PROTECTION_ACTIVE=false`
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
cd salamandra-2b-instruct
git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
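A clean git status does not prove the large files were fetched, because the pointer is what Git itself tracks. Whatever the exact cause of the blocked hook, a quick way to see which files were left as LFS pointers is to scan the working tree for the pointer header. This is a sketch: find_lfs_stubs is a hypothetical helper, and the 1024-byte cutoff is an assumption (real pointers are around 130 bytes). With git-lfs installed, running git lfs pull inside the repository should then replace the listed stubs with their real contents.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
MAX_POINTER_SIZE = 1024  # assumption: anything larger is real content, not a pointer


def find_lfs_stubs(root: str = ".") -> list:
    """Return paths (relative to root) of files that still hold a Git LFS pointer."""
    root_path = Path(root)
    stubs = []
    for path in sorted(root_path.rglob("*")):
        # Skip Git's own metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root_path).parts or not path.is_file():
            continue
        if path.stat().st_size > MAX_POINTER_SIZE:
            continue
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(str(path.relative_to(root_path)))
    return stubs
```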
Edit 3: same result:
...
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:salamandra-2b-instruct_fp16.gguf: n_tensors = 219, total_size = 4.5G
Traceback (most recent call last):
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4430, in <module>
main()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 4424, in main
model_instance.write()
File "/Users/Shared/Public/Github/llama.cpp/convert_hf_to_gguf.py", line 436, in write
self.gguf_writer.write_kv_data_to_file()
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 240, in write_kv_data_to_file
kv_bytes += self._pack_val(val.value, val.type, add_vtype=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Shared/Public/Github/llama.cpp/gguf-py/gguf/gguf_writer.py", line 893, in _pack_val
raise ValueError("All items in a GGUF array should be of the same type")
ValueError: All items in a GGUF array should be of the same type
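We found that the issue was just in the README: the Norwegian language code is listed in the YAML metadata as - no instead of the quoted - "no", and because YAML 1.1 (which PyYAML follows) reads an unquoted no as the boolean False, the list of languages ends up mixing strings with a boolean. That mixed list is what triggers "All items in a GGUF array should be of the same type".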
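To see why quoting matters, here is a simplified model of how a YAML 1.1 loader resolves unquoted list items. It is not a YAML parser: the regex mirrors the boolean pattern in PyYAML's resolver, and resolve_list_item and parse_language_list are hypothetical helpers that only handle strings and booleans. It assumes the model card metadata is read with a YAML 1.1 loader such as PyYAML.

```python
import re

# Unquoted scalars matching this pattern become booleans under YAML 1.1 (as in PyYAML).
YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)$"
)


def resolve_list_item(raw: str):
    """Resolve one list value: quoted scalars stay strings, bare yes/no/on/off/... become bools."""
    text = raw.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1]
    if YAML11_BOOL.match(text):
        return text.lower() in ("yes", "true", "on")
    return text


def parse_language_list(yaml_lines):
    """Turn lines like '- es' into resolved Python values."""
    return [
        resolve_list_item(line.lstrip()[1:])
        for line in yaml_lines
        if line.lstrip().startswith("-")
    ]


unquoted = parse_language_list(["- en", "- es", "- ca", "- no"])
quoted = parse_language_list(["- en", "- es", "- ca", '- "no"'])
print(unquoted)  # ['en', 'es', 'ca', False]
print(quoted)    # ['en', 'es', 'ca', 'no']
```

Only the quoted form yields a list of strings throughout, so quoting no in the README front matter is the minimal fix.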
I have quantized the models and uploaded them to Ollama; they can be run with a single command:
ollama run hdnh2006/salamandra-7b-instruct
or
ollama run hdnh2006/salamandra-2b-instruct
I have quantized them at 2 to 8 bits, and they can also be downloaded from Hugging Face:
https://huggingface.co/hdnh2006/BSC-LT-salamandra-2b-instruct-gguf
https://huggingface.co/hdnh2006/BSC-LT-salamandra-7b-instruct-gguf
Ollama links:
https://ollama.com/hdnh2006/salamandra-2b-instruct
https://ollama.com/hdnh2006/salamandra-7b-instruct