Error
Error quantizing:

```
main: build = 3337 (a8db2a9c)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing 'Einstein-v7-Qwen2-7B.fp16.gguf' to 'einstein-v7-qwen2-7b-iq4_xs-imat.gguf' as IQ4_XS
llama_model_loader: loaded meta data with 20 key-value pairs and 339 tensors from Einstein-v7-Qwen2-7B.fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0: general.architecture str = qwen2
llama_model_loader: - kv  1: general.name str = Einstein-v7-Qwen2-7B
llama_model_loader: - kv  2: qwen2.block_count u32 = 28
llama_model_loader: - kv  3: qwen2.context_length u32 = 131072
llama_model_loader: - kv  4: qwen2.embedding_length u32 = 3584
llama_model_loader: - kv  5: qwen2.feed_forward_length u32 = 18944
llama_model_loader: - kv  6: qwen2.attention.head_count u32 = 28
llama_model_loader: - kv  7: qwen2.attention.head_count_kv u32 = 4
llama_model_loader: - kv  8: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv  9: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 12: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t", ...
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32 = 151646
llama_model_loader: - kv 18: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 19: general.quantization_version u32 = 2
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type f16: 198 tensors
================================ Have weights data with 224 entries
[   1/ 339] token_embd.weight - [ 3584, 152064, 1, 1], type = f16,
====== llama_model_quantize_internal: did not find weights for token_embd.weight
converting to iq4_xs .. size = 1039.50 MiB -> 276.12 MiB
[   2/ 339] blk.0.attn_norm.weight - [ 3584, 1, 1, 1], type = f32, size = 0.014 MB
[   3/ 339] blk.0.ffn_down.weight - [18944, 3584, 1, 1], type = f16,
====== llama_model_quantize_internal: imatrix size 14336 is different from tensor size 18944 for blk.0.ffn_down.weight
llama_model_quantize: failed to quantize: imatrix size 14336 is different from tensor size 18944 for blk.0.ffn_down.weight
main: failed to quantize model from 'Einstein-v7-Qwen2-7B.fp16.gguf'
```
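The decisive line is the size mismatch: the supplied importance matrix has 14336 values per `ffn_down` entry, while this model's `qwen2.feed_forward_length` is 18944, which suggests the imatrix was generated from a different model (14336 happens to be the feed-forward width of several other popular 7–8B architectures). The earlier "did not find weights for token_embd.weight" warning points the same way. One way to check which sizes an imatrix must match is to dump the tensor shapes from the GGUF itself; a minimal sketch, assuming the `gguf` Python package that ships with llama.cpp (`pip install gguf`):

```python
# Minimal sketch: list the GGUF tensor shapes an imatrix has to match.
# Assumes the `gguf` Python package from the llama.cpp repo (pip install gguf).
from gguf import GGUFReader

reader = GGUFReader("Einstein-v7-Qwen2-7B.fp16.gguf")

for tensor in reader.tensors:
    # The first dimension (elements per row) is what llama.cpp compares
    # against the imatrix entry size: 18944 for ffn_down in this model,
    # not the 14336 the failing imatrix was built for.
    if "ffn_down" in tensor.name:
        print(tensor.name, [int(d) for d in tensor.shape])
```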
This looks like a model-specific error! Since we don't do anything special on top of llama.cpp, this is an issue for llama.cpp.
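If you still want an imatrix quant, the usual fix on the user side is to regenerate the importance matrix against the same fp16 GGUF being quantized, so every entry size matches the model's tensor shapes. A sketch of that two-step flow, assuming the `llama-imatrix` and `llama-quantize` binaries from the same llama.cpp build and a hypothetical `calibration.txt` calibration corpus:

```python
# Sketch: regenerate the importance matrix from the same model that will be
# quantized, so imatrix entry sizes match the model's tensor shapes.
# "calibration.txt" is a placeholder for whatever calibration text you use.
import subprocess

subprocess.run(
    [
        "./llama-imatrix",
        "-m", "Einstein-v7-Qwen2-7B.fp16.gguf",  # same fp16 GGUF as below
        "-f", "calibration.txt",                 # hypothetical calibration corpus
        "-o", "imatrix.dat",
    ],
    check=True,
)

subprocess.run(
    [
        "./llama-quantize",
        "--imatrix", "imatrix.dat",
        "Einstein-v7-Qwen2-7B.fp16.gguf",
        "einstein-v7-qwen2-7b-iq4_xs-imat.gguf",
        "IQ4_XS",
    ],
    check=True,
)
```

Because the imatrix is computed from the very model that gets quantized, the entry size for `blk.*.ffn_down.weight` comes out as 18944 by construction and the mismatch cannot occur.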