IQ4_XS quant possibly broken
#1 opened by llama-anon
./llama-server --model ~/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2 -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
build: 6085 (ef0144c0) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
system info: n_threads = 6, n_threads_batch = 6, total_threads = 12
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 11
main: loading model
srv load_model: loading model '/home/john/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) - 11609 MiB free
llama_model_load: error loading model: tensor 'blk.23.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/john/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2'
srv load_model: failed to load model, '/home/john/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
john@debian:~/TND/llama.cpp/build/bin$ cd /home/john/TND/AI/base-glm4.5/
john@debian:~/TND/AI/base-glm4.5$ sha256sum GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2
973909a53c0ef7ab63d3955d51f61c1c5dd65336283a13ca93cca30d90b5e994 GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2
john@debian:~/TND/AI/base-glm4.5$ sha256sum GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part2of2
c3a965e1dcc94e912cc12fec7e84d32cb773cf11119a00488a7537f6dd90ea19 GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part2of2
They are not broken and work perfectly fine. You just need to concatenate all the parts first. Obviously it won't work if you try to load a partial file containing only half the model.
We don't use the GGUF split-file format for a number of reasons. We simply split the GGUF into parts so you can concatenate them like any other file using:

cat GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2 GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part2of2 > GLM-4.5-Air-Base.i1-IQ4_XS.gguf
Alternatively, you can download the already concatenated file from our download page at https://hf.tst.eu/model#GLM-4.5-Air-Base-GGUF
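If you want a quick sanity check after concatenating, every valid GGUF file starts with the four ASCII magic bytes `GGUF`. A minimal sketch (the helper name and the path in the comment are illustrative, not part of llama.cpp):

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Illustrative usage on the concatenated file:
# looks_like_gguf("GLM-4.5-Air-Base.i1-IQ4_XS.gguf")
```

Note that because the split is a raw byte split, part1of2 also begins with the magic (which is why llama.cpp parsed its header and only failed later with "data is not within the file bounds"); the definitive test is that the concatenated file loads all tensors.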
llama-anon changed discussion status to closed