IQ4_XS quant possibly broken
#1 opened by llama-anon
./llama-server --model ~/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2 -ot ffn_up_shexp=CUDA0 -ot exps=CPU -ngl 100 -t 6 -c 16384 --no-mmap -fa -ub 2048 -b 2048
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
build: 6085 (ef0144c0) with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
system info: n_threads = 6, n_threads_batch = 6, total_threads = 12
system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 127.0.0.1, port: 8080, http threads: 11
main: loading model
srv load_model: loading model '/home/john/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3060) - 11609 MiB free
llama_model_load: error loading model: tensor 'blk.23.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/john/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2'
srv load_model: failed to load model, '/home/john/TND/AI/base-glm4.5/GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
john@debian:~/TND/llama.cpp/build/bin$ cd /home/john/TND/AI/base-glm4.5/
john@debian:~/TND/AI/base-glm4.5$ sha256sum GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2
973909a53c0ef7ab63d3955d51f61c1c5dd65336283a13ca93cca30d90b5e994 GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2
john@debian:~/TND/AI/base-glm4.5$ sha256sum GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part2of2
c3a965e1dcc94e912cc12fec7e84d32cb773cf11119a00488a7537f6dd90ea19 GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part2of2
They are not broken and work perfectly fine. You just need to concatenate all the parts first. Obviously it won't work if you try to load a partial file containing only half the model.
We don't use the GGUF split-file format for a number of reasons. We simply split the GGUF into parts so you can concatenate them like any other file using:

cat GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part1of2 GLM-4.5-Air-Base.i1-IQ4_XS.gguf.part2of2 > GLM-4.5-Air-Base.i1-IQ4_XS.gguf
Alternatively, you can download the already concatenated file from our download page at https://hf.tst.eu/model#GLM-4.5-Air-Base-GGUF
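If you want a quick sanity check after concatenating, every valid GGUF file starts with the four ASCII magic bytes `GGUF`. A minimal sketch (the helper name and the path in the comment are illustrative, not part of llama.cpp):

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes b'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Illustrative usage on the concatenated file:
# looks_like_gguf("GLM-4.5-Air-Base.i1-IQ4_XS.gguf")
```

Note that because the split is a raw byte split, part1of2 also begins with the magic (which is why llama.cpp parsed its header and only failed later with "data is not within the file bounds"); the definitive test is that the concatenated file loads all tensors.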
llama-anon changed discussion status to closed