koboldcpp thinks it is a GPT-NEO-X model?
Does config.json need to be in a certain location for it to load as an MPT model? Edit: config.json is sitting next to the .bin file.
python3.10 koboldcpp.py --model ../models-koboldcpp/mpt-30B-chat-GGML/mpt-30b-chat.ggmlv0.q5_0.bin --useclblast 0 0 --contextsize 8192
Welcome to KoboldCpp - Version 1.32
Attempting to use CLBlast library for faster prompt ingestion. A compatible clblast will be required.
Initializing dynamic library: koboldcpp_clblast.so
Loading model: /home/sapien/m2/ai-chat/models-koboldcpp/mpt-30B-chat-GGML/mpt-30b-chat.ggmlv0.q5_0.bin
[Threads: 7, BlasThreads: 7, SmartContext: False]
Identified as GPT-NEO-X model: (ver 406)
Attempting to Load...
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
gpt_neox_model_load: loading model from '/home/sapien/m2/ai-chat/models-koboldcpp/mpt-30B-chat-GGML/mpt-30b-chat.ggmlv0.q5_0.bin' - please wait ...
gpt_neox_model_load: n_vocab = 7168
gpt_neox_model_load: n_ctx = 8192
gpt_neox_model_load: n_embd = 64
gpt_neox_model_load: n_head = 48
gpt_neox_model_load: n_layer = 50432
gpt_neox_model_load: n_rot = 1090519040
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype = 2008
gpt_neox_model_load: qntvr = 2
gpt_neox_model_load: ggml ctx size = 104213.61 MB
Platform:0 Device:0 - NVIDIA CUDA with NVIDIA GeForce RTX 3090
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3090'
ggml_opencl: device FP16 support: false
CL FP16 temporarily disabled pending further optimization.
GGML_ASSERT: ggml.c:4164: ctx->mem_buffer != NULL
[1] 29383 IOT instruction (core dumped) python3.10 koboldcpp.py --model --useclblast 0 0 --contextsize 8192
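The nonsense hyperparameters in the log above (n_embd = 64, n_layer = 50432, an absurd n_rot) suggest the loader matched the wrong file version and is reading the header fields at the wrong offsets, which is why the allocation blows up. If you want to see what the file actually carries, a quick header dump is enough. This is a hypothetical diagnostic sketch, not koboldcpp's own detection code; it assumes the common GGML layout of a 4-byte magic followed by little-endian int32 hyperparameters, which varies between converters:

# diagnostic sketch: dump the first header fields of a GGML model file.
# Assumes a 4-byte magic followed by little-endian int32 values; this is
# NOT koboldcpp's actual detection logic, just a way to inspect the header.
import struct
import sys

def dump_header(path, n_fields=8):
    with open(path, "rb") as f:
        magic = f.read(4)
        fields = struct.unpack("<" + "i" * n_fields, f.read(4 * n_fields))
    print(f"magic: {magic!r} (0x{int.from_bytes(magic, 'little'):08x})")
    for i, value in enumerate(fields):
        print(f"field[{i}]: {value}")

if __name__ == "__main__":
    dump_header(sys.argv[1])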
Fixed by adding this to my command: --forceversion 500
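For reference, that makes the full working invocation the same command as above with the flag appended:

python3.10 koboldcpp.py --model ../models-koboldcpp/mpt-30B-chat-GGML/mpt-30b-chat.ggmlv0.q5_0.bin --useclblast 0 0 --contextsize 8192 --forceversion 500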
Yeah, I should mention that in the README. It's a bug in KoboldCpp; LostRuins is aware and will fix it in the next release.