Invalid tokenizer regex?
When trying to load the Q4_K_M model with the latest llama.cpp commit (a8a1f335), I get this error:
init_tokenizer: initializing tokenizer for type 2
Failed to process regex: ''(?:[sSdDmMtT]|[lL][lL]|[vV][eE]|[rR][eE])|[^\r\n\p{L}\p{N}]?+\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]++[\r\n]*|\s*[\r\n]|\s+(?!\S)|\s+'
Regex error: regex_error(error_badrepeat): One of *?+{ was not preceded by a valid regular expression.
llama_model_load: error loading model: error loading model vocabulary: Failed to process regex
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '.\models\Ling-lite.Q4_K_M.gguf'
Anyone else had this?
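For context: the pretokenizer regex above uses possessive quantifiers (`?+`, `++`), which the ECMAScript grammar used by std::regex does not accept, so a code path that hands this pattern to std::regex would throw exactly this regex_error(error_badrepeat). Here is a minimal sketch reproducing that class of error; it is my own illustration of the failure mode, not code from llama.cpp:

```cpp
#include <iostream>
#include <regex>

int main() {
    try {
        // Possessive quantifiers like "++" are PCRE2-style syntax; the ECMAScript
        // grammar behind std::regex rejects them, so construction throws.
        std::regex re("[^\\s]++");
    } catch (const std::regex_error &e) {
        std::cerr << "Regex error: " << e.what() << "\n";
    }
    return 0;
}
```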
Yeah, as usual, llama.cpp added support for a model but apparently didn't even bother to try it with any of the actual models (it seems all are pretty much broken). I'll investigate tomorrow, probably this repo will just go away.
That's harsh. Several others and I did test and did not get this error. Something is definitely wrong, though, as imatrix tokenization hangs; investigating...
Please follow https://github.com/ggml-org/llama.cpp/pull/12634 for further information. I assume this can be fixed by a future llama.cpp update; if not, we will requantize this model as soon as the issue is fixed. It is worth mentioning that, using llama-server, I have so far been unable to reproduce this issue myself.
I just tested, and it does work with your PR!