Invalid tokenizer regex?
When trying to load the Q4_K_M model with the latest llama.cpp commit (a8a1f335), I get this error:
init_tokenizer: initializing tokenizer for type 2
Failed to process regex: ''(?:[sSdDmMtT]|[lL][lL]|[vV][eE]|[rR][eE])|[^\r\n\p{L}\p{N}]?+\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]++[\r\n]*|\s*[\r\n]|\s+(?!\S)|\s+'
Regex error: regex_error(error_badrepeat): One of *?+{ was not preceded by a valid regular expression.
llama_model_load: error loading model: error loading model vocabulary: Failed to process regex
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '.\models\Ling-lite.Q4_K_M.gguf'
Anyone else had this?
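For context: the pretokenizer regex above uses possessive quantifiers (`?+`, `++`), which the ECMAScript grammar used by std::regex does not accept, so a code path that hands this pattern to std::regex would throw exactly this regex_error(error_badrepeat). Here is a minimal sketch reproducing that class of error; it is my own illustration of the failure mode, not code from llama.cpp:

```cpp
#include <iostream>
#include <regex>

int main() {
    try {
        // Possessive quantifiers like "++" are PCRE2-style syntax; the ECMAScript
        // grammar behind std::regex rejects them, so construction throws.
        std::regex re("[^\\s]++");
    } catch (const std::regex_error &e) {
        std::cerr << "Regex error: " << e.what() << "\n";
    }
    return 0;
}
```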
Yeah, as usual, llama.cpp added support for a model but apparently didn't even bother to try it with any of the actual models (it seems all are pretty much broken). I'll investigate tomorrow, probably this repo will just go away.
That's harsh. Several others and I did test and did not get this error. Something is definitely wrong, though, as imatrix tokenization hangs; investigating...
Please follow https://github.com/ggml-org/llama.cpp/pull/12634 for further information. I assume this can be fixed by a future llama.cpp update; if not, we will requantize this model as soon as the issue is fixed. It is worth mentioning that, using llama-server, I have so far been unable to reproduce this issue myself.
I just tested, and it does work with your PR!