Error when loading with llama.cpp

#1
by Alsebay - opened

I tried to download and use the Q4_K_M version, but I got this error:
llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'smaug-bpe'
Have you tried:
python llama.cpp/gguf-py/scripts/gguf-new-metadata.py --pre-tokenizer llama-bpe input_model output_model ?
I have quantized some models and also ran into this issue; it was fixed by this Python script.
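To see which pre-tokenizer a GGUF file currently carries before rewriting anything, something like this should work (a minimal sketch using the gguf-py package bundled with llama.cpp; the filename is a placeholder):

```python
# Minimal sketch: read the pre-tokenizer name stored in a GGUF file's
# metadata, using the gguf-py package bundled with llama.cpp.
from gguf import GGUFReader

reader = GGUFReader("model-Q4_K_M.gguf")  # placeholder filename
field = reader.fields["tokenizer.ggml.pre"]
# String fields are stored as byte arrays; decode the value part.
print(bytes(field.parts[field.data[0]]).decode("utf-8"))  # e.g. 'smaug-bpe'
```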

There might be two issues here. First, you have an antique version of llama.cpp, so of course it doesn't support current quants. You need to update.

Second, no, I haven't tried setting the pretokenizer, because that breaks the quant. If that model really uses llama-bpe, then the model is currently broken, as it matches the smaug pretokenizer.

mradermacher changed discussion status to closed

I just tried a fresh installation in Colab to test it, so maybe that's the issue? 🤔
edit: Maybe it's because of the oobabooga WebUI; I will try a standalone llama.cpp. Anyway, thanks so much :).

OK, maybe my model uses the smaug-bpe pretokenizer instead of llama-bpe ._. MergeKit uses smaug-bpe by default (in my experience, it only happens with llama-3 merges, dunno why).
edit: so... does that mean my model is broken? O_O

It's broken when the output is not good enough to be called working. If it works for you using the smaug pretokenizer, it's not broken.

Having said that, using the wrong pretokenizer is probably degrading the quality of the model, if the tensors were trained with the llama pretokenizer, because they are different. Different enough to be a big issue? I don't know, probably not. Will it be fully fixed by forcing the pretok. to llama-bpe? I do not know, it might have affected the tokenizer as a whole rather than just the pretokenizer.

Anyway, in the future you can opt to tell mergekit to do "the right thing"(tm) and/or redo the models with the "right" pretokenizer (in which case I will happily quantize them again). There might be another way to fix the issue, but I am just the poor guy doing quants, not the poor guy doing merges, so I have no clue :-)

Also, I did not know that this is due to mergekit defaults; that explains some of the weird uses of smaug-bpe in l3-based models I have seen recently.

I see. Maybe I will remake this model series later. Thanks so much. You are very knowledgeable; thanks for your note, which helped me find the issue XD.

It's strange that both LumiMaid and Stheno are llama-bpe. I also use --copy-tokenizer for all of my merged models, so in theory it shouldn't end up as smaug-bpe @@
I give up; maybe I will let the mergekit devs and the llama.cpp devs fix this. Thanks again for helping me.

I have investigated a bit further, and something weird is going on. Your problem is fairly common recently, and the essential difference between smaug and llama seems to be ignore_merges: true (llama) vs. false (smaug). But your model indeed has "true", while the other l3-based models with smaug-bpe had "false". So there is something I don't quite get.

The pre-tokenizer is not guessed by llama.cpp, it is in fact measured, so there should be essentially zero chance of this being misdetected.
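For reference, this is roughly the mechanism in convert-hf-to-gguf.py (a simplified sketch; the real script uses a much longer fixed test string and a table of known hashes, and the model path here is a placeholder):

```python
# Rough sketch of the detection in llama.cpp's convert-hf-to-gguf.py:
# tokenize a fixed test string, hash the resulting token IDs, and look
# the hash up in a table of known pre-tokenizers. Identical tokenizer
# behavior produces an identical hash: it is measured, not guessed.
from hashlib import sha256
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/merged-model")  # placeholder

# The real script uses a long fixed string full of whitespace, emoji and
# mixed scripts; this short stand-in only illustrates the mechanism.
chktxt = "Hello world \n\n \t 3.3 🦙"
chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
print(chkhsh)  # compared against hashes recorded for llama-bpe, smaug-bpe, ...
```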

When manually comparing the tokenizer.json files between Stheno and this model, however, there are a few differences in the post_processor and padding config that could have the same effect.
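A quick way to surface such differences (a minimal sketch; the two tokenizer.json paths are placeholders):

```python
# Minimal sketch: compare the top-level sections of two tokenizer.json
# files to see where they diverge (paths are placeholders).
import json

with open("Stheno/tokenizer.json") as f:
    a = json.load(f)
with open("merged-model/tokenizer.json") as f:
    b = json.load(f)

for key in ("normalizer", "pre_tokenizer", "post_processor", "padding", "model"):
    if a.get(key) != b.get(key):
        print(f"section '{key}' differs")

# The smaug/llama hint discussed above lives in the BPE model section:
print("a ignore_merges:", a["model"].get("ignore_merges"))
print("b ignore_merges:", b["model"].get("ignore_merges"))
```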

I went to the llama.cpp GitHub and saw that it has some bugs (6 closed and 1 still open) about the llama-3 pre-tokenizer; there is also 1 pull request to improve detection of the llama-bpe tokenizer. So that means llama.cpp's convert-hf-to-gguf.py has a bug now, that's what I think.

Also, I tried loading your mradermacher/L3-Aethora-15B-i1-GGUF and mradermacher/L3-Aethora-15B-GGUF that you quantized for SteelSkull, and they have this issue too, even for a trained model. So in conclusion I think it's because of llama.cpp. We will wait for them to fix those troubles. :/ 😅
Edit: so... MergeKit did nothing wrong, it still kept the true tokenizer from the origin model; sorry for the false information.

The only open ticket mentioning smaug-bpe I can find is by jim-plus, who already had his previous ticket closed because it is a model problem, not an issue in llama.cpp. The very premise of his ticket (that there is a hash collision) can be dismissed immediately (the hashes are different). My personal suggestion would be not to wait for miracles or engage in wishful thinking, but to deal with the reality of the situation.

Besides, while it may be true that mergekit kept the tokenizer from the original model, the original model is not llama-3 - your tokenizer is clearly NOT the original llama-3-8b-instruct tokenizer. So mergekit (and you :) may be off the hook, but so is llama.cpp. And that means if your base model is a fine tune, it was fine-tuned (possibly in error) with the smaug-bpe pre-tokenizer, which means smaug-bpe would be the correct pretokenizer setting.

Now, I am not following llama.cpp bug reports - if you decide to wait it out, I'd be happy if you could drop me a note, and I can requant any model you wish with the fixed llama.cpp if this is deemed to be an issue in llama.cpp.

I see, thanks so much for letting me know about it. So that means there is (at least) a 50/50 chance that the smaug-bpe pretokenizer is because of MergeKit and me.

Sure :) I think this series of troubles will be fixed soon (maybe in MergeKit, maybe in llama.cpp). (Maybe I'm concluding too soon.) I will ask you to requant again in the future.
Thanks for your hard work. I really appreciate it. XD
Sorry if I make trouble and disturb you too much. 😅

No, I mean you inherited the smaug-bpe "problem" from your base model. And also, smaug-bpe might be as good as or better than llama-bpe at this point.

I haven't traced it yet, but I think at some point in the past there was some confusion, and a few fine-tunes inherited the wrong pretokenizer, and then everything based on those fine-tunes inherited it.

So I think it's 100% sure you have nothing to do with it, smaug-bpe came from your base model.

Anyway, yes, I am looking toward the future and the resolution of that bug report :)

And no, you don't trouble or disturb me at all. It would be nice if I had all the answers, but I don't, and I learn new details together with you :)

I see. It's very lucky for me that I met you. :)

Haha, yeah. I had some misunderstandings :P. Anyway, you are very knowledgeable; I have learned a lot of new things from you.
PS: I read your comment carefully and then checked your Lumimaid quant, and you are right, it has the smaug-bpe pre-tokenizer. I also tried Stheno-v3.2 from your quant repo, and it's normal (llama-bpe). :-) So my conclusion is: mergekit copies the tokenizers from both Stheno and Lumimaid, but there is a 'small' conflict between their pre-tokenizers, so mergekit chooses smaug-bpe in the end. (Because LumiMaid is the base model, mergekit copies the tokenizer from LumiMaid.)
That means I must carefully check whether a source model is smaug-bpe or llama-bpe before merging again, hahaha. 😂
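A quick way to do that check up front (a minimal sketch; the directory names are placeholders, and per the discussion above, ignore_merges is only one indicator, not the full llama.cpp detection):

```python
# Minimal pre-merge sanity check: print the ignore_merges flag from each
# source model's tokenizer.json. Per the discussion above, "true" is
# associated with llama-bpe and "false" with smaug-bpe, though this is
# only one indicator. Directory names are placeholders.
import json

for name in ("LumiMaid", "Stheno-v3.2"):
    with open(f"{name}/tokenizer.json") as f:
        tok = json.load(f)
    print(name, "ignore_merges =", tok["model"].get("ignore_merges"))
```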
