Feedback
The rambling at the end is quite persistent and spoils it for me, which is unfortunate, because the generations are seriously good!
I feel that. I produced some other versions of this blend that were stable, but the generations they produced were also much less interesting. Unfortunately, I haven't figured out the sweet spot yet. If I manage to produce a new version that doesn't have the EOS issue, I'll make sure to direct people to that model from this model.
Thanks for giving it a try and sharing your feedback.
Hi @sophosympatheia. I had a similar problem recently. I see you use the tokenizer_config from R1. Maybe you should also add a generation_config and include "pad_token": "<|finetune_right_pad_id|>" and the rest of the tokens from Nova-Tempus v0.1 in the tokenizer_config? I am a total novice in this field, but for me it worked.
I am also very curious why GGUF inference engines mostly do not have this issue...
@Huzderu, you're running inference with the full-weight model, right?
@altomek Can you elaborate on what worked for you? I am not all that skilled when it comes to tinkering directly with tokenizer configs. I played around with overwriting the tokenizer files using copies from Deepseek and Nova-Tempus v0.1, but I still had the EOS issue in every test case. It sounds like you're doing something more sophisticated than that.
I'm currently messing around with some new merge recipes leveraging nuslerp and sce while attempting to use mergekit's built-in controls for handling the tokenizer more precisely. I'm hoping one of those experiments yields results similar to or better than this model's, without the issues. That said, if there's a way to fix this version just by manipulating some JSON in a config file, I'm all ears! I just might need you (or someone) to spell it out for me.
Sure, here is my commit.
Since the two models use different tokenizer_configs, maybe you could add what v0.1 uses to what you already have in v0.2? At least try adding the pad_token to tokenizer_config, changing the eos_token, and additionally adding a generation_config file... I've sketched what I mean below.
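To make it concrete, this is roughly the kind of addition I mean in tokenizer_config.json (just an excerpt, not the whole file; "<|eot_id|>" is the usual Llama-3.x EOS token, so double-check it against what Nova-Tempus v0.1 actually ships):

```json
{
  "eos_token": "<|eot_id|>",
  "pad_token": "<|finetune_right_pad_id|>"
}
```

Then add a generation_config.json whose bos/eos/pad token IDs are copied straight from Nova-Tempus v0.1, rather than typed in by hand.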
Tokenizers are magic to me, so I hope you succeed with mergekit's tokenizer handling. I would love to know how it goes!
Thanks! I think you're on the right track. I'm testing those new merges right now, the ones where I specifically asked for a union of the tokenizers, and the EOS problem seems to be solved in those models. Their tokenizer configuration files are much closer to what you specified, but the work mergekit performs on the embeddings during the merge is also necessary. Just adjusting the config files wasn't enough; in fact, that seems to make matters worse if the embeddings don't line up with what the config files specify.
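For anyone following along, the tokenizer-related part of those experimental recipes looks roughly like this (a sketch only: the other source models, weights, and method parameters are trimmed, and the model references are from memory):

```yaml
# Experimental recipe sketch – only the tokenizer handling is the point here.
merge_method: sce
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
models:
  - model: sophosympatheia/Nova-Tempus-70B-v0.1
  # ...plus the other source models, omitted here
tokenizer_source: union   # build a merged vocabulary instead of copying one model's tokenizer
dtype: bfloat16
```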
Probably nothing short of a new merge with the adjusted tokenizer settings in the mergekit recipe will fix the problem. I still need to do more testing to decide whether either of the new merges I made today is worth releasing to supersede this one. If not, or maybe even if so, I'll recreate this merge using the fixed tokenizer recipe and upload the corrected copy.
My apologies to everyone for this merge being so off. I'm used to merging models that share the same tokenizer settings, so I got lazy with the tokenizer handling in my mergekit recipe. That proved to be a big mistake here, where the tokenizer configurations of the models in the recipe differed substantially. Lesson learned! Mergekit supports all those tokenizer-related settings for a reason. 😂
EDIT: Actually, even the new merges are kind of messed up, or at least the SCE merge is having issues (although it's pretty fun). Perhaps a tokenizer union wasn't the right approach. The bias is definitely in favor of Deepseek's tokenizer. For now, I was able to make it behave by adding some custom stopping strings to my SillyTavern settings: ["<|end▁of▁sentence|>","<|end▁of▁text|>","<|end_of_text|>", "<|end_of_sentence|>"]
Even with the issue, I'm still a huge fan of this merge. I can always try to roll for a less rambly swipe, or just edit the response to cut it off at an earlier stopping point. If you can look past the jank, it's definitely an improvement over v0.1. At the moment, I keep coming back to it over 120B models.
Using 8-bit exl2, in case that makes a difference.
Thanks, @MikeRoz . I'm glad you're enjoying it!
Look out for some new versions soon that should have less jank. I made some progress in that direction today. If nothing else, I know I have a viable workaround for the versions I made today, but I'm going to try some more variations to see if I can achieve a clean fix.
EDIT: Setting the tokenizer source to Deepseek's model did the trick. No more EOS issues, no more need to add custom stopping strings to the generation config. I'll have a new model up soon that should be a considerable improvement over this one, both in terms of jank and writing quality.
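For reference, the fix amounts to a one-line change in the recipe, taking the tokenizer wholesale from Deepseek instead of building a union (double-check mergekit's docs for the exact field syntax before copying this):

```yaml
# Use Deepseek's tokenizer as-is rather than a merged vocabulary
tokenizer_source: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
```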
I fixed the tokenizer issues and updated the repo. I also alerted the people who had already quantized this model so that, hopefully, they'll update their quants as well.