Model can't produce certain character pairs - broken tokenization?

#9
by CyberShadowMD - opened

Try this completion:

One!
Two!
Three!
Four!

It should suggest "Five!", but the model never produces a "!" followed by a newline.

Other character sequences have the same problem, which makes the model unusable for certain programming languages.

Running deepseek-coder-33b-instruct.Q4_K_M.gguf under llama.cpp (tried many versions)...
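
For anyone who wants to script the reproduction, here is a minimal sketch of a check, not a definitive test. The names `check_completion`, `broken_model`, and `healthy_model` are hypothetical, and the two stand-in functions return made-up strings purely to exercise the check; they are not real model output. I am also assuming the prompt ends with a newline after "Four!".

```python
from typing import Callable, Dict

# The prompt from the report. Assumption: a trailing newline follows "Four!".
PROMPT = "One!\nTwo!\nThree!\nFour!\n"
EXPECTED = "Five!\n"


def check_completion(generate: Callable[[str], str]) -> Dict[str, object]:
    """Run PROMPT through `generate` and report whether "!" + newline shows up.

    `generate` is any function that maps a prompt string to a completion
    string; how it talks to the model is up to the caller.
    """
    completion = generate(PROMPT)
    return {
        "completion": completion,
        "matches_expected": completion.startswith(EXPECTED),
        "has_bang_newline": "!\n" in completion,
    }


# Hypothetical stand-ins so the check can be run without a model.
# These strings are invented for illustration only.
def broken_model(prompt: str) -> str:
    return "Five! Six! Seven!"


def healthy_model(prompt: str) -> str:
    return "Five!\nSix!\n"


if __name__ == "__main__":
    for name, fn in (("broken", broken_model), ("healthy", healthy_model)):
        print(name, check_completion(fn))
```

To run it against the actual GGUF file, replace the stand-ins with a function that calls your llama.cpp setup (for example through the llama-cpp-python bindings, if you use them) and returns the completion text. Greedy sampling (temperature 0) should make the result repeatable.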
