Model slow to generate tokens

#5 opened by DefamationStation

Hello,

RTX 4090, i9-13900HX, using LM Studio Beta v6.

I've tried n_gpu_layers at -1, 10, 20, and 40, and the output generation speed is the same in every case. I'm not sure if it's my machine or the model's performance, but I usually get entire walls of text almost instantly from models like this.
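If you want to rule out the GUI, you can reproduce the same n_gpu_layers sweep directly with llama-cpp-python. This is a minimal sketch, assuming the model is a GGUF file; the model path and prompt are placeholders. If tokens/s doesn't change across runs, the layers aren't actually being offloaded to the GPU:

```python
# Sketch: compare generation speed at different GPU offload levels.
# Assumes llama-cpp-python is installed with CUDA support;
# "model.gguf" is a placeholder path.
import time
from llama_cpp import Llama

for n_gpu_layers in (0, 10, 20, -1):  # -1 offloads all layers
    llm = Llama(model_path="model.gguf", n_gpu_layers=n_gpu_layers, verbose=False)
    start = time.perf_counter()
    out = llm("Write a short paragraph about llamas.", max_tokens=128)
    elapsed = time.perf_counter() - start
    tokens = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={n_gpu_layers}: {tokens / elapsed:.1f} tokens/s")
    del llm  # release VRAM before the next run
```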

It might be that you're using the fast tokenizer meant for ordinary Llama models (in text-generation-webui, that's the use_fast option). This model ships its own tokenizer, so you have to use that one instead.
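For reference, here is roughly what that toggle corresponds to in the transformers API. This is a sketch only; the model ID is a placeholder, and whether the slow tokenizer is actually needed depends on the model:

```python
# Sketch of the use_fast distinction in transformers;
# "org/some-llama-model" is a placeholder model ID.
from transformers import AutoTokenizer

# use_fast=True (the default) loads the Rust "fast" tokenizer, which may
# not match a model that ships its own tokenizer implementation.
tok = AutoTokenizer.from_pretrained("org/some-llama-model", use_fast=False)
print(tok("Hello, world!").input_ids)
```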
