Promising
Wow, this model is tiny and fast and seems fairly useful. Might be perfect for CLI work, or for a program that doesn't need to run a server.
Though I'm quickly finding that if you give it a lot of input (over 4k tokens), prompt processing slows to a crawl. Is this normal?
Yes, that’s expected. Longer prompts need more compute; even small models slow down past a few thousand tokens because of how attention scales with context length.
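Back-of-the-envelope sketch of why it's quadratic (toy numbers, not this model's actual dimensions): in self-attention every token attends to every other token, so the score matrix alone is n × n.

```python
# Toy cost model for self-attention, not any real implementation.
# d_model=512 is an assumed hidden size purely for illustration.
def attention_flops(n_tokens, d_model=512):
    # Q @ K^T costs ~n*n*d multiply-adds, and the softmax @ V step
    # costs roughly the same again, so prompt cost grows with n^2.
    return 2 * n_tokens * n_tokens * d_model

for n in (1000, 2000, 4000):
    print(n, attention_flops(n))
```

Doubling the prompt from 2k to 4k tokens roughly quadruples the attention work, which is why the slowdown feels sudden rather than gradual. Generated tokens also attend over the whole accumulated context, so long prompts drag down output speed too.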
Yeah, it does seem to slow down quadratically, so if you keep the input under 2k tokens it stays fast. Curiously, the output slowed down heavily too; with short inputs the outputs were lightning fast.
As mentioned, though, this may be perfect for CLI processing where I pass it a couple of paragraphs and have it fix basic problems. Looking forward to the other planned models!