Ollama and llama.cpp
First of all, congrats on this incredibly good model. I'm running it on my laptop's CPU and get captions at a rate of about 2-3 images per minute. The caption quality is comparable to LLaVA 1.6 running at 4-bit quantization with Ollama; if anything, moondream hallucinates a little less than LLaVA.
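For reference, here is roughly the captioning loop I'm timing (a minimal sketch following the moondream2 transformers usage from the README; the `images/` folder and the prompt are my own choices):

```python
import time
from pathlib import Path

from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Caption every JPEG in a local folder and print per-image latency.
for path in Path("images").glob("*.jpg"):
    start = time.perf_counter()
    enc = model.encode_image(Image.open(path))
    caption = model.answer_question(enc, "Describe this image.", tokenizer)
    print(f"{path.name} ({time.perf_counter() - start:.1f}s): {caption}")
```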
Would you be interested in sharing this model in the Ollama library? Ollama (and its backend, llama.cpp) now supports a Vulkan backend, which means I will be able to run this on my laptop's iGPU. With LLaVA 1.6, the speedup is more than 2x.
@vikhyatk , I see that moondream is now in the Ollama library: https://www.ollama.com/library/moondream
Do you know which version this is? I prefer to use it through Ollama because it's much faster than transformers, but I also want to stay on the latest version and not fall behind.
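For anyone else going this route, querying the Ollama copy from Python looks roughly like this (a sketch assuming `pip install ollama` and `ollama pull moondream`; the image path is illustrative, and the response access may differ slightly across client versions):

```python
import ollama

response = ollama.chat(
    model="moondream",
    messages=[{
        "role": "user",
        "content": "Describe this image.",
        # The client reads the file and base64-encodes it for the API.
        "images": ["photo.jpg"],
    }],
)
print(response["message"]["content"])
```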
It may be an older version, actually; I'm not sure how it gets updated. Will try to find out.