Is there any quantized Pixtral Large model that can be run with the vLLM library?

#22 opened by HamzaChekireb

I am attempting to run a quantized Pixtral Large model with the vLLM framework on 160 GB of VRAM. However, I have not been able to find a compatible quantized checkpoint: the ones currently available (https://huggingface.co/models?other=base_model:quantized:mistralai/Pixtral-Large-Instruct-2411) are unfortunately not supported by vLLM. Frameworks like vLLM and Ollama are crucial for handling concurrent users and enabling efficient distributed inference.
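
For reference, this is roughly the kind of launch configuration I am trying. The tensor-parallel size and the on-the-fly FP8 quantization setting are only what I would expect to need for 160 GB of VRAM, not a setup I have confirmed to work:

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Sketch of a Pixtral Large launch on 160 GB of VRAM (e.g. 2x80 GB GPUs).
# tensor_parallel_size and quantization="fp8" are assumptions on my part,
# used as a stand-in for a pre-quantized checkpoint that vLLM would accept.
llm = LLM(
    model="mistralai/Pixtral-Large-Instruct-2411",
    tokenizer_mode="mistral",   # Pixtral ships Mistral-format tokenizer/config/weights
    config_format="mistral",
    load_format="mistral",
    tensor_parallel_size=2,     # split the model across two GPUs
    quantization="fp8",         # assumed on-the-fly quantization, not verified
)

params = SamplingParams(max_tokens=256)
outputs = llm.chat(
    [{"role": "user", "content": "Describe the Pixtral architecture in one sentence."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```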
