Model InternVLChatModel is not supported

by indychou - opened Feb 15

Feb 15

Converting this model to GGUF with convert_hf_to_gguf.py fails with the following error:
INFO:hf-to-gguf:Loading model: llama-beeze2-8b-instruct
ERROR:hf-to-gguf:Model InternVLChatModel is not supported

Could anyone point me in the right direction?

koungho

Feb 18

edited Feb 18

The llama.cpp conversion script does not support this model format yet:
https://github.com/ggml-org/llama.cpp/discussions/11768
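
For anyone who wants to check this before starting a conversion, here is a minimal sketch that reads the `architectures` field from the checkpoint's `config.json`, which is where the name `InternVLChatModel` in the error message comes from. The helper names and the `supported` set are hypothetical and only for illustration; the real list of supported architectures is whatever your copy of the llama.cpp converter registers, so fill it in yourself.

```python
import json
from pathlib import Path


def read_architectures(model_dir):
    """Return the `architectures` list from a Hugging Face config.json."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # A config without the field yields an empty list rather than an error.
    return config.get("architectures", [])


def unsupported_architectures(model_dir, supported):
    """Return the declared architectures that are not in `supported`.

    `supported` is a caller-provided set of names (an assumption here);
    it is not read from llama.cpp.
    """
    return [name for name in read_architectures(model_dir) if name not in supported]
```

Calling `unsupported_architectures("path/to/model", {"LlamaForCausalLM"})` on a local copy of this checkpoint should return a non-empty list if the config declares `InternVLChatModel`, which matches the error above.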

harryli1986

Feb 18

Is there any other workaround? Thanks.

koungho

Feb 19

You can run it with vLLM using BitsAndBytes in-flight quantization, although the runtime parameters are somewhat tricky to tune:
https://docs.vllm.ai/en/latest/features/quantization/bnb.html
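
As a rough sketch of what that looks like through the vLLM Python API, the snippet below collects the engine arguments for in-flight BitsAndBytes quantization. The helper names and the default values are assumptions for illustration, not tuned settings, and `trust_remote_code=True` is a guess based on the checkpoint using a custom model class. Some older vLLM releases also expect `load_format="bitsandbytes"`; check the linked page for your version. Whether this particular checkpoint actually loads this way is not confirmed here.

```python
def bnb_engine_kwargs(model_id, max_model_len=4096, gpu_memory_utilization=0.9):
    """Build keyword arguments for vllm.LLM with in-flight BitsAndBytes quantization.

    The defaults are placeholders (assumptions), not recommended values.
    """
    return {
        "model": model_id,
        "quantization": "bitsandbytes",  # quantize weights while loading
        "trust_remote_code": True,       # assumption: checkpoint ships custom code
        "max_model_len": max_model_len,
        "gpu_memory_utilization": gpu_memory_utilization,
    }


def load_quantized(model_id, **overrides):
    """Create a vllm.LLM instance; requires vLLM, bitsandbytes, and a GPU."""
    from vllm import LLM  # imported lazily so the helper above works without vLLM

    return LLM(**{**bnb_engine_kwargs(model_id), **overrides})
```

For example, `load_quantized("MediaTek-Research/Llama-Breeze2-8B-Instruct", max_model_len=8192)` would pass a smaller context length while keeping the other settings, which is usually the first parameter to lower when memory is tight.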

luisleo52655

May 25

edited May 25

vllm serve MediaTek-Research/Llama-Breeze2-8B-Instruct --chat-template breeze2.jinja --gpu-memory-utilization 0.9 --max-model-len 32784 --enable-auto-tool-choice --tool-call-parser llama3_json --tensor-parallel-size 1 --max_num_seqs 20

Deploying on vLLM with the command above produces the following error:

KeyError: IMG_CONTEXT
