Inference time is worse than with the original model
#2 · opened by xJohn
Hi,
I tested the GGUF model with this command:
llama-mtmd-cli -m typhoon-ocr-7b.Q4_K_S.gguf --mmproj typhoon-ocr-7b.mmproj-f16.gguf -p "extract this image to text" --image "test.png"
llama-mtmd-cli is running with CUDA on an A10 GPU.
The inference time is very long.
Do you have any suggestions?
You are running the model on the CPU; no wonder the experience is poor. Please add -ngl 999 so that all layers are offloaded to the GPU.
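For reference, the original command with GPU offloading enabled (assuming the model file is named typhoon-ocr-7b.Q4_K_S.gguf, as in the question) would look like:

llama-mtmd-cli -m typhoon-ocr-7b.Q4_K_S.gguf --mmproj typhoon-ocr-7b.mmproj-f16.gguf -ngl 999 -p "extract this image to text" --image "test.png"

Note that the binary must be built with CUDA support for the offload to take effect; otherwise the model will still run on the CPU.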