pbatra/Llama-3.2-11B-Vision-Instruct-GGUF · Why doesn't it accept images??? This is a model in the Image-Text-to-Text category!

AlMt

Feb 5

d:\user\pytorch>ollama run huggingface.co/pbatra/Llama-3.2-11B-Vision-Instruct-GGUF
...
success

Tell us what is shown in the picture in the file "d:\user\pytorch\Members.png"

I'm a large language model, I don't have the capability to access or view files on your local computer. Additionally, I'm a text-based AI and do not have the ability to display images.

However, if you can provide me with a detailed description of what is in the picture, I'll be happy to help answer any questions you may have about it! Alternatively, if you're able to share a text summary or key points about the image, I'd love to hear more.

Why??? This is a model in the Image-Text-to-Text category!

Impulse2000

Feb 24

What are the contents of the Modelfile? You can print them with:
'ollama show [model-name] --modelfile'

Usage is mentioned here:
https://github.com/ollama/ollama/blob/main/README.md#multimodal-models
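
If you would rather call the local Ollama server than type into the interactive prompt, here is a minimal sketch of the request body. It assumes the REST endpoint POST /api/generate, which takes base64-encoded images in an "images" list. The helper name build_generate_payload and the placeholder bytes are made up for illustration, and the model name is just the official tag linked in this thread.

```python
import base64
import json


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> str:
    """Build a JSON body for Ollama's /api/generate with one attached image."""
    payload = {
        "model": model,
        "prompt": prompt,
        # Images are sent as a list of base64-encoded strings, not file paths
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON response instead of a stream of chunks
        "stream": False,
    }
    return json.dumps(payload)


# Placeholder bytes stand in for the contents of a real image file
body = build_generate_payload(
    "llama3.2-vision",
    "What is shown in this picture?",
    b"\x89PNG placeholder",
)
print(body)
```

In real use you would read the bytes from your image file and POST the body to the server (by default it listens on http://localhost:11434). This only helps if the model itself has vision support, of course.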

Impulse2000

Feb 24

Ah, this is the issue. Compare the official model with the Hugging Face one:

❯ ollama show llama3.2-vision:11b-instruct-q4_K_M
  Model
    architecture        mllama    
    parameters          9.8B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Projector
    architecture        mllama     
    parameters          895.03M    
    embedding length    1280       
    dimensions          4096       

  Parameters
    temperature    0.6    
    top_p          0.9    

  License
    LLAMA 3.2 COMMUNITY LICENSE AGREEMENT                 
    Llama 3.2 Version Release Date: September 25, 2024    

❯ ollama show hf.co/pbatra/Llama-3.2-11B-Vision-Instruct-GGUF:Q4_K_M
  Model
    architecture        mllama     
    parameters          9.8B       
    context length      131072     
    embedding length    4096       
    quantization        unknown    

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"

The Hugging Face version has no Projector section in its 'ollama show' output, which means it won't be able to take in images.
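
If you want to check this for other models without eyeballing the output, here is a rough sketch that scans the text printed by 'ollama show' for a Projector heading. The function name has_projector is hypothetical, and it assumes the output layout matches what is pasted above (section names alone on their own line); other Ollama versions may print it differently.

```python
def has_projector(show_output: str) -> bool:
    """Return True if `ollama show` text output contains a Projector section."""
    for line in show_output.splitlines():
        # Assumption: section headings appear alone on a line, as in the output above
        if line.strip() == "Projector":
            return True
    return False


# Shortened copies of the two outputs from this thread
OFFICIAL = """\
  Model
    architecture        mllama
    parameters          9.8B

  Projector
    architecture        mllama
    parameters          895.03M
"""

HF_GGUF = """\
  Model
    architecture        mllama
    parameters          9.8B

  Parameters
    stop    "<|eot_id|>"
"""

print(has_projector(OFFICIAL))  # True
print(has_projector(HF_GGUF))   # False
```

You could feed it the real output by capturing 'ollama show [model-name]' with subprocess, but the parsing is the only part sketched here.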

I recommend you get it from the official Ollama library. I assume you have probably already tried this, but just in case:
https://ollama.com/library/llama3.2-vision