Example inference not working

#20
by suman-off - opened

The following example code does not work for me. Could you share the required library versions?

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="allenai/Molmo-7B-D-0924", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
response = pipe(text=messages)
print(f"{response = }")

Issue:
ValueError: Unrecognized configuration class <class 'transformers_modules.allenai.Molmo-7B-D-0924.ac032b93b84a7f10c9578ec59f9f20ee9a8990a2.config_molmo.MolmoConfig'> for this kind of AutoModel: AutoModelForImageTextToText.
Model type should be one of AriaConfig, AyaVisionConfig, BlipConfig, Blip2Config, ChameleonConfig, Emu3Config, FuyuConfig, Gemma3Config, Gemma3nConfig, GitConfig, Glm4vConfig, GotOcr2Config, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InternVLConfig, JanusConfig, Kosmos2Config, Llama4Config, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, Mistral3Config, MllamaConfig, PaliGemmaConfig, Pix2StructConfig, PixtralVisionConfig, Qwen2_5_VLConfig, Qwen2VLConfig, ShieldGemma2Config, SmolVLMConfig, UdopConfig, VipLlavaConfig, VisionEncoderDecoderConfig.
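For context, the traceback says that Molmo's remote MolmoConfig is not among the configuration classes registered for AutoModelForImageTextToText, so the high-level pipeline cannot instantiate the model. The model card instead documents loading through AutoModelForCausalLM, using the processor.process and model.generate_from_batch helpers defined in the repository's remote code. Below is a minimal sketch of that path, reusing the image URL and prompt from the snippet above; I have not verified it against any particular transformers version, so treat it as a starting point rather than a confirmed fix.

from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
from PIL import Image
import requests

# Load the processor and model via the remote code; AutoModelForCausalLM is the
# entry point the model card uses, since MolmoConfig has no image-text-to-text mapping
processor = AutoProcessor.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Same image and question as the failing pipeline example
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"
image = Image.open(requests.get(url, stream=True).raw)

# processor.process and model.generate_from_batch are helpers from the
# repository's remote code, as shown on the model card
inputs = processor.process(images=[image], text="What animal is on the candy?")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)

# Decode only the newly generated tokens, skipping the prompt
generated_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated_tokens, skip_special_tokens=True))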
