Example inference not working
The following example code is not working for me, may I know the library requirements?
Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="allenai/Molmo-7B-D-0924", trust_remote_code=True)
messages = [
{
"role": "user",
"content": [
{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
{"type": "text", "text": "What animal is on the candy?"}
]
},
]
response = pipe(text=messages)
print(f"{response = }")
Issue:
ValueError: Unrecognized configuration class <class 'transformers_modules.allenai.Molmo-7B-D-0924.ac032b93b84a7f10c9578ec59f9f20ee9a8990a2.config_molmo.MolmoConfig'> for this kind of AutoModel: AutoModelForImageTextToText.
Model type should be one of AriaConfig, AyaVisionConfig, BlipConfig, Blip2Config, ChameleonConfig, Emu3Config, FuyuConfig, Gemma3Config, Gemma3nConfig, GitConfig, Glm4vConfig, GotOcr2Config, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InternVLConfig, JanusConfig, Kosmos2Config, Llama4Config, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, Mistral3Config, MllamaConfig, PaliGemmaConfig, Pix2StructConfig, PixtralVisionConfig, Qwen2_5_VLConfig, Qwen2VLConfig, ShieldGemma2Config, SmolVLMConfig, UdopConfig, VipLlavaConfig, VisionEncoderDecoderConfig.