Example inference not working

#20
by suman-off - opened

The following example code does not work for me. Could you share the required library versions?

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="allenai/Molmo-7B-D-0924", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
response = pipe(text=messages)
print(f"{response = }")

Issue:
ValueError: Unrecognized configuration class <class 'transformers_modules.allenai.Molmo-7B-D-0924.ac032b93b84a7f10c9578ec59f9f20ee9a8990a2.config_molmo.MolmoConfig'> for this kind of AutoModel: AutoModelForImageTextToText.
Model type should be one of AriaConfig, AyaVisionConfig, BlipConfig, Blip2Config, ChameleonConfig, Emu3Config, FuyuConfig, Gemma3Config, Gemma3nConfig, GitConfig, Glm4vConfig, GotOcr2Config, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InternVLConfig, JanusConfig, Kosmos2Config, Llama4Config, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, Mistral3Config, MllamaConfig, PaliGemmaConfig, Pix2StructConfig, PixtralVisionConfig, Qwen2_5_VLConfig, Qwen2VLConfig, ShieldGemma2Config, SmolVLMConfig, UdopConfig, VipLlavaConfig, VisionEncoderDecoderConfig.
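For context, the traceback says that Molmo's remote MolmoConfig is not among the configuration classes registered for AutoModelForImageTextToText, so the high-level pipeline cannot instantiate the model. The model card instead documents loading through AutoModelForCausalLM, using the processor.process and model.generate_from_batch helpers defined in the repository's remote code. Below is a minimal sketch of that path, reusing the image URL and prompt from the snippet above; I have not verified it against any particular transformers version, so treat it as a starting point rather than a confirmed fix.

from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
from PIL import Image
import requests

# Load the processor and model via the remote code; AutoModelForCausalLM is the
# entry point the model card uses, since MolmoConfig has no image-text-to-text mapping
processor = AutoProcessor.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Same image and question as the failing pipeline example
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"
image = Image.open(requests.get(url, stream=True).raw)

# processor.process and model.generate_from_batch are helpers from the
# repository's remote code, as shown on the model card
inputs = processor.process(images=[image], text="What animal is on the candy?")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)

# Decode only the newly generated tokens, skipping the prompt
generated_tokens = output[0, inputs["input_ids"].size(1):]
print(processor.tokenizer.decode(generated_tokens, skip_special_tokens=True))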
