VRAM - how to run

#6 opened by SinanAkkoyun

How do you run inference with your API?

May I ask what quantization your demo inference runs at?

This is an unquantized model. You can estimate the total VRAM required by adding up the sizes of the checkpoint files.
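
For reference, here is a minimal sketch of that estimate using `huggingface_hub` to sum the weight-shard sizes. The repo id is just an example taken from the link below; swap in the model you actually want to run. Note this only covers the weights themselves, activations and the KV cache add more on top.

```python
from huggingface_hub import HfApi

# Example repo id; replace with the model you plan to run.
repo_id = "allenai/MolmoE-1B-0924"

api = HfApi()
info = api.model_info(repo_id, files_metadata=True)

# Sum the sizes of the weight shards (.safetensors / .bin) to estimate the VRAM
# needed just to hold the unquantized weights.
total_bytes = sum(
    f.size
    for f in info.siblings
    if f.size is not None and f.rfilename.endswith((".safetensors", ".bin"))
)
print(f"Checkpoint weights: {total_bytes / 1024**3:.1f} GiB")
```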

To run this on a 24 GB GPU, you can try this env var or the 4-bit-per-weight quantized model I linked in the last comment here: https://huggingface.co/allenai/MolmoE-1B-0924/discussions/4
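
If you prefer not to use the pre-quantized checkpoint from that discussion, a rough sketch of on-the-fly 4-bit loading with `bitsandbytes` is below. This is an assumption on my part, not the method from the linked comment: it uses the base repo id as an example and may not quantize every submodule (e.g. the vision tower) the same way the linked checkpoint does.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

# Example repo id; the ready-made 4-bit checkpoint is linked in discussion #4 above.
repo_id = "allenai/MolmoE-1B-0924"

# On-the-fly 4-bit weight quantization via bitsandbytes (requires `pip install bitsandbytes`);
# roughly quarters the weight memory compared to the unquantized checkpoint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
```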
