Update README.md
README.md (changed)
```diff
@@ -7,7 +7,9 @@ Moondream is a small vision language model designed to run efficiently everywhere.
 
 [Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)
 
-This repository contains the 2025-04-14 **int4** release of Moondream.
+This repository contains the 2025-04-14 **int4** release of Moondream. On an RTX 3090, it uses 2,305 MB of VRAM and runs at a speed of 187 tokens/second.
+
+There's more information about this version of the model in our [release blog post](https://moondream.ai/blog/smaller-faster-moondream-with-qat). Other revisions, as well as release history, can be found [here](https://huggingface.co/vikhyatk/moondream2).
 
 ### Usage
 
@@ -27,6 +29,11 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map={"": "cuda"}
 )
 
+# Optional, but recommended when running inference on a large number of
+# images since it has upfront compilation cost but significantly speeds
+# up inference:
+model.model.compile()
+
 # Captioning
 print("Short caption:")
 print(model.caption(image, length="short")["caption"])
```
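Pieced together, the usage snippet after this change would look roughly like the sketch below. The hunks only show fragments, so this is an assumption-laden reconstruction: the `vikhyatk/moondream2` checkpoint name, the `revision` tag, `trust_remote_code=True`, and loading `image` with PIL are all inferred from the linked repository rather than stated in the diff. Imports are deferred into the function so the sketch can be read and loaded without the heavy dependencies installed.

```python
def caption_image(image_path: str) -> str:
    """Load Moondream and produce a short caption for one image.

    Sketch assembled from the diff above; checkpoint name and revision
    tag are assumptions, not part of this commit.
    """
    # Deferred imports: transformers and PIL are only needed when the
    # function is actually called.
    from transformers import AutoModelForCausalLM
    from PIL import Image

    model = AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",   # assumed checkpoint name
        revision="2025-04-14",   # assumed revision tag for this release
        trust_remote_code=True,  # Moondream ships custom model code
        device_map={"": "cuda"},
    )
    # Optional, but recommended for large numbers of images: upfront
    # compilation cost, significantly faster inference afterwards.
    model.model.compile()

    image = Image.open(image_path)
    return model.caption(image, length="short")["caption"]
```

Calling `caption_image("photo.jpg")` downloads the model weights on first use and requires a CUDA device, matching the `device_map={"": "cuda"}` setting in the README.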