Update README.md
README.md (changed)
```diff
@@ -7,7 +7,9 @@ Moondream is a small vision language model designed to run efficiently everywhere.
 
 [Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)
 
-This repository contains the 2025-04-14 **int4** release of Moondream.
+This repository contains the 2025-04-14 **int4** release of Moondream. On an RTX 3090, it uses 2,305 MB of VRAM and runs at a speed of 187 tokens/second.
+
+There's more information about this version of the model in our [release blog post](https://moondream.ai/blog/smaller-faster-moondream-with-qat). Other revisions, as well as release history, can be found [here](https://huggingface.co/vikhyatk/moondream2).
 
 ### Usage
 
@@ -27,6 +29,11 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map={"": "cuda"}
 )
 
+# Optional, but recommended when running inference on a large number of
+# images since it has upfront compilation cost but significantly speeds
+# up inference:
+model.model.compile()
+
 # Captioning
 print("Short caption:")
 print(model.caption(image, length="short")["caption"])
```
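Pieced together, the usage snippet after this change would look roughly like the sketch below. The hunks only show fragments, so this is an assumption-laden reconstruction: the `vikhyatk/moondream2` checkpoint name, the `revision` tag, `trust_remote_code=True`, and loading `image` with PIL are all inferred from the linked repository rather than stated in the diff. Imports are deferred into the function so the sketch can be read and loaded without the heavy dependencies installed.

```python
def caption_image(image_path: str) -> str:
    """Load Moondream and produce a short caption for one image.

    Sketch assembled from the diff above; checkpoint name and revision
    tag are assumptions, not part of this commit.
    """
    # Deferred imports: transformers and PIL are only needed when the
    # function is actually called.
    from transformers import AutoModelForCausalLM
    from PIL import Image

    model = AutoModelForCausalLM.from_pretrained(
        "vikhyatk/moondream2",   # assumed checkpoint name
        revision="2025-04-14",   # assumed revision tag for this release
        trust_remote_code=True,  # Moondream ships custom model code
        device_map={"": "cuda"},
    )
    # Optional, but recommended for large numbers of images: upfront
    # compilation cost, significantly faster inference afterwards.
    model.model.compile()

    image = Image.open(image_path)
    return model.caption(image, length="short")["caption"]
```

Calling `caption_image("photo.jpg")` downloads the model weights on first use and requires a CUDA device, matching the `device_map={"": "cuda"}` setting in the README.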