---
license: apache-2.0
pipeline_tag: image-text-to-text
---
Moondream is a small vision language model designed to run efficiently on edge devices.
[Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)
This repository contains the latest (**2025-01-09**) release of Moondream, as well as historical releases. The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application.
**Usage**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-01-09",
    trust_remote_code=True,
    # Uncomment to run on GPU.
    # device_map={"": "cuda"}
)

# Load the image you want to run inference on.
image = Image.open("path/to/image.jpg")

# Captioning
print("Short caption:")
print(model.caption(image, length="short")["caption"])

print("\nNormal caption:")
for t in model.caption(image, length="normal", stream=True)["caption"]:
    # Streaming generation example, supported for caption() and detect()
    print(t, end="", flush=True)
print(model.caption(image, length="normal"))

# Visual Querying
print("\nVisual query: 'How many people are in the image?'")
print(model.query(image, "How many people are in the image?")["answer"])

# Object Detection
print("\nObject detection: 'face'")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")

# Pointing
print("\nPointing: 'person'")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")
```
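As a minimal sketch of working with the detection output, assuming each object returned by `detect()` is a dict with normalized `x_min`/`y_min`/`x_max`/`y_max` coordinates in `[0, 1]` (this format is an assumption, not guaranteed by this card), the boxes can be drawn back onto the image with PIL:

```python
from PIL import ImageDraw

# Hypothetical helper: draw detect() results onto a copy of the image.
# Assumes each object is a dict with normalized x_min/y_min/x_max/y_max keys.
def draw_boxes(image, objects):
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    w, h = annotated.size
    for obj in objects:
        box = (obj["x_min"] * w, obj["y_min"] * h, obj["x_max"] * w, obj["y_max"] * h)
        draw.rectangle(box, outline="red", width=3)
    return annotated

# Save an annotated copy of the image from the snippet above.
draw_boxes(image, objects).save("faces.jpg")
```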