moondream2 / README.md

	---
	license: apache-2.0
	pipeline_tag: image-text-to-text
	---

	Moondream is a small vision language model designed to run efficiently on edge devices.

	[Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)

	This repository contains the latest (2025-01-09) release of Moondream, as well as historical releases. The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application.


	# Usage

	```python
	from transformers import AutoModelForCausalLM
	from PIL import Image

	model = AutoModelForCausalLM.from_pretrained(
	    "vikhyatk/moondream2",
	    revision="2025-01-09",
	    trust_remote_code=True,
	    # Uncomment to run on GPU.
	    # device_map={"": "cuda"}
	)

	# Load the image to analyze; the path is a placeholder, so substitute your own file.
	image = Image.open("path/to/image.jpg")

	# Captioning
	print("Short caption:")
	print(model.caption(image, length="short")["caption"])

	print("\nNormal caption:")
	for t in model.caption(image, length="normal", stream=True)["caption"]:
	    # Streaming generation example, supported for caption() and query()
	    print(t, end="", flush=True)
	print()  # End the streamed line.
	print(model.caption(image, length="normal")["caption"])  # Non-streaming call for comparison.

	# Visual Querying
	print("\nVisual query: 'How many people are in the image?'")
	print(model.query(image, "How many people are in the image?")["answer"])

	# Object Detection
	print("\nObject detection: 'face'")
	objects = model.detect(image, "face")["objects"]
	print(f"Found {len(objects)} face(s)")

	# Pointing
	print("\nPointing: 'person'")
	points = model.point(image, "person")["points"]
	print(f"Found {len(points)} person(s)")
	```
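
The detection and pointing results above are only counted, not used. The sketch below shows one way to turn them into pixel coordinates for drawing or cropping. It assumes that each entry in `objects` is a dict with `x_min`, `y_min`, `x_max`, `y_max` keys, that each entry in `points` is a dict with `x` and `y` keys, and that all values are normalized to the range 0 to 1; check this against the output of the revision you pin. The helper names `to_pixel_boxes` and `to_pixel_points` are hypothetical and not part of Moondream.

```python
def to_pixel_boxes(objects, width, height):
    """Convert normalized boxes (assumed format) to (left, top, right, bottom) pixel tuples."""
    return [
        (
            round(obj["x_min"] * width),
            round(obj["y_min"] * height),
            round(obj["x_max"] * width),
            round(obj["y_max"] * height),
        )
        for obj in objects
    ]


def to_pixel_points(points, width, height):
    """Convert normalized points (assumed format) to (x, y) pixel tuples."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]


# Stand-in data shaped like the assumed model output, for a 200x100 image.
demo_objects = [{"x_min": 0.25, "y_min": 0.1, "x_max": 0.75, "y_max": 0.9}]
demo_points = [{"x": 0.5, "y": 0.5}]

print(to_pixel_boxes(demo_objects, 200, 100))   # [(50, 10, 150, 90)]
print(to_pixel_points(demo_points, 200, 100))   # [(100, 50)]
```

With a real image, `width, height = image.size` gives the dimensions, and each box tuple can be passed to Pillow's `ImageDraw.Draw(image).rectangle(box)` or `image.crop(box)`.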