Update README.md
README.md
CHANGED
````diff
@@ -7,15 +7,16 @@ Moondream is a small vision language model designed to run efficiently everywhere.
 
 [Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)
 
-This repository contains the 2025-04-14 **int4** release of Moondream
+This repository contains the 2025-04-14 **int4** release of Moondream. There's more information about this version of the model in our [release blog post](https://moondream.ai/blog/smaller-faster-moondream-with-qat). Other revisions, as well as release history, can be found [here](https://huggingface.co/vikhyatk/moondream2).
+
+### Usage
+
+Make sure to install the requirements:
 
-Make sure to install the requirements:
 ```
-pip install
+pip install pillow torchao
 ```
 
-### Usage
-
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from PIL import Image
@@ -23,7 +24,6 @@ from PIL import Image
 model = AutoModelForCausalLM.from_pretrained(
     "moondream/moondream-2b-2025-04-14-4bit",
     trust_remote_code=True,
-    # Uncomment to run on GPU.
     device_map={"": "cuda"}
 )
 
@@ -50,30 +50,4 @@ print(f"Found {len(objects)} face(s)")
 print("\nPointing: 'person'")
 points = model.point(image, "person")["points"]
 print(f"Found {len(points)} person(s)")
-```
-
-### Changelog
-**int4-2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
-1. Moondream uses a whole lot less memory (4.12 down to 2.47 GB)
-2. Small devices get a big speed-up (44.54 to 67.84 tok/sec on an RTX 4050 Mobile)
-3. Improved spatial understanding (RealWorldQA up from 58.3 to 60.13)
-
-
-**2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
-
-1. Improved chart understanding (ChartQA up from 74.8 to 77.5, 82.2 with PoT)
-2. Added temperature and nucleus sampling to reduce repetitive outputs
-3. Better OCR for documents and tables (prompt with "Transcribe the text" or "Transcribe the text in natural reading order")
-4. Object detection supports document layout detection (figure, formula, text, etc.)
-5. UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
-6. Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
-
-**2025-03-27** ([full release notes](https://moondream.ai/blog/moondream-2025-03-27-release))
-
-1. Added support for long-form captioning
-2. Open-vocabulary image tagging
-3. Improved counting accuracy (e.g. CountBenchQA up from 80 to 86.4)
-4. Improved text understanding (e.g. OCRBench up from 58.3 to 61.2)
-5. Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)
-6. Fixed token streaming bug affecting multi-byte Unicode characters
-7. gpt-fast style `compile()` now supported in HF Transformers implementation
+```
````
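The `model.point(...)["points"]` result kept in the snippet above is a list of `{"x", "y"}` dicts; in Moondream's API these coordinates are normalized to the 0-1 range relative to the image size. A model-free sketch of mapping them to pixel positions (the helper name and sample values here are illustrative, not from the README):

```python
def to_pixel_coords(points, width, height):
    """Map normalized {'x', 'y'} points (0-1 range) to integer pixel coords.

    Assumes Moondream's convention of coordinates normalized to image size.
    """
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

# Hypothetical point output for a 640x480 image:
sample = [{"x": 0.5, "y": 0.25}]
print(to_pixel_coords(sample, 640, 480))  # [(320, 120)]
```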
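The headline figures in the int4 changelog entry above imply the relative improvements directly; this sketch only reproduces the ratios from the quoted numbers:

```python
# Figures quoted in the int4 changelog entry.
mem_before_gb, mem_after_gb = 4.12, 2.47
tok_before, tok_after = 44.54, 67.84  # tok/sec on an RTX 4050 Mobile

mem_reduction = 1 - mem_after_gb / mem_before_gb  # fraction of memory saved
speedup = tok_after / tok_before                  # decode throughput ratio
print(f"~{mem_reduction:.0%} less memory, ~{speedup:.2f}x faster decode")
# ~40% less memory, ~1.52x faster decode
```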