Update README.md
README.md
CHANGED
````diff
@@ -7,15 +7,16 @@ Moondream is a small vision language model designed to run efficiently everywhere.
 
 [Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)
 
-This repository contains the 2025-04-14 **int4** release of Moondream
+This repository contains the 2025-04-14 **int4** release of Moondream. There's more information about this version of the model in our [release blog post](https://moondream.ai/blog/smaller-faster-moondream-with-qat). Other revisions, as well as release history, can be found [here](https://huggingface.co/vikhyatk/moondream2).
+
+### Usage
+
+Make sure to install the requirements:
 
-Make sure to install the requirements:
 ```
-pip install
+pip install pillow torchao
 ```
 
-### Usage
-
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from PIL import Image
@@ -23,7 +24,6 @@ from PIL import Image
 model = AutoModelForCausalLM.from_pretrained(
     "moondream/moondream-2b-2025-04-14-4bit",
     trust_remote_code=True,
-    # Uncomment to run on GPU.
     device_map={"": "cuda"}
 )
 
@@ -50,30 +50,4 @@ print(f"Found {len(objects)} face(s)")
 print("\nPointing: 'person'")
 points = model.point(image, "person")["points"]
 print(f"Found {len(points)} person(s)")
-```
-
-### Changelog
-**int4-2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
-1. Moondream uses a whole lot less memory (4.12 down to 2.47 GB)
-2. Small devices get a big speed-up (44.54 to 67.84 tok/sec on an RTX 4050 Mobile)
-3. Improved spatial understanding (RealWorldQA up from 58.3 to 60.13)
-
-
-**2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
-
-1. Improved chart understanding (ChartQA up from 74.8 to 77.5, 82.2 with PoT)
-2. Added temperature and nucleus sampling to reduce repetitive outputs
-3. Better OCR for documents and tables (prompt with "Transcribe the text" or "Transcribe the text in natural reading order")
-4. Object detection supports document layout detection (figure, formula, text, etc.)
-5. UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
-6. Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
-
-**2025-03-27** ([full release notes](https://moondream.ai/blog/moondream-2025-03-27-release))
-
-1. Added support for long-form captioning
-2. Open-vocabulary image tagging
-3. Improved counting accuracy (e.g. CountBenchQA up from 80 to 86.4)
-4. Improved text understanding (e.g. OCRBench up from 58.3 to 61.2)
-5. Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)
-6. Fixed token streaming bug affecting multi-byte Unicode characters
-7. gpt-fast style `compile()` now supported in HF Transformers implementation
+```
````
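The `model.point(...)["points"]` result kept in the snippet above is a list of `{"x", "y"}` dicts; in Moondream's API these coordinates are normalized to the 0-1 range relative to the image size. A model-free sketch of mapping them to pixel positions (the helper name and sample values here are illustrative, not from the README):

```python
def to_pixel_coords(points, width, height):
    """Map normalized {'x', 'y'} points (0-1 range) to integer pixel coords.

    Assumes Moondream's convention of coordinates normalized to image size.
    """
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

# Hypothetical point output for a 640x480 image:
sample = [{"x": 0.5, "y": 0.25}]
print(to_pixel_coords(sample, 640, 480))  # [(320, 120)]
```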
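The headline figures in the int4 changelog entry above imply the relative improvements directly; this sketch only reproduces the ratios from the quoted numbers:

```python
# Figures quoted in the int4 changelog entry.
mem_before_gb, mem_after_gb = 4.12, 2.47
tok_before, tok_after = 44.54, 67.84  # tok/sec on an RTX 4050 Mobile

mem_reduction = 1 - mem_after_gb / mem_before_gb  # fraction of memory saved
speedup = tok_after / tok_before                  # decode throughput ratio
print(f"~{mem_reduction:.0%} less memory, ~{speedup:.2f}x faster decode")
# ~40% less memory, ~1.52x faster decode
```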