vikhyatk committed on
Commit c19ca21 · verified · 1 Parent(s): 08d8cec

Update README.md

Files changed (1)
  1. README.md +7 -33
README.md CHANGED
@@ -7,15 +7,16 @@ Moondream is a small vision language model designed to run efficiently everywhere.
 
 [Website](https://moondream.ai/) / [Demo](https://moondream.ai/playground) / [GitHub](https://github.com/vikhyat/moondream)
 
-This repository contains the 2025-04-14 **int4** release of Moondream, as well as [historical releases](https://huggingface.co/vikhyatk/moondream2/blob/main/versions.txt). The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application.
+This repository contains the 2025-04-14 **int4** release of Moondream. There's more information about this version of the model in our [release blog post](https://moondream.ai/blog/smaller-faster-moondream-with-qat). Other revisions, as well as release history, can be found [here](https://huggingface.co/vikhyatk/moondream2).
+
+### Usage
+
+Make sure to install the requirements:
 
-Make sure to install the requirements:
 ```
-pip install -r https://depot.moondream.ai/transformers/requirements.txt
+pip install pillow torchao
 ```
 
-### Usage
-
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from PIL import Image
@@ -23,7 +24,6 @@ from PIL import Image
 model = AutoModelForCausalLM.from_pretrained(
     "moondream/moondream-2b-2025-04-14-4bit",
     trust_remote_code=True,
-    # Uncomment to run on GPU.
     device_map={"": "cuda"}
 )
 
@@ -50,30 +50,4 @@ print(f"Found {len(objects)} face(s)")
 print("\nPointing: 'person'")
 points = model.point(image, "person")["points"]
 print(f"Found {len(points)} person(s)")
-```
-
-### Changelog
-**int4-2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
-1. Moondream uses a whole lot less memory (4.12 down to 2.47GB)
-2. Small devices get a big speed-up (44.54 to 67.84 tok/sec on an RTX 4050 Mobile)
-3. Improved spatial understanding (RealWorldQA up from 58.3 to 60.13)
-
-
-**2025-04-15** ([full release notes](https://moondream.ai/blog/moondream-2025-04-14-release))
-
-1. Improved chart understanding (ChartQA up from 74.8 to 77.5, 82.2 with PoT)
-2. Added temperature and nucleus sampling to reduce repetitive outputs
-3. Better OCR for documents and tables (prompt with “Transcribe the text” or “Transcribe the text in natural reading order”)
-4. Object detection supports document layout detection (figure, formula, text, etc.)
-5. UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
-6. Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
-
-**2025-03-27** ([full release notes](https://moondream.ai/blog/moondream-2025-03-27-release))
-
-1. Added support for long-form captioning
-2. Open vocabulary image tagging
-3. Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
-4. Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
-5. Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)
-6. Fixed token streaming bug affecting multi-byte unicode characters
-7. gpt-fast style `compile()` now supported in HF Transformers implementation
+```
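The changelog removed in this commit credits int4 quantization with cutting memory from 4.12 GB to 2.47 GB. A rough back-of-envelope sketch of why that works: weight storage scales linearly with bits per weight. The ~1.9B parameter count and ~4.5 effective bits per int4 weight (4 bits plus group-wise scale overhead) used below are illustrative assumptions, not figures from the release.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, ignoring
    activations, KV cache, and framework overhead."""
    return n_params * bits_per_weight / 8

# Hypothetical parameter count for a "2B" model; the real figure may differ.
N_PARAMS = 1.9e9

fp16_gb = weight_bytes(N_PARAMS, 16) / 1e9
int4_gb = weight_bytes(N_PARAMS, 4.5) / 1e9  # ~4 bits + scale/zero-point overhead

print(f"fp16 weights: {fp16_gb:.2f} GB")
print(f"int4 weights: {int4_gb:.2f} GB")
```

The measured 4.12 → 2.47 GB numbers are larger than weights alone because runtime memory also includes activations, the KV cache, and framework overhead.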
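The README's pointing snippet only counts the returned points. To overlay them on the image, and assuming each point is a dict with `"x"`/`"y"` coordinates normalized to [0, 1] (an assumption to verify against the model card for your revision), converting to pixel coordinates is a small helper:

```python
def to_pixel_coords(points, width, height):
    """Map normalized {"x", "y"} points (0..1) onto an image of
    the given pixel dimensions."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]

# Illustrative values, not real model output:
points = [{"x": 0.25, "y": 0.5}, {"x": 0.9, "y": 0.1}]
print(to_pixel_coords(points, 640, 480))  # [(160, 240), (576, 48)]
```

The resulting integer tuples can be passed directly to e.g. `PIL.ImageDraw.Draw.ellipse` to mark each detected person.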