kxqt committed on
Commit
0c9b5b3
·
1 Parent(s): 2c50deb

update readme

Files changed (1)
  1. README.md +13 -138
README.md CHANGED
@@ -1,138 +1,13 @@
- # Expediting SAM without Fine-tuning
-
- <!-- **[Meta AI Research, FAIR](https://ai.facebook.com/research/)**
-
- [Alexander Kirillov](https://alexander-kirillov.github.io/), [Eric Mintun](https://ericmintun.github.io/), [Nikhila Ravi](https://nikhilaravi.com/), [Hanzi Mao](https://hanzimao.me/), Chloe Rolland, Laura Gustafson, [Tete Xiao](https://tetexiao.com), [Spencer Whitehead](https://www.spencerwhitehead.com/), Alex Berg, Wan-Yen Lo, [Piotr Dollar](https://pdollar.github.io/), [Ross Girshick](https://www.rossgirshick.info/)
-
- [[`Paper`](https://ai.facebook.com/research/publications/segment-anything/)] [[`Project`](https://segment-anything.com/)] [[`Demo`](https://segment-anything.com/demo)] [[`Dataset`](https://segment-anything.com/dataset/index.html)] [[`Blog`](https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/)] [[`BibTeX`](#citing-segment-anything)]
-
- ![SAM design](assets/model_diagram.png?raw=true)
-
- The **Segment Anything Model (SAM)** produces high quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a [dataset](https://segment-anything.com/dataset/index.html) of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.
-
- <p float="left">
- <img src="assets/masks1.png?raw=true" width="37.25%" />
- <img src="assets/masks2.jpg?raw=true" width="61.5%" />
- </p> -->
-
- ## Introduction
-
- This is the official implementation of the paper "[Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning](https://arxiv.org/abs/2210.01035)" on the [Segment Anything Model (SAM)](https://segment-anything.com/).
-
- ![framework](assets/Hourglass_transformer_framework.png)
- ![framework](assets/TokenClusterReconstruct_Details.png)
-
- Our method can speed up SAM without any training. The bottleneck of SAM is the image encoder, so we apply our method to the image encoder to significantly speed up the generation process. We test our method on different SAM models using a single 16 GB Tesla V100. We set `--points-per-side=12` and `--points-per-batch=144` so that all 144 point prompts (12 × 12) fit in one batch and the generation process runs only once.
-
- | Model | Clustering location | Number of clusters | Speed (image/s) |
- | ---------------- | ------------------- | --------------- | ------------------ |
- | SAM-ViT-H | - | - | 1.27 |
- | SAM-ViT-H + ours | 18 | 121 | 1.40 (1.10x faster) |
- | SAM-ViT-H + ours | 14 | 100 | 1.52 (1.19x faster) |
- | SAM-ViT-H + ours | 8 | 100 | 1.64 (1.30x faster) |
- | SAM-ViT-H + ours | 8 | 81 | 1.82 (1.44x faster) |
- | SAM-ViT-H + ours | 6 | 81 | 1.89 (1.49x faster) |
-
- Here is a visualization of the settings above.
-
- ![result of sam-vit-h + ours](assets/result_vit_h.png)
-
- We also apply our method to a smaller model. Here are some examples generated by SAM with ViT-L + ours, using `--points-per-side=16` and `--points-per-batch=256`.
-
- ![result of sam-vit-l + ours](assets/result_vit_l.png)
-
- ## Installation
-
- The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
-
- <!-- Install Segment Anything:
-
- ```
- pip install git+https://github.com/facebookresearch/segment-anything.git
- ```
-
- or clone the repository locally and install with -->
-
- To use Segment Anything with our method, please clone this repository locally and install with
-
- ```
- pip install -e .
- ```
-
- The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. `jupyter` is also required to run the example notebooks.
- ```
- pip install opencv-python pycocotools matplotlib onnxruntime onnx
- ```
-
-
- ## <a name="GettingStarted"></a>Getting Started
-
- You can run the code just as you would the original Segment Anything Model. The only difference is that you need to pass `use_hourglass=True` when calling the `build_sam` function. Here is an example.
-
- First download a [model checkpoint](#model-checkpoints). Then the model can be used in just a few lines to get masks from a given prompt:
-
- ```
- from segment_anything import build_sam, SamPredictor
- predictor = SamPredictor(build_sam(checkpoint="</path/to/model.pth>", use_hourglass=True))
- predictor.set_image(<your_image>)
- masks, _, _ = predictor.predict(<input_prompts>)
- ```
-
- or generate masks for an entire image:
-
- ```
- from segment_anything import build_sam, SamAutomaticMaskGenerator
- mask_generator = SamAutomaticMaskGenerator(build_sam(checkpoint="</path/to/model.pth>", use_hourglass=True))
- masks = mask_generator.generate(<your_image>)
- ```
-
- Additionally, masks can be generated for images from the command line:
-
- ```
- python scripts/amg.py --checkpoint <path/to/sam/checkpoint> --input <image_or_folder> --output <output_directory> --use_hourglass
- ```
-
- You need to add `--use_hourglass` if you want to use our method to accelerate the process.
-
-
- ## <a name="Models"></a>Model Checkpoints
-
- <!-- Three model versions of the model are available with different backbone sizes. These models can be instantiated by running
- ```
- from segment_anything import sam_model_registry
- sam = sam_model_registry["<name>"](checkpoint="<path/to/checkpoint>")
- ```
- Click the links below to download the checkpoint for the corresponding model name. The default model in bold can also be instantiated with `build_sam`, as in the examples in [Getting Started](#getting-started). -->
-
- Here are the official weights of the SAM models.
-
- * **`default` or `vit_h`: [ViT-H SAM model.](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth)**
- * `vit_l`: [ViT-L SAM model.](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth)
- * `vit_b`: [ViT-B SAM model.](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth)
-
- ## License
- The model is licensed under the [Apache 2.0 license](LICENSE).
-
- ## Citation
-
- If you find this repo useful in your research, please consider citing:
-
- ```latex
- @article{liang2022expediting,
- author = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
- title = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
- journal = {arXiv preprint arXiv:2210.01035},
- year = {2022},
- }
- ```
-
- If you use SAM or SA-1B in your research, please use the following BibTeX entry.
-
- ```
- @article{kirillov2023segany,
- title={Segment Anything},
- author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
- journal={arXiv:2304.02643},
- year={2023}
- }
- ```
 
+ ---
+ title: Expedit SAM
+ emoji: 👁
+ colorFrom: red
+ colorTo: yellow
+ sdk: gradio
+ sdk_version: 3.24.1
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
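
The speed table in the removed README pairs each throughput with a speedup factor relative to the SAM-ViT-H baseline of 1.27 image/s. The sketch below recomputes those factors, assuming the factor is simply the accelerated throughput divided by the baseline throughput. The README does not state how the factors were derived, and the helper name `speedup` is hypothetical.

```python
# Throughputs (image/s) copied from the removed README table.
BASELINE = 1.27  # SAM-ViT-H without token clustering

# (clustering location, number of clusters) -> throughput in image/s
OURS = {
    (18, 121): 1.40,
    (14, 100): 1.52,
    (8, 100): 1.64,
    (8, 81): 1.82,
    (6, 81): 1.89,
}


def speedup(throughput, baseline=BASELINE):
    """Hypothetical helper: ratio of accelerated to baseline throughput."""
    return throughput / baseline


for (location, clusters), images_per_s in OURS.items():
    print(
        f"location={location:>2} clusters={clusters:>3} "
        f"{images_per_s:.2f} image/s -> {speedup(images_per_s):.2f}x"
    )
```

Ratios recomputed from the rounded throughputs can differ from the table's factors in the second decimal place, presumably because the table was derived from unrounded measurements.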