|
--- |
|
license: apache-2.0 |
|
pipeline_tag: mask-generation |
|
--- |
|
|
|
# NanoSAM: Accelerated Segment Anything Model for Edge Deployment
|
|
|
- [GitHub](https://github.com/binh234/nanosam) |
|
- [Demo](https://huggingface.co/spaces/dragonSwing/nanosam) |
|
|
|
## Pretrained Models |
|
|
|
NanoSAM performance on edge devices. Latency is measured on an NVIDIA Jetson Xavier NX and an NVIDIA T4 GPU with TensorRT in FP16 precision; data transfer time is included.
|
|
|
| Image Encoder | CPU | Jetson Xavier NX | T4 | Model size | Download | |
|
| --------------- | :---: | :--------------: | :---: | :--------: | :------------------------------------------------------------------------------------------------------: | |
|
| PPHGV2-B1 | 110ms | 9.6ms | 2.4ms | 12.7MB | [Link](https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b1_ln_nonorm_image_encoder.onnx) | |
|
| PPHGV2-B2       | 200ms | 12.4ms           | 3.2ms | 29.5MB     | [Link](https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b2_ln_nonorm_image_encoder.onnx)  |
|
| PPHGV2-B4       | 300ms | 17.3ms           | 4.1ms | 61.4MB     | [Link](https://huggingface.co/dragonSwing/nanosam/resolve/main/sam_hgv2_b4_ln_nonorm_image_encoder.onnx)  |
|
| ResNet18 | 500ms | 22.4ms | 5.8ms | 63.2MB | [Link](https://drive.google.com/file/d/14-SsvoaTl-esC3JOzomHDnI9OGgdO2OR/view?usp=drive_link) | |
|
| EfficientViT-L0 | 1000ms | 31.6ms          | 6.0ms | 117.5MB    | -                                                                                                        |
|
|
|
Zero-shot instance segmentation results on the COCO 2017 validation dataset:
|
|
|
| Image Encoder | mAP<sup>mask<br>50-95 | mIoU (all) | mIoU (large) | mIoU (medium) | mIoU (small) | |
|
| --------------- | :-------------------: | :--------: | :----------: | :-----------: | :----------: | |
|
| ResNet18 | - | 70.6 | 79.6 | 73.8 | 62.4 | |
|
| MobileSAM | - | 72.8 | 80.4 | 75.9 | 65.8 | |
|
| PPHGV2-B1 | 41.2 | 75.6 | 81.2 | 77.4 | 70.8 | |
|
| PPHGV2-B2 | 42.6 | 76.5 | 82.2 | 78.5 | 71.5 | |
|
| PPHGV2-B4 | 44.0 | 77.3 | 83.0 | 79.7 | 72.1 | |
|
| EfficientViT-L0 | 45.6 | 78.6 | 83.7 | 81.0 | 73.3 | |
|
|
|
## Usage |
|
|
|
```python
import numpy as np
import PIL.Image

from nanosam.utils.predictor import Predictor

image_encoder_cfg = {
    "path": "data/sam_hgv2_b4_ln_nonorm_image_encoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
    "normalize_input": False,
}
mask_decoder_cfg = {
    "path": "data/efficientvit_l0_mask_decoder.onnx",
    "name": "OnnxModel",
    "provider": "cpu",
}
predictor = Predictor(image_encoder_cfg, mask_decoder_cfg)

image = PIL.Image.open("assets/dogs.jpg")

predictor.set_image(image)

# Predict a mask from a single foreground point (x, y), given in image pixel coordinates
mask, _, _ = predictor.predict(np.array([[x, y]]), np.array([1]))
```
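The exact format of the returned mask depends on the decoder; assuming it holds per-pixel logits (as in the original SAM), positive values can be thresholded to obtain a binary mask. A minimal sketch using a synthetic stand-in array:

```python
import numpy as np

# Synthetic stand-in for a predicted mask; the real output shape and
# scale depend on the decoder, so this is an illustrative assumption.
mask_logits = np.array([[-2.0, 0.5],
                        [1.5, -0.1]])

# Positive logits are conventionally treated as foreground.
binary_mask = (mask_logits > 0).astype(np.uint8)
```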
|
|
|
The point labels may be one of the following:
|
|
|
| Point Label | Description | |
|
| :---------: | ------------------------- | |
|
| 0 | Background point | |
|
| 1 | Foreground point | |
|
| 2 | Bounding box top-left | |
|
| 3 | Bounding box bottom-right | |
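Labels 2 and 3 allow prompting with a bounding box, encoded as its top-left and bottom-right corner points. A sketch, assuming the same `Predictor` API as the point example above; the box coordinates here are placeholders:

```python
import numpy as np

# Placeholder box corners in image pixel coordinates (assumed values)
x0, y0, x1, y1 = 100, 100, 850, 759

points = np.array([[x0, y0], [x1, y1]])  # top-left, then bottom-right
point_labels = np.array([2, 3])          # 2 = box top-left, 3 = box bottom-right

# Reusing a predictor set up as in the Usage section:
# mask, _, _ = predictor.predict(points, point_labels)
```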
|
|