yaswanthgali committed on
Commit
932fcdf
·
verified ·
1 Parent(s): b1a6229

Update README.md

Files changed (1)
  1. README.md +76 -3
README.md CHANGED
@@ -1,3 +1,76 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ library_name: transformers
3
+ license: mit
4
+ tags:
5
+ - vision
6
+ - image-segmentation
7
+ - pytorch
8
+ ---
9
+ # EoMT
10
+
11
+ [![PyTorch](https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white)](https://pytorch.org/)
12
+
13
+ **EoMT (Encoder-only Mask Transformer)** is a Vision Transformer (ViT) architecture designed for high-quality and efficient image segmentation. It was introduced in the CVPR 2025 highlight paper:
14
+ **[Your ViT is Secretly an Image Segmentation Model](https://www.tue-mps.org/eomt)**
15
+ by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.
16
+
17
+ > **Key Insight**: Given sufficient scale and pretraining, a plain ViT with only a few additional parameters can perform segmentation without task-specific decoders or pixel fusion modules. The same model backbone supports semantic, instance, and panoptic segmentation; only the post-processing differs 🤗
18
+
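To make the "different post-processing" point concrete, here is a minimal sketch of how per-query class scores and per-query masks can be combined into a semantic map. It follows the common MaskFormer-style recipe and is an assumption for illustration, not necessarily the exact logic of the EoMT image processor. The function name `semantic_map` and the toy arrays are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def semantic_map(class_logits, mask_logits):
    """Combine per-query predictions into a per-pixel class map.

    class_logits: (Q, C + 1) array, last column assumed to be "no object".
    mask_logits:  (Q, H, W) array of per-query mask logits.
    Returns an (H, W) integer array of class ids.
    """
    class_probs = softmax(class_logits, axis=-1)[:, :-1]  # drop "no object" -> (Q, C)
    mask_probs = sigmoid(mask_logits)                      # (Q, H, W)
    # Weight each query's mask by its class probabilities and sum over queries.
    per_class = np.einsum("qc,qhw->chw", class_probs, mask_probs)
    return per_class.argmax(axis=0)


# Toy input: 2 queries, 2 classes (+ "no object"), 2x2 image.
class_logits = np.array([[5.0, 0.0, 0.0],
                         [0.0, 5.0, 0.0]])
mask_logits = np.array([
    [[4.0, 4.0], [-4.0, -4.0]],   # query 0 covers the top row
    [[-4.0, -4.0], [4.0, 4.0]],   # query 1 covers the bottom row
])
seg = semantic_map(class_logits, mask_logits)
print(seg)
```

Instance and panoptic outputs would start from the same two tensors but keep queries as separate segments instead of summing them per class.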
19
+ The original implementation can be found in this [repository](https://github.com/tue-mps/eomt).
20
+
21
+ ---
22
+
23
+
24
+ ## How to use
25
+
26
+ Here is how to use this model for panoptic segmentation:
27
+
28
+ ```python
29
+ import matplotlib.pyplot as plt
30
+ import requests
31
+ import torch
32
+ from PIL import Image
33
+
34
+ from transformers import EomtForUniversalSegmentation, AutoImageProcessor
35
+
36
+
37
+ model_id = "yaswanthgali/ade20k_panoptic_eomt_large_640"
38
+ processor = AutoImageProcessor.from_pretrained(model_id)
39
+ model = EomtForUniversalSegmentation.from_pretrained(model_id)
40
+
41
+ image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
42
+
43
+ inputs = processor(
44
+     images=image,
45
+     return_tensors="pt",
46
+ )
47
+
48
+ with torch.inference_mode():
49
+     outputs = model(**inputs)
50
+
51
+ # Prepare the original image size in the format (height, width)
52
+ original_image_sizes = [(image.height, image.width)]
53
+
54
+ # Post-process the model outputs to get final segmentation prediction
55
+ preds = processor.post_process_panoptic_segmentation(
56
+     outputs,
57
+     original_image_sizes=original_image_sizes,
58
+ )
59
+
60
+ # Visualize the panoptic segmentation mask
61
+ plt.imshow(preds[0]["segmentation"])
62
+ plt.axis("off")
63
+ plt.title("Panoptic Segmentation")
64
+ plt.show()
65
+ ```
66
+
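Beyond plotting, the panoptic prediction can be summarized per segment. The sketch below assumes that each prediction dict carries, next to `"segmentation"`, a `"segments_info"` list with `"id"` and `"label_id"` entries, as is the case for other universal-segmentation processors in `transformers`; verify this against the actual output before relying on it. The helper `summarize_segments` and the toy prediction are hypothetical stand-ins for `preds[0]`.

```python
import numpy as np


def summarize_segments(segmentation, segments_info):
    """Return (segment_id, label_id, pixel_count) tuples, largest segment first."""
    seg = np.asarray(segmentation)
    ids, counts = np.unique(seg, return_counts=True)
    area = dict(zip(ids.tolist(), counts.tolist()))
    rows = [
        (info["id"], info["label_id"], area.get(info["id"], 0))
        for info in segments_info
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical prediction mimicking the assumed structure of preds[0].
toy_pred = {
    "segmentation": np.array([[1, 1, 2],
                              [1, 2, 2],
                              [2, 2, 2]]),
    "segments_info": [
        {"id": 1, "label_id": 17, "score": 0.98},
        {"id": 2, "label_id": 3, "score": 0.91},
    ],
}

summary = summarize_segments(toy_pred["segmentation"], toy_pred["segments_info"])
for segment_id, label_id, pixels in summary:
    print(f"segment {segment_id}: label {label_id}, {pixels} px")
```

With a real prediction, a tensor on the GPU would first need `.cpu().numpy()`, and `model.config.id2label` could map each `label_id` to a class name, assuming the checkpoint's config defines that mapping.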
67
+ ## Citation
68
+ If you find our work useful, please consider citing us as:
69
+ ```bibtex
70
+ @inproceedings{kerssies2025eomt,
71
+ author = {Kerssies, Tommie and Cavagnero, Niccolò and Hermans, Alexander and Norouzi, Narges and Averta, Giuseppe and Leibe, Bastian and Dubbelman, Gijs and de Geus, Daan},
72
+ title = {Your ViT is Secretly an Image Segmentation Model},
73
+ booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
74
+ year = {2025},
75
+ }
76
+ ```