import streamlit as st
from streamlit_extras.switch_page_button import switch_page
st.title("MobileSAM")
st.success("""[Original tweet](https://twitter.com/mervenoyann/status/1738959605542076863) (December 24, 2023)""", icon="ℹ️")
st.markdown(""" """)
st.markdown("""Read the MobileSAM paper this weekend. Sharing some insights!
The idea 💡: the SAM model consists of three parts: a heavy image encoder, a prompt encoder (the prompt can be text, a bounding box, a mask, or a point), and a mask decoder.
To make SAM smaller without compromising performance, the authors looked into three types of distillation.
The first distills the decoder outputs directly (a more naive approach), using a randomly initialized small ViT and a randomly initialized mask decoder.
However, when the ViT and the decoder both start in a bad state, this doesn't work well.
""")
st.markdown(""" """)
st.image("pages/MobileSAM/image_1.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
The second type of distillation is called semi-coupled: the authors randomly initialized only the ViT image encoder and kept the mask decoder.
It is called semi-coupled because the image encoder distillation still depends on the mask decoder (see below 👇).
""")
st.markdown(""" """)
st.image("pages/MobileSAM/image_2.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
The last type of distillation, [decoupled distillation](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhao_Decoupled_Knowledge_Distillation_CVPR_2022_paper.pdf), is the most intuitive IMO.
The authors "decoupled" the image encoder altogether: they froze the mask decoder and distilled the encoder on its own, without relying on the generated masks.
This makes sense, as the bottleneck here is the encoder itself, and distillation usually works well for encoders.
""")
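st.markdown("""
To make the decoupled idea concrete, here is a minimal NumPy sketch, assuming a plain MSE objective on image embeddings. The linear "teacher" and "student" encoders, the shapes, and the learning rate are toy stand-ins I made up for illustration, not the paper's models or training setup. The point is only that the student is trained to match the teacher's embeddings and the mask decoder never appears in the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): both encoders are plain linear maps.
images = rng.normal(size=(64, 16))     # 64 fake flattened images
teacher_w = rng.normal(size=(16, 8))   # frozen teacher encoder weights
student_w = np.zeros((16, 8))          # untrained student encoder weights


def embedding_mse(weights):
    # Decoupled loss: compare image embeddings only, no mask decoder involved.
    diff = images @ weights - images @ teacher_w
    return float(np.mean(diff ** 2))


initial_loss = embedding_mse(student_w)

learning_rate = 1.0
for _ in range(200):
    diff = images @ student_w - images @ teacher_w
    # Gradient of the mean squared error with respect to student_w.
    grad = 2.0 * images.T @ diff / diff.size
    student_w -= learning_rate * grad

final_loss = embedding_mse(student_w)
print(initial_loss, final_loss)
```

After training, the student's embeddings are close to the teacher's, so in principle the frozen mask decoder can be reused as-is.
""")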
st.markdown(""" """)
st.image("pages/MobileSAM/image_3.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
Finally, they found that decoupled distillation performs better than coupled distillation in terms of mean IoU and requires much less compute! ♥️
""")
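st.markdown("""
For reference, mean IoU over binary masks can be computed roughly like this. It is a small sketch with hypothetical helper names (`mask_iou`, `mean_iou`), and treating two empty masks as a perfect match is my own convention, not something taken from the paper.

```python
import numpy as np


def mask_iou(pred, target):
    # Intersection over union of two binary masks with the same shape.
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    # Average the per-mask IoU over all (prediction, reference) pairs.
    return float(np.mean([mask_iou(p, t) for p, t in zip(preds, targets)]))
```

Here `preds` would be the student's masks and `targets` the reference masks they are compared against.
""")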
st.markdown(""" """)
st.image("pages/MobileSAM/image_4.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
Wanted to leave some links here if you'd like to try it yourself:
- MobileSAM [demo](https://huggingface.co/spaces/dhkim2810/MobileSAMMobileSAM)
- Model [repository](https://huggingface.co/dhkim2810/MobileSAM)

If you'd like to experiment with TinyViT, the [timm library](https://huggingface.co/docs/timm/index) ([Ross Wightman](https://x.com/wightmanr)) has a bunch of [checkpoints available](https://huggingface.co/models?sort=trending&search=timm%2Ftinyvit).
""")
st.markdown(""" """)
st.image("pages/MobileSAM/image_5.jpeg", use_column_width=True)
st.markdown(""" """)
st.info("""
Resources:
[Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/abs/2306.14289)
by Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong (2023)
[GitHub](https://github.com/ChaoningZhang/MobileSAM)""", icon="📚")
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("Home")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("OneFormer")