Abstract
IAUNet, a query-based U-Net architecture with a lightweight convolutional Pixel decoder and Transformer decoder, outperforms state-of-the-art models in biomedical instance segmentation.
Instance segmentation is critical in biomedical imaging to accurately distinguish individual objects like cells, which often overlap and vary in size. Recent query-based methods, where object queries guide segmentation, have shown strong performance. While U-Net has been a go-to architecture in medical image segmentation, its potential in query-based approaches remains largely unexplored. In this work, we present IAUNet, a novel query-based U-Net architecture. The core design features a full U-Net backbone, enhanced by a novel lightweight convolutional Pixel decoder that makes the model more efficient and reduces the number of parameters. Additionally, we propose a Transformer decoder that refines object-specific features across multiple scales. Finally, we introduce the 2025 Revvity Full Cell Segmentation Dataset, a unique resource with detailed annotations of overlapping cell cytoplasm in brightfield images, setting a new benchmark for biomedical instance segmentation. Experiments on multiple public datasets and our own show that IAUNet outperforms most state-of-the-art fully convolutional, transformer-based, and query-based models, as well as cell segmentation-specific models, setting a strong baseline for cell instance segmentation tasks. Code is available at https://github.com/SlavkoPrytula/IAUNet
Community
IAUNet: Instance-Aware U-Net (CVPRW 2025)
In this work, we present:
- A novel query-based U-Net model: IAUNet (Instance-Aware U-Net) ⭐️
- A new cell instance segmentation dataset: Revvity-25 🔥

GitHub: https://github.com/SlavkoPrytula/IAUNet
Project page: https://slavkoprytula.github.io/IAUNet/

If you find this work useful, consider giving it a ⭐️ on GitHub to support further open-source research!

Model overview. Overview of the IAUNet architecture, highlighting the Pixel and Transformer decoder stages. Given an input image I, the encoder extracts multi-scale features that serve as skip connections for the Pixel decoder. At each decoder block, we add the skip connections X_s to the main features X and inject normalized coordinate features for CoordConv. Stacked depth-wise convolutions with an SE block refine spatial information, generating mask features X_m. The Transformer decoder then processes learnable queries q through three Transformer blocks per layer, iteratively refining them with X_m. A deep supervision loss is applied after each Transformer block using the updated queries q_hat and high-resolution mask features.
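Two of the pieces named in the caption above can be illustrated with a minimal NumPy sketch: the normalized coordinate channels injected for CoordConv, and the final step of query-based mask prediction, where each refined query q_hat is dotted with the per-pixel mask features X_m to produce one mask logit map per query. All shapes and function names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def coord_channels(h, w):
    """Normalized coordinate features in [-1, 1], as injected for CoordConv."""
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy, xx], axis=0)  # (2, H, W)

def predict_masks(queries, mask_features):
    """Dot each query with the per-pixel mask features -> one logit map per query."""
    # queries: (Q, C), mask_features: (C, H, W)
    c, h, w = mask_features.shape
    logits = queries @ mask_features.reshape(c, h * w)  # (Q, H*W)
    return logits.reshape(-1, h, w)                     # (Q, H, W)

# Toy example: 16 learned channels plus 2 coordinate channels, 4 object queries
rng = np.random.default_rng(0)
X_m = np.concatenate([rng.standard_normal((16, 8, 8)), coord_channels(8, 8)])
q_hat = rng.standard_normal((4, 18))
masks = predict_masks(q_hat, X_m)
print(masks.shape)  # (4, 8, 8)
```

The dot-product formulation is the standard mechanism in query-based segmentation (e.g., Mask2Former-style decoders): each query acts as a dynamic classifier over the shared mask-feature map.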
Revvity-25. One of our key contributions in this paper is a novel cell instance segmentation dataset named Revvity-25. It includes 110 high-resolution 1080 x 1080 brightfield images, each containing, on average, 27 manually labeled and expert-validated cancer cells, for a total of 2937 annotated cells. To our knowledge, this is the first dataset with accurate, detailed annotations of cell borders and overlaps: each cell is annotated with an average of 60 polygon points, reaching up to 400 points for more complex structures. The Revvity-25 dataset provides a unique resource that opens new possibilities for testing and benchmarking models for modal and amodal semantic and instance segmentation.
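Because each cell is stored as its own polygon, overlapping cytoplasm regions can belong to several instances at once, which is what makes amodal evaluation possible. A minimal, dependency-free sketch of turning one such polygon into a per-instance binary mask (via even-odd ray casting) is shown below; the annotation layout is a hypothetical example, not the dataset's actual file format.

```python
import numpy as np

def polygon_to_mask(points, h, w):
    """Rasterize one polygon (list of (x, y) vertices) into a boolean mask
    using even-odd ray casting. Each instance gets its own mask, so masks of
    overlapping cells may share pixels (amodal annotation)."""
    pts = np.asarray(points, dtype=float)
    mask = np.zeros((h, w), dtype=bool)
    n = len(pts)
    for y in range(h):
        for x in range(w):
            inside = False
            j = n - 1
            for i in range(n):
                yi, yj = pts[i, 1], pts[j, 1]
                # Edge crosses the horizontal line through this pixel row?
                if (yi > y) != (yj > y):
                    xcross = pts[j, 0] + (y - yj) * (pts[i, 0] - pts[j, 0]) / (yi - yj)
                    if x < xcross:
                        inside = not inside
                j = i
            mask[y, x] = inside
    return mask

# Toy example: a square "cell" on a 10x10 grid
square = [(2, 2), (7, 2), (7, 7), (2, 7)]
m = polygon_to_mask(square, 10, 10)
print(m.sum())  # 25
```

In practice a vectorized rasterizer (e.g., OpenCV's `fillPoly` or pycocotools) would be used for 1080 x 1080 images; the pure-Python loop above is only meant to make the per-instance, overlap-friendly representation concrete.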
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention (2025)
- MLRU++: Multiscale Lightweight Residual UNETR++ with Attention for Efficient 3D Medical Image Segmentation (2025)
- trAIce3D: A Prompt-Driven Transformer Based U-Net for Semantic Segmentation of Microglial Cells from Large-Scale 3D Microscopy Images (2025)
- InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation (2025)
- Hybrid Attention Network for Accurate Breast Tumor Segmentation in Ultrasound Images (2025)
- MambaVesselNet++: A Hybrid CNN-Mamba Architecture for Medical Image Segmentation (2025)
- MedSAM-CA: A CNN-Augmented ViT with Attention-Enhanced Multi-Scale Fusion for Medical Image Segmentation (2025)