HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

[📂 GitHub] [📜 Paper]

This is the InternVL2_5-HiMTok-8B model (yayafengzi/InternVL2_5-HiMTok-8B), fine-tuned on the training splits of the RefCOCO series datasets.

If you find this project useful in your research, please consider citing:

@article{wang2025himtok,
  title={HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model},
  author={Wang, Tao and Cheng, Changxu and Wang, Lingfeng and Chen, Senda and Zhao, Wuyue},
  journal={arXiv preprint arXiv:2503.13026},
  year={2025}
}
Model size: 8.73B parameters (Safetensors format)
Tensor types: BF16, BOOL