---
license: apache-2.0
base_model:
  - OpenGVLab/InternVL2_5-8B
pipeline_tag: mask-generation
---

HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model

[📂 GitHub] [📜 Paper]

This is the InternVL2_5-HiMTok-8B model, fine-tuned on the training splits of the RefCOCO series datasets.

If you find this project useful in your research, please consider citing:

@article{wang2025himtok,
  title={HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model},
  author={Wang, Tao and Cheng, Changxu and Wang, Lingfeng and Chen, Senda and Zhao, Wuyue},
  journal={arXiv preprint arXiv:2503.13026},
  year={2025}
}