Referring expression segmentation results on RefCOCO, RefCOCO+, and RefCOCOg:

Dataset | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2 | LISA |
---|---|---|---|---|---|---|
RefCOCO | val | 83.6 | 82.4 | 79.5 | 79.2 | 74.9 |
RefCOCO | testA | 85.3 | 84.2 | 83.2 | 82.3 | 79.1 |
RefCOCO | testB | 81.5 | 80.2 | 76.9 | 77.0 | 72.3 |
RefCOCO+ | val | 79.4 | 76.5 | 72.6 | 68.9 | 65.1 |
RefCOCO+ | testA | 82.9 | 80.0 | 78.7 | 75.8 | 70.8 |
RefCOCO+ | testB | 75.6 | 71.9 | 64.6 | 61.8 | 58.1 |
RefCOCOg | val | 79.7 | 78.2 | 74.2 | 73.3 | 67.9 |
RefCOCOg | test | 80.5 | 78.3 | 74.9 | 74.8 | 70.6 |
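To make the comparison with the strongest baseline concrete, the short script below recomputes the per-split margin of MLCD-seg-7B over EVF-SAM using only the numbers copied from the table. It is plain arithmetic on the published scores, not an evaluation: the mean margin works out to 2.1 points, with the largest gap (3.7) on RefCOCO+ testB.

```python
# Scores copied from the results table: (MLCD-seg-7B, EVF-SAM) per dataset/split.
scores = {
    ("RefCOCO", "val"): (83.6, 82.4),
    ("RefCOCO", "testA"): (85.3, 84.2),
    ("RefCOCO", "testB"): (81.5, 80.2),
    ("RefCOCO+", "val"): (79.4, 76.5),
    ("RefCOCO+", "testA"): (82.9, 80.0),
    ("RefCOCO+", "testB"): (75.6, 71.9),
    ("RefCOCOg", "val"): (79.7, 78.2),
    ("RefCOCOg", "test"): (80.5, 78.3),
}

# Round each difference to one decimal to match the table's precision
# and hide floating-point noise such as 1.1999999.
margins = {key: round(mlcd - evf, 1) for key, (mlcd, evf) in scores.items()}
mean_margin = sum(margins.values()) / len(margins)
largest = max(margins, key=margins.get)

print(f"mean margin over EVF-SAM: {mean_margin:.2f}")
print(f"largest margin: {largest[0]} {largest[1]} (+{margins[largest]})")
```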
To run inference on your own image, use the following example:
import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
# trust_remote_code=True loads the custom MLCD-Seg model code shipped with the checkpoint
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert("RGB")
seg_prompt = "Could you provide a segmentation mask for the right giraffe in this image?"
# force_seg=False is the setting for general, free-form use
pred_mask = mlcd_seg.seg(seg_img, seg_prompt, tokenizer, force_seg=False)
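The card does not document what `seg` returns. A minimal visualisation sketch, assuming `pred_mask` can be converted to a 2-D array (possibly with a leading singleton dimension) that matches the image's height and width, with nonzero values marking the target. `overlay_mask` is a hypothetical helper, not part of MLCD-Seg; check the model's remote code for the actual return type.

```python
import numpy as np
from PIL import Image


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a binary mask onto an RGB PIL image (hypothetical helper).

    Assumes `mask` is array-like with shape (H, W) or (1, H, W) and that any
    nonzero value marks the segmented object.
    """
    rgb = np.asarray(image.convert("RGB"), dtype=np.float32)
    fg = np.squeeze(np.asarray(mask) > 0)  # e.g. (1, H, W) -> (H, W)
    if fg.shape != rgb.shape[:2]:
        raise ValueError(f"mask shape {fg.shape} does not match image size {rgb.shape[:2]}")
    out = rgb.copy()
    # Alpha-blend the highlight colour into foreground pixels only.
    out[fg] = (1 - alpha) * rgb[fg] + alpha * np.array(color, dtype=np.float32)
    return Image.fromarray(out.round().astype(np.uint8))
```

If `pred_mask` turns out to be a torch tensor on the GPU, convert it first, for example `overlay_mask(seg_img, pred_mask.cpu().numpy()).save("overlay.png")`.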
To evaluate on a benchmark dataset (e.g. RefCOCO), use the same code with `force_seg=True`:
import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
# trust_remote_code=True loads the custom MLCD-Seg model code shipped with the checkpoint
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert("RGB")
seg_prompt = "Could you provide a segmentation mask for the right giraffe in this image?"
# force_seg=True is the setting to use when evaluating on benchmarks such as RefCOCO
pred_mask = mlcd_seg.seg(seg_img, seg_prompt, tokenizer, force_seg=True)
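Referring segmentation benchmarks are commonly scored with cIoU (cumulative intersection divided by cumulative union over the whole split) and gIoU (mean of per-sample IoUs); the card does not state which metric its table uses. The sketch below computes both from (prediction, ground truth) mask pairs, assuming masks are array-like with nonzero values marking the object. `accumulate_iou` is a hypothetical helper, and scoring a sample with empty prediction and empty ground truth as IoU 1.0 is one common convention, not necessarily the one used for the reported numbers.

```python
import numpy as np


def accumulate_iou(pairs):
    """Return (cIoU, gIoU) for an iterable of (pred, gt) binary masks."""
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, gt in pairs:
        p = np.asarray(pred) > 0
        g = np.asarray(gt) > 0
        inter = int(np.logical_and(p, g).sum())
        union = int(np.logical_or(p, g).sum())
        total_inter += inter
        total_union += union
        # Convention: empty prediction and empty ground truth count as a perfect match.
        per_sample.append(1.0 if union == 0 else inter / union)
    ciou = total_inter / total_union if total_union else 1.0
    giou = float(np.mean(per_sample)) if per_sample else 0.0
    return ciou, giou
```

Because cIoU pools pixels across the split, large objects dominate it, while gIoU weights every sample equally. In practice `pairs` could be a generator that calls `mlcd_seg.seg(..., force_seg=True)` for each sample of a RefCOCO loader of your own, converting tensor outputs with `.cpu().numpy()` if needed.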
@misc{mlcdseg_wukun,
  author = {Wu, Kun and Xie, Yin and Zhou, Xinyu and An, Xiang and Deng, Jiankang and Jie, Yu},
title = {MLCD-Seg},
year = {2025},
url = {https://github.com/deepglint/unicom/tree/main/downstream},
}
Base model: Qwen/Qwen2.5-7B