Model Card for CoMP-MM-1B
This is a Vision Foundation Model (VFM) that supports native image resolution inputs, continually pre-trained from SigLIP.
Model Sources
- Repository: https://github.com/SliMM-X/CoMP-MM
- Paper: https://arxiv.org/abs/2503.18931
- Project Page: https://slimm-x.github.io/comp
How to Get Started with the Model
Install the package from the GitHub repository above, then use the code below to get started with the model.
import torch
import requests
from PIL import Image

from slimm.model.processor import SliMMQwen2VLProcessor
from slimm.model.utils_vl import process_vision_info
from slimm.model.vision_encoder import CoMPSiglipVisionModel

model_path = "SliMM-X/CoMP-SigLIP-So400M"

# Load the CoMP vision encoder.
model = CoMPSiglipVisionModel.from_pretrained(
    model_path, torch_dtype="auto", device_map="cuda", w_merger=False
).to(torch.bfloat16)

processor = SliMMQwen2VLProcessor.from_pretrained(model_path)

# PIL cannot open a URL directly; fetch the image over HTTP first.
url = "https://slimm-x.github.io/comp/figs/teaser.png"
image_input = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    images=image_input,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Run the vision encoder on native-resolution pixel values and their grid layout.
output_feat = model(inputs.pixel_values.to(torch.bfloat16), inputs.image_grid_thw)
print(output_feat)
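The same processor and model calls also work with a local file. A minimal sketch, assuming a hypothetical local_image.png on disk:

# Hypothetical local file path; any PIL-readable image works.
local_image = Image.open("local_image.png")
inputs = processor(images=local_image, return_tensors="pt").to("cuda")
output_feat = model(inputs.pixel_values.to(torch.bfloat16), inputs.image_grid_thw)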
Citation
BibTeX:
@article{comp2025,
  title={CoMP: Continual Multimodal Pre-training for Vision Foundation Models},
  author={Chen, Yitong and Meng, Lingchen and Peng, Wujian and Wu, Zuxuan and Jiang, Yu-Gang},
  year={2025},
  journal={arXiv preprint arXiv:2503.18931},
}
Model tree for SliMM-X/CoMP-SigLIP-So400M
- Base model: google/siglip-so400m-patch14-384