---
license: other
license_name: nvclv1
license_link: LICENSE
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-classification
---
# MambaVision: A Hybrid Mamba-Transformer Vision Backbone
## Model Overview
We introduce a novel mixer block that adds a symmetric path without SSM to enhance the modeling of global context. MambaVision has a hierarchical architecture that employs both self-attention and mixer blocks.
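To illustrate the dual-path idea, here is a minimal, hedged PyTorch sketch of such a mixer block. The class name `MixerBlockSketch` is hypothetical, and a depthwise convolution stands in for the Mamba-style SSM branch; this is an exposition aid, not the exact MambaVision implementation.

```python
import torch
import torch.nn as nn

class MixerBlockSketch(nn.Module):
    """Hypothetical sketch: one SSM-style path plus a symmetric path without SSM."""

    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        self.in_proj = nn.Linear(dim, dim)
        # Branch 1: stand-in for the SSM path (the real model uses a selective SSM here).
        self.ssm_branch = nn.Sequential(
            nn.Conv1d(half, half, kernel_size=3, padding=1, groups=half),
            nn.SiLU(),
        )
        # Branch 2: the symmetric path without SSM described above.
        self.sym_branch = nn.Sequential(
            nn.Conv1d(half, half, kernel_size=3, padding=1, groups=half),
            nn.SiLU(),
        )
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, tokens, channels)
        x = self.in_proj(x)
        a, b = x.chunk(2, dim=-1)  # split channels across the two branches
        a = self.ssm_branch(a.transpose(1, 2)).transpose(1, 2)
        b = self.sym_branch(b.transpose(1, 2)).transpose(1, 2)
        # Concatenate the two paths and project back to the model dimension.
        return self.out_proj(torch.cat([a, b], dim=-1))

block = MixerBlockSketch(dim=64)
tokens = torch.randn(2, 196, 64)
assert block(tokens).shape == (2, 196, 64)
```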
## Model Performance
MambaVision demonstrates strong performance, achieving a new SOTA Pareto front in Top-1 accuracy versus throughput on ImageNet-1K.
## Model Usage
You must first log in to Hugging Face to pull the model:

```bash
huggingface-cli login
```
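Alternatively (an optional convenience, not part of the original card), you can authenticate programmatically with the `huggingface_hub` library:

```python
from huggingface_hub import login

login(token="<YOUR ACCESS TOKEN>")  # placeholder; substitute your own token
```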
The model can then be loaded as follows:

```python
from transformers import AutoModel

access_token = "<YOUR ACCESS TOKEN>"
model = AutoModel.from_pretrained(
    "nvidia/MambaVision-T-2K",
    trust_remote_code=True,
    token=access_token,  # authenticate the download with your token
)
```
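The snippet below extends this into a minimal end-to-end inference sketch. It assumes standard ImageNet preprocessing (224×224 crop with ImageNet mean/std) and that the model's forward pass returns class logits first; the repository's remote code may define its own transforms and output format, so treat this as illustrative.

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel

# Assumed preprocessing: standard ImageNet-1K transforms; the remote code
# may specify a different resolution or normalization.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = AutoModel.from_pretrained("nvidia/MambaVision-T-2K", trust_remote_code=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
batch = preprocess(image).unsqueeze(0)            # shape: (1, 3, 224, 224)

with torch.no_grad():
    outputs = model(batch)

# Assumption: the first (or only) output holds logits over the ImageNet classes.
logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs
print("Predicted class index:", logits.argmax(-1).item())
```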