---
license: other
license_name: nvclv1
license_link: LICENSE
datasets:
- ILSVRC/imagenet-1k
pipeline_tag: image-classification
---
# MambaVision: A Hybrid Mamba-Transformer Vision Backbone
## Model Overview
We introduce a novel mixer block that adds a symmetric path without SSM to enhance the modeling of global context. MambaVision has a hierarchical architecture that employs both self-attention and mixer blocks.
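To illustrate the dual-path idea, here is a minimal, hedged PyTorch sketch of such a mixer block. The class name `MixerBlockSketch` is hypothetical, and a depthwise convolution stands in for the Mamba-style SSM branch; this is an exposition aid, not the exact MambaVision implementation.

```python
import torch
import torch.nn as nn

class MixerBlockSketch(nn.Module):
    """Hypothetical sketch: one SSM-style path plus a symmetric path without SSM."""

    def __init__(self, dim):
        super().__init__()
        half = dim // 2
        self.in_proj = nn.Linear(dim, dim)
        # Branch 1: stand-in for the SSM path (the real model uses a selective SSM here).
        self.ssm_branch = nn.Sequential(
            nn.Conv1d(half, half, kernel_size=3, padding=1, groups=half),
            nn.SiLU(),
        )
        # Branch 2: the symmetric path without SSM described above.
        self.sym_branch = nn.Sequential(
            nn.Conv1d(half, half, kernel_size=3, padding=1, groups=half),
            nn.SiLU(),
        )
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, tokens, channels)
        x = self.in_proj(x)
        a, b = x.chunk(2, dim=-1)  # split channels across the two branches
        a = self.ssm_branch(a.transpose(1, 2)).transpose(1, 2)
        b = self.sym_branch(b.transpose(1, 2)).transpose(1, 2)
        # Concatenate the two paths and project back to the model dimension.
        return self.out_proj(torch.cat([a, b], dim=-1))

block = MixerBlockSketch(dim=64)
tokens = torch.randn(2, 196, 64)
assert block(tokens).shape == (2, 196, 64)
```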
## Model Performance
MambaVision demonstrates strong performance, achieving a new SOTA Pareto front in Top-1 accuracy versus throughput on ImageNet-1K.
## Model Usage
You must first log in to Hugging Face to pull the model:

```bash
huggingface-cli login
```
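Alternatively (an optional convenience, not part of the original card), you can authenticate programmatically with the `huggingface_hub` library:

```python
from huggingface_hub import login

login(token="<YOUR ACCESS TOKEN>")  # placeholder; substitute your own token
```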
The model can then be loaded as follows:

```python
from transformers import AutoModel

access_token = "<YOUR ACCESS TOKEN>"
model = AutoModel.from_pretrained(
    "nvidia/MambaVision-T-2K",
    trust_remote_code=True,
    token=access_token,  # authenticate the download with your token
)
```
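The snippet below extends this into a minimal end-to-end inference sketch. It assumes standard ImageNet preprocessing (224×224 crop with ImageNet mean/std) and that the model's forward pass returns class logits first; the repository's remote code may define its own transforms and output format, so treat this as illustrative.

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel

# Assumed preprocessing: standard ImageNet-1K transforms; the remote code
# may specify a different resolution or normalization.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = AutoModel.from_pretrained("nvidia/MambaVision-T-2K", trust_remote_code=True)
model.eval()

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
batch = preprocess(image).unsqueeze(0)            # shape: (1, 3, 224, 224)

with torch.no_grad():
    outputs = model(batch)

# Assumption: the first (or only) output holds logits over the ImageNet classes.
logits = outputs[0] if isinstance(outputs, (tuple, list)) else outputs
print("Predicted class index:", logits.argmax(-1).item())
```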