Instructions to use Denali-AI/Rainier-VL-2B-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Denali-AI/Rainier-VL-2B-Base with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="Denali-AI/Rainier-VL-2B-Base", trust_remote_code=True)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("Denali-AI/Rainier-VL-2B-Base", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use Denali-AI/Rainier-VL-2B-Base with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "Denali-AI/Rainier-VL-2B-Base" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Denali-AI/Rainier-VL-2B-Base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/Denali-AI/Rainier-VL-2B-Base
- SGLang
How to use Denali-AI/Rainier-VL-2B-Base with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "Denali-AI/Rainier-VL-2B-Base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Denali-AI/Rainier-VL-2B-Base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "Denali-AI/Rainier-VL-2B-Base" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "Denali-AI/Rainier-VL-2B-Base", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use Denali-AI/Rainier-VL-2B-Base with Docker Model Runner:
docker model run hf.co/Denali-AI/Rainier-VL-2B-Base
You need to agree to share your contact information to access this model
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
Access to the Rainier-VL model family is granted on request. These weights are released by Denali AI under the Apache 2 license for product and research use in industrial / product-line visual inspection. By requesting access you agree to use the model in compliance with the license and applicable law, and not for unlawful surveillance or harm. Requests are reviewed manually.
Log in or Sign Up to review the conditions and access this model content.
Accuracy vs. memory: Rainier-VL-2B-Base is Pareto-dominant β highest VisA AUROC at the lowest VRAM in the field.
Domain defect detection β AUROC (higher=better)
| Model | VisA | BSData | BTAD |
|---|---|---|---|
| Rainier-VL-2B-Base | 0.949 | 0.982 | 0.983 |
| Qwen2.5-VL-3B | 0.702 | 0.619 | 0.785 |
| Qwen3-VL-4B | 0.840 | 0.915 | 0.571 |
| Qwen3.5-4B | 0.683 | 0.794 | 0.766 |
| Gemma-4-E2B | 0.625 | 0.933 | 0.321 |
| Holo-3.1-4B | 0.728 | 0.855 | 0.637 |
| NuExtract3 | 0.709 | 0.911 | 0.270 |
| GLM-4.6V-Flash | 0.852 | 0.943 | 0.898 |
| GRM-2.5 | 0.683 | 0.794 | 0.766 |
Served on vLLM β accuracy, latency, throughput, footprint (same harness)
| Model | Params (B) | VRAM (GB) | VisA AUROC | Latency (ms) | Decode tok/s | Throughput tok/s |
|---|---|---|---|---|---|---|
| Rainier-VL-2B-Base | 2.10 | 4.19 | 0.949 | 27 | 276 | 2830 |
| Qwen3-VL-4B | 4.44 | 8.88 | 0.877 | 64 | 138 | 2094 |
| Qwen3-VL-2B | 2.00 | 4.00 | 0.839 | 67 | 288 | 2809 |
| NuExtract3 | 4.54 | 9.32 | 0.734 | 69 | 138 | 1589 |
| Qwen2.5-VL-3B | 3.75 | 7.51 | 0.718 | 79 | 173 | 2211 |
| Qwen3.5-4B | 4.54 | 9.32 | 0.673 | 57 | 139 | 1679 |
| GRM-2.5 | 4.54 | 9.32 | 0.673 | 66 | 139 | 1599 |
| Holo-3.1-4B | 4.54 | 10.35 | 0.660 | 70 | 139 | 1697 |
| Gemma-4-E2B | 5.10 | 10.25 | 0.642 | 31 | 163 | 3260 |
VRAM = bf16 weight residency (lowest serving footprint); Rainier's peak at batch 1 is 4.61 GB. Rainier carries roughly half the VRAM of the 3 B+ field.
Latency is per-query inference time; the one-time SigLIP image encode is a fixed per-image setup cost (run once, reused across every query on that image) and is not a per-query term.
Rainier-VL-2B-Base offers the best balance of space, accuracy, latency, throughput, speed, and deployability β and is outright best on accuracy (0.949, vs. next-best 0.877), footprint (2.10 B / 4.19 GB), and latency (27 ms, below the fastest peer's 31 ms), while remaining competitive on throughput (2830 tok/s) and decode (276 tok/s). It is also the only model in the set that emits defect boxes and pixel masks, and carries the lowest VRAM footprint in the comparison band β one that grows far more slowly with context length than the transformer peers, since its Mamba state-space layers hold a fixed-size state where a transformer's KV-cache grows with every token.
Capabilities vs. every evaluated model
β = supported in the public release; ~ = partial/limited (e.g. JSON via prompting, not schema-enforced); β = not available. Grounding = documented bbox output; Seg. = pixel masks; Stream. O(1) = constant-time-per-frame streaming.
| Model | VQA | Struct. JSON | Grounding (bbox) | Seg. (mask) | Stream. O(1) |
|---|---|---|---|---|---|
| Rainier-VL-2B-Base | β | β | β | β | β |
| Qwen2.5-VL-3B | β | ~ | β | β | β |
| Qwen3-VL-2B | β | ~ | β | β | β |
| Qwen3-VL-4B | β | ~ | β | β | β |
| Gemma-4-E2B | β | ~ | β | β | β |
| GLM-4.6V-Flash | β | ~ | β | β | β |
| Holo-3.1-4B | β | ~ | β | β | β |
| NuExtract3 | β | β | β | β | β |
| Qwen3.5-4B | β | ~ | β | β | β |
| GRM-2.5 | β | ~ | β | β | β |
Rainier is the only model in the evaluated set that emits pixel masks and exposes a constant-time-per-frame streaming path.
VRAM vs. context length (3 B+ field, bf16)
Peak inference VRAM (GB) generating 128β32K tokens. Rainier holds the lowest footprint at every length β roughly half the field β because its hybrid backbone keeps a fixed-size Mamba state for most layers, so VRAM grows far slower than the transformer peers' KV-caches. Peers measured to 16K, Rainier to 32K; "β" = not measured.
| Model | 128 | 512 | 2048 | 8192 | 16384 | 32768 |
|---|---|---|---|---|---|---|
| Rainier-VL-2B-Base | 4.58 | 4.47 | 4.61 | 5.27 | 6.14 | 7.89 |
| Qwen2.5-VL-3B | 7.81 | 7.81 | 7.81 | 7.91 | 8.21 | β |
| Qwen3-VL-4B | 9.19 | 9.19 | 9.37 | 10.29 | 11.52 | β |
| NuExtract3 | 9.34 | 9.34 | 9.34 | 9.46 | 9.75 | β |
| Qwen3.5-4B | 9.34 | 9.34 | 9.34 | 9.46 | 9.75 | β |
| GRM-2.5 | 9.34 | 9.34 | 9.34 | 9.46 | 9.75 | β |
| Holo-3.1-4B | 9.35 | 9.35 | 9.35 | 9.48 | 9.76 | β |
| Gemma-4-E2B | 10.76 | 10.76 | 10.76 | 10.76 | 11.22 | β |
Measured eager bf16, single image; the Rainier curve is conservative (cached single-step decode kernel not engaged on the eager path). Decode is also O(1) per token β Rainier's cached state-update decode holds ~76 tok/s flat from 512β8192 tokens, where a cache-free re-scan degrades 28β20.
Defect detection, localization & segmentation
| Capability | Metric | Value |
|---|---|---|
| Defect detection (image) | AUROC β BTAD / BSData / VisA | 0.983 / 0.982 / 0.949 |
| Defect segmentation (OneFormer-FT) | val mIoU | 0.753 |
| Defect segmentation | pixel-AUROC | 0.731 |
| Defect segmentation (end-to-end) | mIoU | 0.274 |
| Defect box (GIoU head) | Acc@0.25 | 0.295 |
| Defect box (GIoU head) | mIoU | 0.176 |
Qualitative defect inspection (held-out VisA)
Real held-out VisA images (PCB, candle, cashew, chewing-gum). Each cell: Rainier-VL-2B-Base predicted defect box (red) vs ground-truth box (green), the P(yes) defect verdict, and box IoU. The model separates defective from clean and lands boxes on the defect region β tight on clear surface defects, looser on small/diffuse ones.
- Downloads last month
- 23



