Access to the Rainier-VL model family is granted on request. These weights are released by Denali AI under the Apache 2 license for product and research use in industrial / product-line visual inspection. By requesting access you agree to use the model in compliance with the license and applicable law, and not for unlawful surveillance or harm. Requests are reviewed manually.

Figure: Defect detection vs. peers

Figure: Footprint vs. peers

Figure: Accuracy vs. VRAM Pareto

Accuracy vs. memory: Rainier-VL-2B-Base sits on the Pareto frontier, with the highest VisA AUROC (0.949) at 4.19 GB of VRAM. That is below every 3 B+ peer; only Qwen3-VL-2B (4.00 GB, 0.839 AUROC) is smaller.

Domain defect detection: AUROC (higher is better)

| Model | VisA | BSData | BTAD |
| --- | --- | --- | --- |
| Rainier-VL-2B-Base | 0.949 | 0.982 | 0.983 |
| Qwen2.5-VL-3B | 0.702 | 0.619 | 0.785 |
| Qwen3-VL-4B | 0.840 | 0.915 | 0.571 |
| Qwen3.5-4B | 0.683 | 0.794 | 0.766 |
| Gemma-4-E2B | 0.625 | 0.933 | 0.321 |
| Holo-3.1-4B | 0.728 | 0.855 | 0.637 |
| NuExtract3 | 0.709 | 0.911 | 0.270 |
| GLM-4.6V-Flash | 0.852 | 0.943 | 0.898 |
| GRM-2.5 | 0.683 | 0.794 | 0.766 |
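
For readers unfamiliar with the metric, the following is a minimal sketch of image-level AUROC, not the evaluation harness used for this card. It assumes each image has a binary label (1 = defective, 0 = clean) and a scalar defect score; the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen defective
    image scores higher than a randomly chosen clean one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from VisA, BSData, or BTAD.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.5, 0.1]
print(f"AUROC = {auroc(labels, scores):.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale the table uses.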

Served on vLLM: accuracy, latency, throughput, footprint (same harness)

| Model | Params (B) | VRAM (GB) | VisA AUROC | Latency (ms) | Decode (tok/s) | Throughput (tok/s) |
| --- | --- | --- | --- | --- | --- | --- |
| Rainier-VL-2B-Base | 2.10 | 4.19 | 0.949 | 27 | 276 | 2830 |
| Qwen3-VL-4B | 4.44 | 8.88 | 0.877 | 64 | 138 | 2094 |
| Qwen3-VL-2B | 2.00 | 4.00 | 0.839 | 67 | 288 | 2809 |
| NuExtract3 | 4.54 | 9.32 | 0.734 | 69 | 138 | 1589 |
| Qwen2.5-VL-3B | 3.75 | 7.51 | 0.718 | 79 | 173 | 2211 |
| Qwen3.5-4B | 4.54 | 9.32 | 0.673 | 57 | 139 | 1679 |
| GRM-2.5 | 4.54 | 9.32 | 0.673 | 66 | 139 | 1599 |
| Holo-3.1-4B | 4.54 | 10.35 | 0.660 | 70 | 139 | 1697 |
| Gemma-4-E2B | 5.10 | 10.25 | 0.642 | 31 | 163 | 3260 |

VRAM = bf16 weight residency (lowest serving footprint); Rainier's peak at batch 1 is 4.61 GB. Rainier carries roughly half the VRAM of the 3 B+ field.
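
As a rough sanity check on the weight-residency column, bf16 stores each parameter in 2 bytes, so the footprint is approximately twice the parameter count in billions. The sketch below assumes GB means 10^9 bytes (the card does not say) and ignores buffers and non-bf16 tensors, so small deviations from the reported figures are expected.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def bf16_weight_gb(params_billion):
    """Estimated weight residency in GB (10^9 bytes, an assumption)."""
    return params_billion * 1e9 * BYTES_PER_BF16 / 1e9


# (model, params in billions, reported VRAM in GB) copied from the table above.
rows = [
    ("Rainier-VL-2B-Base", 2.10, 4.19),
    ("Qwen3-VL-2B", 2.00, 4.00),
    ("Qwen2.5-VL-3B", 3.75, 7.51),
    ("Qwen3-VL-4B", 4.44, 8.88),
]
for name, params_b, reported_gb in rows:
    est = bf16_weight_gb(params_b)
    print(f"{name}: estimated {est:.2f} GB, reported {reported_gb:.2f} GB")
```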

Latency is per-query inference time; the one-time SigLIP image encode is a fixed per-image setup cost (run once, reused across every query on that image) and is not a per-query term.

Rainier-VL-2B-Base offers the best overall balance of footprint, accuracy, latency, throughput, and deployability in the set. It is outright best on accuracy (0.949 VisA AUROC vs. the next-best 0.877) and latency (27 ms, below the fastest peer's 31 ms). Its footprint (2.10 B / 4.19 GB) is smaller than every 3 B+ peer, with only Qwen3-VL-2B (2.00 B / 4.00 GB) marginally lighter. It remains competitive on throughput (2830 tok/s, behind Gemma-4-E2B's 3260) and decode speed (276 tok/s, behind Qwen3-VL-2B's 288). It is also the only model in the set that emits pixel masks alongside defect boxes. Its Mamba state-space layers hold a fixed-size state, whereas a transformer's KV-cache grows with every token.
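
The accuracy-vs-VRAM comparison can be checked directly from the served-on-vLLM table. The sketch below is an illustration rather than the script behind the Pareto figure: it treats lower VRAM and higher VisA AUROC as the two objectives and lists the models no other model dominates. The `dominates` and `pareto_front` helpers are hypothetical names.

```python
# (VRAM in GB, VisA AUROC) copied from the served-on-vLLM table.
models = {
    "Rainier-VL-2B-Base": (4.19, 0.949),
    "Qwen3-VL-4B": (8.88, 0.877),
    "Qwen3-VL-2B": (4.00, 0.839),
    "NuExtract3": (9.32, 0.734),
    "Qwen2.5-VL-3B": (7.51, 0.718),
    "Qwen3.5-4B": (9.32, 0.673),
    "GRM-2.5": (9.32, 0.673),
    "Holo-3.1-4B": (10.35, 0.660),
    "Gemma-4-E2B": (10.25, 0.642),
}


def dominates(a, b):
    """True if a is no worse than b on both axes and strictly better on one."""
    (vram_a, auc_a), (vram_b, auc_b) = a, b
    no_worse = vram_a <= vram_b and auc_a >= auc_b
    strictly_better = vram_a < vram_b or auc_a > auc_b
    return no_worse and strictly_better


def pareto_front(points):
    """Names of all points that no other point dominates."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


print(pareto_front(models))
```

On these figures Rainier dominates every 3 B+ peer, while Qwen3-VL-2B stays on the frontier only because its weight footprint is 0.19 GB smaller.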

Capabilities vs. every evaluated model

✓ = supported in the public release; ~ = partial/limited (e.g. JSON via prompting, not schema-enforced); ✗ = not available. Grounding = documented bbox output; Seg. = pixel masks; Stream. O(1) = constant-time-per-frame streaming.

| Model | VQA | Struct. JSON | Grounding (bbox) | Seg. (mask) | Stream. O(1) |
| --- | --- | --- | --- | --- | --- |
| Rainier-VL-2B-Base | ✓ | ✓ | ✓ | ✓ | ✓ |
| Qwen2.5-VL-3B | ✓ | ~ | ✓ | ✗ | ✗ |
| Qwen3-VL-2B | ✓ | ~ | ✓ | ✗ | ✗ |
| Qwen3-VL-4B | ✓ | ~ | ✓ | ✗ | ✗ |
| Gemma-4-E2B | ✓ | ~ | ✗ | ✗ | ✗ |
| GLM-4.6V-Flash | ✓ | ~ | ✓ | ✗ | ✗ |
| Holo-3.1-4B | ✓ | ~ | ✓ | ✗ | ✗ |
| NuExtract3 | ✓ | ✓ | ✗ | ✗ | ✗ |
| Qwen3.5-4B | ✓ | ~ | ✗ | ✗ | ✗ |
| GRM-2.5 | ✓ | ~ | ✗ | ✗ | ✗ |

Rainier is the only model in the evaluated set that emits pixel masks and exposes a constant-time-per-frame streaming path.

VRAM vs. context length (3 B+ field, bf16)

Peak inference VRAM (GB) generating 128 to 32K tokens. Rainier holds the lowest footprint at every measured length: roughly half the field at short contexts (4.58 GB vs. 7.81 to 10.76 GB at 128 tokens) and still the lowest at 16K (6.14 GB vs. 8.21 to 11.52 GB). Its hybrid backbone keeps a fixed-size Mamba state for most layers, whereas a transformer's KV-cache grows with every token. Peers were measured to 16K and Rainier to 32K; "n/a" = not measured.

| Model | 128 | 512 | 2048 | 8192 | 16384 | 32768 |
| --- | --- | --- | --- | --- | --- | --- |
| Rainier-VL-2B-Base | 4.58 | 4.47 | 4.61 | 5.27 | 6.14 | 7.89 |
| Qwen2.5-VL-3B | 7.81 | 7.81 | 7.81 | 7.91 | 8.21 | n/a |
| Qwen3-VL-4B | 9.19 | 9.19 | 9.37 | 10.29 | 11.52 | n/a |
| NuExtract3 | 9.34 | 9.34 | 9.34 | 9.46 | 9.75 | n/a |
| Qwen3.5-4B | 9.34 | 9.34 | 9.34 | 9.46 | 9.75 | n/a |
| GRM-2.5 | 9.34 | 9.34 | 9.34 | 9.46 | 9.75 | n/a |
| Holo-3.1-4B | 9.35 | 9.35 | 9.35 | 9.48 | 9.76 | n/a |
| Gemma-4-E2B | 10.76 | 10.76 | 10.76 | 10.76 | 11.22 | n/a |

Measured in eager bf16 on a single image; the Rainier curve is conservative because the cached single-step decode kernel is not engaged on the eager path. Decode is also O(1) per token: Rainier's cached state-update decode holds ~76 tok/s flat from 512 to 8192 tokens, where a cache-free re-scan degrades from 28 to 20 tok/s.
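
The contrast between a growing KV-cache and a fixed-size recurrent state can be illustrated with back-of-the-envelope arithmetic. The sketch below uses the standard KV-cache size formula for attention layers; every configuration value in it (layer count, KV heads, head dimension, state size) is hypothetical and does not describe Rainier or any peer.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Attention KV-cache size: one K and one V tensor per layer,
    each holding seq_len * n_kv_heads * head_dim elements."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def fixed_state_bytes(n_layers, state_elems_per_layer, bytes_per_elem=2):
    """Recurrent (state-space) layers keep a state whose size does not
    depend on how many tokens have been processed."""
    return n_layers * state_elems_per_layer * bytes_per_elem


# Hypothetical configuration, chosen only to show the scaling behaviour.
cfg = dict(n_layers=28, n_kv_heads=8, head_dim=128)
for seq_len in (512, 8192, 32768):
    kv_mb = kv_cache_bytes(seq_len, **cfg) / 1e6
    state_mb = fixed_state_bytes(n_layers=28, state_elems_per_layer=1_000_000) / 1e6
    print(f"{seq_len:>6} tokens: KV-cache {kv_mb:8.1f} MB, fixed state {state_mb:6.1f} MB")
```

The KV-cache term is linear in sequence length while the state term is constant, which is the property behind the O(1)-per-token decode described above. In a hybrid backbone the remaining attention layers still contribute a KV-cache, so total memory is not flat.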

Defect detection, localization & segmentation

| Capability | Metric | Value |
| --- | --- | --- |
| Defect detection (image) | AUROC (BTAD / BSData / VisA) | 0.983 / 0.982 / 0.949 |
| Defect segmentation (OneFormer-FT) | val mIoU | 0.753 |
| Defect segmentation | pixel-AUROC | 0.731 |
| Defect segmentation (end-to-end) | mIoU | 0.274 |
| Defect box (GIoU head) | Acc@0.25 | 0.295 |
| Defect box (GIoU head) | mIoU | 0.176 |
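
The box metrics in the table can be read with the following minimal sketch of IoU, generalized IoU (GIoU), and accuracy at an IoU threshold. It assumes boxes are `(x1, y1, x2, y2)` tuples and that Acc@0.25 counts a prediction as correct when its IoU is at least 0.25; the card states neither convention, so treat both as assumptions.

```python
def _area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def _inter_union(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter, _area(a) + _area(b) - inter


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter, union = _inter_union(a, b)
    return inter / union if union > 0 else 0.0


def box_giou(a, b):
    """GIoU = IoU minus the fraction of the enclosing box not covered
    by the union; ranges over (-1, 1]."""
    inter, union = _inter_union(a, b)
    hull = _area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0 or hull <= 0:
        return 0.0
    return inter / union - (hull - union) / hull


def acc_at(ious, threshold=0.25):
    """Share of predictions whose IoU reaches the threshold (assumed >=)."""
    return sum(1 for v in ious if v >= threshold) / len(ious)


pred, truth = (0, 0, 2, 2), (1, 1, 3, 3)  # hypothetical boxes
print(box_iou(pred, truth), box_giou(pred, truth))
```

Unlike plain IoU, GIoU stays informative when boxes do not overlap, which is why it is commonly used as a box-regression objective.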

Qualitative defect inspection (held-out VisA)

Figure: Qualitative defect examples

Real held-out VisA images (PCB, candle, cashew, chewing-gum). Each cell shows the Rainier-VL-2B-Base predicted defect box (red) against the ground-truth box (green), the P(yes) defect verdict, and the box IoU. The model separates defective from clean parts and lands boxes on the defect region: tight on clear surface defects, looser on small or diffuse ones.
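
The card does not specify how the P(yes) verdict is derived. One common approach for yes/no prompting, shown here purely as an assumption, is a two-way softmax over the logits of the "yes" and "no" answer tokens; the function name and the logit values are hypothetical.

```python
import math


def p_yes(logit_yes, logit_no):
    """Two-way softmax over the "yes" and "no" logits.
    Subtracting the max keeps the exponentials numerically stable."""
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


# Hypothetical logits for one image; higher P(yes) means "more likely defective".
print(f"P(yes) = {p_yes(2.0, 0.0):.3f}")
```

A score of this kind can serve directly as the per-image defect score when computing AUROC, since only the ranking of scores matters.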
