metadata
library_name: transformers
license: apache-2.0
language:
- en
base_model:
- HuggingFaceTB/SmolVLM-Instruct
This is a 4-bit NF4 quantized version of SmolVLM-Instruct. The code used to generate the quantized model is shown below.
The 8-bit config seems to be more accurate than this one.
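import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# 4-bit NF4 weights with double quantization; computation runs in bfloat16
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model and quantize it at load time (needs the bitsandbytes package)
model_nf4 = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-Instruct", quantization_config=nf4_config)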
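
To give a feel for what `bnb_4bit_use_double_quant=True` buys, here is a rough back-of-the-envelope sketch of the effective bits stored per weight. It is only an estimate under stated assumptions, not a measurement of this model: it assumes a block size of 64 with one fp32 scale per block, and double quantization that stores 8-bit scales plus one fp32 constant per 256 blocks. `N_PARAMS` is a hypothetical round number, not the real parameter count, and layers that stay unquantized are ignored.

```python
# Rough weight-memory estimate for 4-bit quantization settings.
# Assumptions (not taken from the model card): block size 64, fp32 scales,
# and double quantization with 8-bit scales grouped in blocks of 256.

def bits_per_param(weight_bits, block_size=64, double_quant=False, second_block=256):
    """Effective bits per weight, including the quantization constants."""
    if double_quant:
        # 8-bit scale per block, plus one fp32 constant per `second_block` blocks
        overhead = 8 / block_size + 32 / (block_size * second_block)
    else:
        # one fp32 scale per block
        overhead = 32 / block_size
    return weight_bits + overhead


def weight_gib(n_params, bits):
    """Storage in GiB for n_params weights at `bits` bits each."""
    return n_params * bits / 8 / 2**30


N_PARAMS = 2_000_000_000  # hypothetical parameter count, for illustration only

nf4_plain = bits_per_param(4)
nf4_double = bits_per_param(4, double_quant=True)

print(f"bf16:             {weight_gib(N_PARAMS, 16):.2f} GiB")
print(f"nf4:              {weight_gib(N_PARAMS, nf4_plain):.2f} GiB")
print(f"nf4 + double q.:  {weight_gib(N_PARAMS, nf4_double):.2f} GiB")
```

Under these assumptions, double quantization trims the per-weight overhead from 0.5 bits to roughly 0.127 bits, which is a small but free saving on top of the 4-bit weights.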