---
library_name: transformers
license: apache-2.0
language:
  - en
base_model:
  - HuggingFaceTB/SmolVLM-Instruct
---

This is a 4-bit NF4-quantized version of HuggingFaceTB/SmolVLM-Instruct. The code used to generate the quantized model is shown below.

In informal comparisons, an 8-bit configuration appears to be more accurate than this 4-bit version; a sketch of such a config follows the generation code below.

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization and bfloat16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_nf4 = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    quantization_config=nf4_config,
)
```
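
For reference, a minimal sketch of the 8-bit alternative mentioned above. The exact 8-bit settings used in the comparison are not given here; `load_in_8bit=True` is the standard bitsandbytes int8 setup, so the config below is an assumption.

```python
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

# Standard bitsandbytes 8-bit (LLM.int8) config; assumed settings,
# since the exact 8-bit config compared against is not specified.
int8_config = BitsAndBytesConfig(load_in_8bit=True)

model_int8 = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    quantization_config=int8_config,
)
```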