Overview
google/gemma-2-2b-jpn-it quantized to 4-bit with BitsAndBytes (0.44.1).
The code used for quantization is shown below.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "google/gemma-2-2b-jpn-it"
repo_id = "indiebot-community/gemma-2-2b-jpn-it-bnb-4bit"

# NF4 4-bit quantization with bfloat16 compute dtype
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Push the tokenizer and the quantized model to the Hub
tokenizer.push_to_hub(repo_id)
model.push_to_hub(repo_id)
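
The quantized model can then be loaded directly from the Hub for inference. The following is a minimal sketch, assuming bitsandbytes and accelerate are installed; the quantization settings are read from the saved config, so the model loads in 4-bit automatically. The chat-template usage follows the standard transformers API, and the prompt is only an illustrative placeholder.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "indiebot-community/gemma-2-2b-jpn-it-bnb-4bit"

# The quantization config is stored with the model, so no BitsAndBytesConfig is needed here.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Gemma-2 instruction-tuned models expect the chat template.
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))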