---
base_model:
- yamatazen/ForgottenMaid-12B
tags:
- bitsandbytes
- bnb
- chatml
---

# Code for quantization (Generated by Grok with manual editing)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
import sys

# Define model ID
model_id = sys.argv[1]

# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization (use load_in_8bit=True for 8-bit)
    bnb_4bit_quant_type="nf4",             # NormalFloat 4 (NF4) for better precision
    bnb_4bit_compute_dtype=torch.float16,  # compute in float16 for efficiency
    bnb_4bit_use_double_quant=True,        # double quantization for further memory savings
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model with on-the-fly quantization
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",          # automatically map layers to GPU/CPU
    torch_dtype=torch.float16,  # dtype for the non-quantized modules
)

# Save quantized model and tokenizer
save_path = sys.argv[2]
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
```
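
For reference, a minimal sketch of loading the saved checkpoint for inference. The local path and prompt below are placeholders, not part of the script above; the 4-bit settings are restored from the config serialized alongside the weights, and the ChatML formatting assumes the tokenizer ships a chat template.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

save_path = "ForgottenMaid-12B-bnb-4bit"  # hypothetical output dir from the script above

tokenizer = AutoTokenizer.from_pretrained(save_path)
# The BitsAndBytesConfig stored in the saved config is picked up automatically,
# so the model loads directly in 4-bit
model = AutoModelForCausalLM.from_pretrained(save_path, device_map="auto")

# The model is tagged chatml, so format the prompt with the chat template
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```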