Update README.md
README.md CHANGED
@@ -5,4 +5,37 @@ tags:
- bitsandbytes
- bnb
- chatml
---

# Code for quantization (Generated by Grok with manual editing)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
import sys

# Define model ID
model_id = sys.argv[1]

# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Use 4-bit quantization (or load_in_8bit=True for 8-bit)
    bnb_4bit_quant_type="nf4",  # Normal Float 4-bit (nf4) for better precision
    bnb_4bit_compute_dtype=torch.float16,  # Compute in float16 for efficiency
    bnb_4bit_use_double_quant=True  # Double quantization for further memory savings
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # Automatically map layers to GPU/CPU
    torch_dtype=torch.float16
)

# Save model and tokenizer
save_path = sys.argv[2]
model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)
```
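The script reads its two positional arguments from `sys.argv`: the source model (a Hub ID or local path) and the output directory, e.g. `python quantize.py <model_id> <output_dir>` (the filename `quantize.py` is just a placeholder for wherever you saved the script).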
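For reference, a minimal sketch of loading the saved weights back for inference. It assumes a recent transformers/bitsandbytes with 4-bit serialization support, an output directory of `./quantized` (a placeholder for the path passed above), and that the model ships a ChatML chat template, as the `chatml` tag suggests:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

save_path = "./quantized"  # placeholder: the output directory passed to the script above

# The quantization settings are stored in the saved config, so no
# BitsAndBytesConfig is needed here; they are picked up automatically
tokenizer = AutoTokenizer.from_pretrained(save_path)
model = AutoModelForCausalLM.from_pretrained(save_path, device_map="auto")

# Format a prompt with the model's (ChatML) chat template
messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and print only the newly generated tokens
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```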