uyiosa committed · Commit 348dbf5 · verified · 1 parent: 9520704

Update README.md

Files changed (1): README.md (+41 −1)
README.md CHANGED
@@ -5,4 +5,44 @@ base_model:
  library_name: transformers
  ---
 
- The [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b) model quantized to fp8.
+ The [perplexity-ai/r1-1776-distill-llama-70b](https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b) model (DeepSeek-R1-Distill-Llama-70B with Perplexity's R1-1776 post-training) quantized to FP8.
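+
+ FP8 here is the E4M3 8-bit floating-point format: weights are stored in FP8 with precomputed scales, while activations are quantized dynamically per token at runtime (the FP8_DYNAMIC scheme used below). A toy sketch of the scale-and-cast step, using PyTorch's `float8_e4m3fn` dtype (illustrative only, not the llm-compressor internals):
+
+ ```python
+ import torch
+
+ x = torch.randn(4, 8)                        # a toy weight tensor
+ scale = x.abs().max() / 448.0                # 448 = max finite value of float8_e4m3fn
+ x_fp8 = (x / scale).to(torch.float8_e4m3fn)  # scale into range, cast to 8-bit float
+ x_back = x_fp8.to(torch.float32) * scale     # dequantize to inspect the error
+ print((x - x_back).abs().max())              # small round-off introduced by FP8
+ ```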
+
+ # Quantization using llm-compressor
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from llmcompressor.transformers import oneshot
+ from llmcompressor.modifiers.quantization import QuantizationModifier
+
+ # Model to quantize
+ MODEL_ID = "perplexity-ai/r1-1776-distill-llama-70b"
+
+ # Load the model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained(
+     MODEL_ID,
+     device_map="auto",
+     torch_dtype="auto",
+     trust_remote_code=True,
+     low_cpu_mem_usage=True,    # reduce peak CPU RAM during loading
+     offload_folder="offload"   # spill weights to disk if memory runs out
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     MODEL_ID,
+     trust_remote_code=True
+ )
+
+ # Recipe: FP8 weights and dynamic per-token FP8 activations for all
+ # Linear layers, keeping lm_head in full precision
+ recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
+
+ # Apply the quantization (FP8_DYNAMIC needs no calibration data)
+ oneshot(model=model, recipe=recipe)
+
+ # Directory to save the quantized model
+ SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
+
+ # Save the quantized model and tokenizer; save_compressed=True writes
+ # the compressed-tensors format used for FP8 checkpoints
+ model.save_pretrained(SAVE_DIR, save_compressed=True)
+ tokenizer.save_pretrained(SAVE_DIR)
+
+ print(f"Quantized model saved to {SAVE_DIR}")
+ ```
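+
+ # Inference with vLLM
+ A minimal sketch of running the resulting checkpoint (assumes vLLM is installed and the model was saved to the `SAVE_DIR` above; the prompt and sampling settings are illustrative):
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # compressed-tensors FP8 checkpoints produced by llm-compressor
+ # should load directly in vLLM
+ llm = LLM(model="r1-1776-distill-llama-70b-FP8-Dynamic")
+ params = SamplingParams(temperature=0.6, max_tokens=512)
+
+ outputs = llm.generate(["Briefly explain FP8 quantization."], params)
+ print(outputs[0].outputs[0].text)
+ ```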