Commit acbc8f5 (verified) by Ithanil, parent f4fb31e: Create README.md

---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
base_model:
- nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
---

# Llama-3_3-Nemotron-Super-49B-v1_5-FP8-Dynamic

FP8 quantization of https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5

## Creation

Created with llmcompressor using the following code:

```python
import sys

from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = sys.argv[1]
SAVE_DIR = sys.argv[2]

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    torch_dtype="auto",
    local_files_only=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, local_files_only=True, trust_remote_code=True)

# Configure simple PTQ: FP8 dynamic quantization of all Linear layers except lm_head
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# Apply the quantization algorithm
oneshot(model=model, recipe=recipe, trust_remote_code_model=True)

# Save the quantized model and tokenizer
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
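
With the `FP8_DYNAMIC` scheme, weights are quantized ahead of time while activation scales are computed at runtime from each tensor's own range, which is why `oneshot` needs no calibration dataset here. For intuition only, per-tensor dynamic quantization can be sketched as below; this is a simplified NumPy illustration, not llmcompressor's actual implementation, and it assumes the FP8 E4M3 format whose largest finite value is 448:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_dynamic(x: np.ndarray):
    """Per-tensor dynamic quantization: the scale comes from the tensor itself."""
    scale = np.max(np.abs(x)) / FP8_E4M3_MAX
    # Round to an integer grid as a stand-in for FP8 rounding
    # (real FP8 has non-uniformly spaced values).
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

x = np.array([0.1, -2.5, 3.7, 0.0])
q, s = quantize_dynamic(x)
x_hat = dequantize(q, s)  # close to x, within about half a quantization step
```

Because the scale is derived from the live tensor at each forward pass, outlier activations cannot silently clip against a stale calibration-time range, at the cost of computing a max per tensor at inference time.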