cicdatopea committed
Update README.md
README.md CHANGED
@@ -85,8 +85,30 @@ Please follow the [Build llama.cpp locally](https://github.com/ggerganov/llama.c
**5×80 GB GPUs are needed (this could be optimized); 1.4 TB of CPU memory is also needed.**
**1. Add metadata to the BF16 model** https://huggingface.co/opensourcerelease/DeepSeek-V3-bf16 (the shards need `format: pt` metadata before `transformers` can load them):
```python
import safetensors
from safetensors.torch import save_file

# Re-save each of the 163 BF16 shards in place, adding the
# {'format': 'pt'} metadata that transformers expects.
for i in range(1, 164):
    idx_str = str(i).zfill(5)  # zero-pad the shard index, e.g. 1 -> "00001"
    safetensors_path = f"model-{idx_str}-of-000163.safetensors"
    print(safetensors_path)
    tensors = dict()
    with safetensors.safe_open(safetensors_path, framework="pt") as f:
        for key in f.keys():
            tensors[key] = f.get_tensor(key)
    save_file(tensors, safetensors_path, metadata={'format': 'pt'})
```
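As a quick sanity check (not part of the original steps), you can reopen a shard and confirm the metadata was written; `safe_open` exposes the parsed header metadata directly:

```python
from safetensors import safe_open

# Spot-check the first shard; expected output: {'format': 'pt'}
with safe_open("model-00001-of-000163.safetensors", framework="pt") as f:
    print(f.metadata())
```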
**2. Replace modeling_deepseek.py with the following file.** The changes mainly align tensor devices and remove `torch.no_grad`, because AutoRound needs gradients to flow during tuning.

https://github.com/intel/auto-round/blob/deepseekv3/modeling_deepseek.py
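For intuition on why `torch.no_grad` has to go, here is a minimal, self-contained demonstration (not DeepSeek code): computations executed under `no_grad` are not recorded in the autograd graph, so they cannot be backpropagated through, which would break AutoRound's gradient-based tuning.

```python
import torch

w = torch.randn(4, 4, requires_grad=True)
x = torch.randn(1, 4)

# Under no_grad, the autograd graph is not recorded:
with torch.no_grad():
    y = x @ w
print(y.requires_grad)  # False -> calling backward() through y would fail

# The same computation without no_grad keeps gradients flowing,
# which is what the tuning step relies on:
y = x @ w
y.sum().backward()
print(w.grad.shape)  # torch.Size([4, 4])
```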
pip3 install git+https://github.com/intel/auto-round.git
**3. Tuning**

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
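# The diff hunk ends here, so the rest of the tuning script is not shown in
# this commit. A rough sketch of how it might continue, based on AutoRound's
# public API (model path and hyperparameters below are assumptions):
#
# from auto_round import AutoRound
#
# model = AutoModelForCausalLM.from_pretrained(
#     "opensourcerelease/DeepSeek-V3-bf16",
#     torch_dtype=torch.bfloat16,
# )
# tokenizer = AutoTokenizer.from_pretrained("opensourcerelease/DeepSeek-V3-bf16")
#
# autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
# autoround.quantize()
# autoround.save_quantized("DeepSeek-V3-int4")
```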