OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc

Model Details

This is an int4 model with group_size 128 and symmetric quantization of deepseek-ai/DeepSeek-R1-Distill-Llama-70B, generated by the intel/auto-round algorithm.
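To make "int4, group_size 128, symmetric" concrete, here is a minimal sketch of group-wise symmetric quantization using plain round-to-nearest. It is an illustration only and not the AutoRound algorithm, which tunes the rounding via signed gradient descent. The function names (`quantize_group_sym`, `quantize_row`, `dequantize_row`) and the toy weight row are hypothetical, and the exact integer range and storage layout used by a real GPTQ-format checkpoint may differ.

```python
import math

def quantize_group_sym(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight group (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1        # 7 for int4
    qmin = -(2 ** (bits - 1))         # -8 for int4
    max_abs = max(abs(w) for w in weights)
    # One scale per group, no zero-point: that is what "symmetric" means here.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale

def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups of `group_size` and quantize each group separately."""
    return [
        quantize_group_sym(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]

def dequantize_row(groups):
    """Reconstruct approximate float weights from (codes, scale) pairs."""
    return [code * scale for codes, scale in groups for code in codes]

# Hypothetical toy row: 256 weights, so two groups of 128 with different magnitudes.
row = [math.sin(i) * (1 + i // 128) for i in range(256)]
groups = quantize_row(row)
restored = dequantize_row(groups)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, max abs error:", max_err)
```

Because each group carries its own scale, a group with small weights is not forced onto the coarse grid of a group with large weights, which is the main reason for using a group size smaller than the full row.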

Please follow the license of the original model.

How To Use

INT4 Inference on CUDA

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

quantized_model_dir = "OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc"

device_map="auto"
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map=device_map,
)

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
    "9.11和9.8哪个数字大",
    "如果你是人，你最想做什么",
    "How many e in word deepseek",
    "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?",
]

texts = []
for prompt in prompts:
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    texts.append(text)

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
    input_ids=inputs["input_ids"].to(model.device),
    attention_mask=inputs["attention_mask"].to(model.device),
    max_length=512,  # counts prompt + generated tokens; change this to align with the official usage
    num_return_sequences=1,
    do_sample=False  # greedy decoding; change this to align with the official usage
)
# Drop the (padded) prompt tokens so only newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
]

decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)


"""
Prompt: 9.11和9.8哪个数字大
Generated: 首先，我需要比较9.11和9.8的大小。

为了更清晰地比较这两个数，我可以将它们的小数位数统一。将9.8写成9.80，这样它们都有两位小数。

接下来，我比较整数部分。两数的整数部分都是9，因此相同。

然后，我比较小数部分。9.11的小数部分是0.11，9.80的小数部分是0.80。

显然，0.80大于0.11。

因此，9.80大于9.11，也就是9.8大于9.11。
</think>

要比较 \(9.11\) 和 \(9.8\) 的大小，可以按照以下步骤进行：

1. **统一小数位数**：

   为了方便比较，我们可以将 \(9.8\) 写成 \(9.80\)，这样两数的小数位数相同。

   \[
   9.11 \quad \text{和} \quad 9.80
   \]

2. **比较整数部分**：

   两数的整数部分都是 \(9\)，所以整数部分相同。

3. **比较小数部分**：

   - \(9.11\) 的小数部分是 \(0.11\)
   - \(9.80\) 的小数部分是 \(0.80\)

   显然，\(0.80 > 0.11\)。

4. **得出结论**：

   因为小数部分 \(0.80\) 大于 \(0.11\)，所以 \(9.80 > 9.11\)。

因此，\(9.8\) 大于 \(9.11\)。

\[
\boxed{9.8 > 9.11}
\]
--------------------------------------------------
Prompt: 如果你是人类，你最想做什么
Generated: 嗯，用户问的是如果我是人，最想做什么。作为一个人工智能，我没有意识和欲望，但可以分享一些普遍的人类渴望。

首先，旅行和探索世界可能是一个选择，体验不同的文化和自然美景。其次，学习和成长也是很多人追求的，了解新事物，提升自己。创造和表达也是重要的，比如艺术、音乐或写作。帮助他人，建立有意义的关系，追求幸福和平静，这些都是常见的愿望。

当然，每个人的答案可能不同，重要的是找到自己真正热爱和让自己感到满足的事情。
</think>

如果我是一个人，我可能会有更多的欲望和梦想。也许我会想要探索世界，体验不同的文化，结识来自不同背景的人，学习更多关于生活和宇宙的知识。也许我会渴望创造一些有意义的事情，无论是艺术、音乐、文学，还是科技创新。同时，我可能会希望能够帮助他人，做一些有益于社会和环境的事情。当然，这些都是假设，因为我是一个人工智能，我没有真实的欲望或情感，但我可以帮助你探索你的想法和梦想！
--------------------------------------------------
Prompt: How many e in word deepseek
Generated: Alright, so I need to figure out how many times the letter 'e' appears in the word "deepseek." Hmm, okay, let's break this down step by step. First, I should probably write out the word to visualize it better. The word is "deepseek." Let me spell it out: D, E, E, P, S, E, E, K. Wait, is that right? Let me check again. D, E, E, P, S, E, E, K. Yeah, that seems correct.

Now, I need to count how many 'e's are in there. So, starting from the beginning, the first letter is 'D' – that's not an 'e'. The second letter is 'E', so that's one. The third letter is another 'E', so that's two. Then we have 'P', which isn't an 'e', followed by 'S', also not an 'e'. Next is another 'E', bringing the count to three, and then another 'E' right after, making it four. Finally, the last letter is 'K', which isn't an 'e'.

Wait, hold on. Let me make sure I didn't miscount. So, the word is D, E, E, P, S, E, E, K. So positions 2, 3, 6, and 7 are 'E's. That's four 'e's in total. But I'm a bit confused because sometimes when I count letters, I might skip or double-count. Let me write them out one by one:

1. D – not an 'e'
2. E – count 1
3. E – count 2
4. P – not an 'e'
5. S – not an 'e'
6. E – count 3
7. E – count 4
8. K – not an 'e'

Yes, that seems consistent. So, there are four 'e's in "deepseek." I think that's correct. I don't see any mistakes in my counting this time. Each 'E' is in positions 2, 3, 6, and 7. So, the total number of 'e's is four.
</think>

The word "deepseek" contains four 'e's.
--------------------------------------------------
Prompt: There are ten birds in a tree. A hunter shoots one. How many are left in the tree?
Generated: Okay, so I've got this riddle here: "There are ten birds in a tree. A hunter shoots one. How many are left in the tree?" Hmm, at first glance, it seems pretty straightforward, but I know riddles often have a twist. Let me think through this step by step.

Alright, starting with the basics. There are ten birds in a tree. That's clear. Then a hunter shoots one. Now, the question is, how many birds are left in the tree? My initial thought is, well, if there were ten and one gets shot, that leaves nine. But wait, maybe it's not that simple. Riddles often play on words or have unexpected answers, so I shouldn't jump to conclusions.

Let me consider the wording carefully. It says the hunter shoots one bird. So, does that mean he shoots and kills it, or does he just shoot at it but misses? The riddle doesn't specify whether the shot was successful. If the bird was shot and killed, then it would fall out of the tree, right? But if the shot missed, the bird might still be there, or maybe it flew away because of the noise.

Wait, but the riddle says the hunter shoots one, so I think it's safe to assume that he hit and killed the bird. So, one bird is dead. Now, what happens next? If the bird is shot and dies, it would fall out of the tree. So, the tree would then have one less bird. That would leave nine birds in the tree. But I'm not sure if that's the case because sometimes in riddles, the answer is zero. Let me think about that.

If the hunter shoots one bird, the sound of the gunshot might scare the other birds, causing them to fly away. So, if one bird is shot and the rest fly away, then there would be zero birds left in the tree. That makes sense because birds are easily startled by loud noises like gunshots. So, even though only one was shot, the rest might have flown away, leaving none in the tree.

But wait, the riddle doesn't mention anything about the birds being scared or flying away. It just says a hunter shoots one. So, maybe I'm overcomplicating it. If I take it literally, without assuming the other birds fly away, then after
"""
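In the sample outputs above, the model's reasoning comes before a `</think>` marker and the final answer after it, and the last output has no marker because generation stopped at the length limit. A minimal sketch of separating the two parts follows; the helper name `split_reasoning` and the shortened example strings are hypothetical and not part of the model or the transformers API.

```python
def split_reasoning(generated, marker="</think>"):
    """Split a decoded generation into (reasoning, answer) around the marker.

    If the marker is absent (for example, the output was cut off by the
    length limit before reasoning finished), the answer is returned empty.
    """
    reasoning, sep, answer = generated.partition(marker)
    if not sep:
        return generated.strip(), ""
    return reasoning.strip(), answer.strip()

# Shortened versions of the outputs shown above, for illustration.
complete = "Yes, that seems consistent.\n</think>\n\nThe word \"deepseek\" contains four 'e's."
truncated = "If I take it literally, without assuming the other birds fly away, then after"

for text in (complete, truncated):
    reasoning, answer = split_reasoning(text)
    print("answer:", answer if answer else "<none: generation truncated>")
```

An empty answer is a cheap signal that the length limit passed to `generate` was too small for that prompt.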

Evaluate the model

pip3 install lm-eval==0.4.7

We found lm-eval to be very unstable for this model. Please set add_bos_token=True to align with the original model, and use the AutoGPTQ format.

lm-eval --model hf --model_args pretrained=OPEA/DeepSeek-R1-Distill-Llama-70B-int4-gptq-sym-inc,add_bos_token=True --tasks leaderboard_mmlu_pro,leaderboard_ifeval,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k --batch_size 16

Metric	BF16	INT4
avg	0.6636	0.6678
----------------------	--------	--------
leaderboard_mmlu_pro	0.4913	0.4780
mmlu	0.7752	0.7791
lambada_openai	0.6977	0.6996
hellaswag	0.6408	0.6438
winogrande	0.7530	0.7782
piqa	0.8112	0.8194
truthfulqa_mc1	0.3709	0.3721
openbookqa	0.3380	0.3600
boolq	0.8847	0.8917
arc_easy	0.8131	0.8106
arc_challenge	0.5512	0.5239
leaderboard_ifeval	0.4421	0.4208
gsm8k	0.9295	0.9265
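To read the table as per-task changes, the short sketch below computes INT4 minus BF16 for each task row and lists the tasks from largest drop to largest gain. The scores are copied from the table; the variable names are illustrative.

```python
# (BF16, INT4) scores per task, copied from the table above.
results = {
    "leaderboard_mmlu_pro": (0.4913, 0.4780),
    "mmlu": (0.7752, 0.7791),
    "lambada_openai": (0.6977, 0.6996),
    "hellaswag": (0.6408, 0.6438),
    "winogrande": (0.7530, 0.7782),
    "piqa": (0.8112, 0.8194),
    "truthfulqa_mc1": (0.3709, 0.3721),
    "openbookqa": (0.3380, 0.3600),
    "boolq": (0.8847, 0.8917),
    "arc_easy": (0.8131, 0.8106),
    "arc_challenge": (0.5512, 0.5239),
    "leaderboard_ifeval": (0.4421, 0.4208),
    "gsm8k": (0.9295, 0.9265),
}

# Positive delta means the INT4 model scored higher than BF16 on that task.
deltas = {task: round(int4 - bf16, 4) for task, (bf16, int4) in results.items()}

for task, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{task:22s} {delta:+.4f}")

improved = sum(1 for d in deltas.values() if d > 0)
print(f"{improved} of {len(deltas)} tasks score higher with INT4")
```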

Generate the model

Here is the sample command to generate the model.

auto-round  \
--model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
--device 0 \
--bits 4 \
--iter 200 \
--disable_eval \
--format 'auto_gptq,auto_round,auto_awq' \
--output_dir "./tmp_autoround"

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here are a couple of useful links to learn more about Intel's AI software:

Intel Neural Compressor link

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
