Intel/DeepSeek-V3.1-int4-mixed-AutoRound

Model Details

This model is a mixed-precision int4 quantization of deepseek-ai/DeepSeek-V3.1 with group_size 128 and symmetric quantization, generated by intel/auto-round via RTN (round-to-nearest, no algorithm tuning). Non-expert layers fall back to 8 bits. Please refer to the section "Generate the model" for more details, and follow the license of the original model.
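To make "symmetric int4 with group_size 128 via RTN" concrete, here is a minimal pure-Python sketch of generic group-wise round-to-nearest quantization. It is an illustration under stated assumptions, not AutoRound's actual kernel: the function names (`rtn_quantize_group`, `rtn_quantize`, `dequantize`) are hypothetical, and the scale choice (max absolute value mapped to 7) is one common convention that may differ from the library's.

```python
def rtn_quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Assumption: the scale maps the largest magnitude in the group to qmax.
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for int4
    qmin = -(2 ** (bits - 1))    # -8 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer level, then clamp to the int range.
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def rtn_quantize(weights, group_size=128, bits=4):
    """Quantize a flat list of weights group by group; each group has its own scale."""
    return [
        rtn_quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Map integer levels back to floats using each group's scale."""
    return [q * scale for qs, scale in groups for q in qs]


# Toy data: 256 weights, so two groups of 128.
weights = [i / 100 - 1.28 for i in range(256)]
groups = rtn_quantize(weights)
restored = dequantize(groups)
```

Because every group carries its own scale, a single outlier only degrades the resolution of the 128 weights that share its group. With this scale choice, the per-weight reconstruction error stays within half a quantization step.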

How To Use

INT4 Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
quantized_model_dir = "Intel/DeepSeek-V3.1-int4-mixed-AutoRound"

model = AutoModelForCausalLM.from_pretrained(
        quantized_model_dir,
        torch_dtype=torch.bfloat16,
        device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
prompts = [
        "9.11和9.8哪个数字大",
        "strawberry中有几个r?",
        "There is a girl who likes adventure,",
        "Please give a brief introduction of DeepSeek company.",
        ]

texts=[]
for prompt in prompts:
    messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
            )
    texts.append(text)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

outputs = model.generate(
        input_ids=inputs["input_ids"].to(model.device),
        attention_mask=inputs["attention_mask"].to(model.device),
        max_length=200,  # counts prompt tokens too; change this to align with the official usage
        num_return_sequences=1,
        do_sample=False,  # greedy decoding; change this to align with the official usage
        )
generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs["input_ids"], outputs)
        ]
decoded_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

for i, prompt in enumerate(prompts):
    print(f"Prompt: {prompt}")
    print(f"Generated: {decoded_outputs[i]}")
    print("-" * 50)

"""
Prompt: 9.11和9.8哪个数字大
Generated: 9.11 和 9.8 比较时，9.11 更大。
- 因为 9.11 相当于 9 + 0.11，而 9.8 相当于 9 + 0.8，但注意这里 0.11 实际上小于 0.8（0.11 < 0.8），所以 9.8 更大。
- 重新确认：9.11 是 9.11，9.8 是 9.80，因此 9.80 > 9.11。

**答案：9.8 更大。**
--------------------------------------------------
Prompt: strawberry中有几个r?
Generated: 在英文单词 "strawberry" 中，字母 "r" 出现了 **3 次**。
- 位置：第 3 个字母（s**t**r**a**w**b**e**r**r**y，注意：第 1 个 "r" 是第 3 字符，第 2 个 "r" 是第 6 字符，第 3 个 "r" 是第 7 字符）。

如果需要进一步解释或其他问题，请随时告知！ 😊
--------------------------------------------------
Prompt: There is a girl who likes adventure,
Generated: Of course! A girl who likes adventure is a fantastic starting point for a story, a character, or a real-life inspiration. Here are a few ways to explore that idea:

### As a Character Profile:

**Name:** Let's call her **Elara**.

**Traits:**
*   **Curious:** She asks "why" and "what if" more than anyone else. She sees a hidden path in the woods and has to know where it leads.
*   **Resourceful:** She's the one with a multi-tool in her pocket, who knows how to read a map (and the stars), and can build a fire.
*   **Brave, not fearless:** She feels the fear of climbing the tall cliff or exploring the dark cave, but her curiosity and determination are stronger.
*   **Resilient:** She doesn't see a wrong turn
--------------------------------------------------
Prompt: Please give a brief introduction of DeepSeek company.
Generated: Of course. Here is a brief introduction to DeepSeek:

**DeepSeek** is a leading Chinese AI research company focused on developing powerful artificial general intelligence (AGI). The company is best known for creating state-of-the-art large language models (LLMs).

**Key Highlights:**

*   **Core Product:** Their flagship product is the **DeepSeek-V2** language model, a powerful and efficient AI known for its strong performance in coding, mathematics, and general reasoning.
*   **Open-Source Commitment:** DeepSeek has gained significant recognition for open-sourcing its earlier models (like DeepSeek-Coder and DeepSeek-LLM 67B), making them freely available for research and commercial use. This has helped foster innovation and build a strong developer community.
*   **Specialization in Coding:** They are particularly renowned for their models' exceptional capabilities
--------------------------------------------------

"""

Generate the model

The main branch of auto-round (https://github.com/intel/auto-round) is required if the model is fp8 and the device supports fp8.

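import torch
from transformers import AutoModelForCausalLM
from auto_round import AutoRound

model_name = "deepseek-ai/DeepSeek-V3.1"

# Assumption: the original snippet iterates over `model` without defining it.
# Load the model here only to enumerate its module names; adjust the loading
# arguments (dtype, device placement) to your hardware.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Routed-expert Linear layers get 4 bits; every other Linear layer except
# lm_head falls back to 8 bits. lm_head gets no entry in layer_config.
layer_config = {}
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Linear):
        if "expert" in n and "shared_experts" not in n:
            layer_config[n] = {"bits": 4}
            print(n, 4)
        elif n != "lm_head":
            layer_config[n] = {"bits": 8}
            print(n, 8)

# iters=0 skips tuning, i.e. plain RTN.
autoround = AutoRound(model_name, iters=0, layer_config=layer_config)
autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")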
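The selection rule in the script above can be checked in isolation without loading any weights. The sketch below applies the same name-based conditions to a few module names. The names are hypothetical examples chosen to resemble a DeepSeek-style layout, not a dump of the real model, and the helper `bits_for_layer` is introduced here for illustration only. It also assumes every listed name refers to a `torch.nn.Linear` module.

```python
def bits_for_layer(name):
    """Return the bit width the rule assigns to a Linear layer, or None for no entry."""
    if "expert" in name and "shared_experts" not in name:
        return 4   # routed experts: int4
    if name != "lm_head":
        return 8   # all other Linear layers: 8-bit fallback
    return None    # lm_head: no entry in layer_config


# Hypothetical module names, for illustration only.
example_names = [
    "model.layers.3.mlp.experts.0.gate_proj",
    "model.layers.3.mlp.shared_experts.up_proj",
    "model.layers.3.self_attn.o_proj",
    "lm_head",
]

layer_config = {}
for name in example_names:
    bits = bits_for_layer(name)
    if bits is not None:
        layer_config[name] = {"bits": bits}

for name, cfg in layer_config.items():
    print(name, cfg["bits"])
```

Shared experts are deliberately kept at 8 bits even though their names contain "expert", which is why the rule checks for "shared_experts" explicitly.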

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here are a couple of useful links to learn more about Intel's AI software:

Intel Neural Compressor link

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arXiv: https://arxiv.org/abs/2309.05516 | GitHub: https://github.com/intel/auto-round
