Model Overview

  • Model_Architecture: DeepSeek V3
    • Input: Text
    • Output: Text
  • Supported_Hardware_Microarchitecture: AMD MI350/MI355
  • ROCm: "7.0"
  • Operating Systems: Linux
  • Inference Engine: vLLM
  • Model Optimizer: AMD-Quark
  • Quantization:
    • Weight:
      • Type: OCP MXFP4
      • Mode: Static
    • Activation:
      • Type: OCP MXFP4
      • Mode: Dynamic
    • KV_Cache:
      • Type: OCP FP8
      • Mode: Static
  • Calibration_Dataset: Pile

This model was built from DeepSeek-V3 by applying AMD-Quark for MXFP4 quantization.

Model Quantization

The model was quantized from unsloth/DeepSeek-V3-0324-BF16 using AMD-Quark.
Weights and activations were quantized to MXFP4, and KV caches were quantized to FP8.
The AutoSmoothQuant algorithm was applied to enhance accuracy during quantization.
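
For intuition, below is a minimal sketch (in NumPy, not AMD-Quark's actual implementation) of how a single OCP MX block is quantized to MXFP4: 32 elements share one power-of-two (E8M0) scale, and each element is rounded to the nearest FP4 (E2M1) value. The function names and the round-to-nearest policy are illustrative assumptions.

import numpy as np

# Representable magnitudes of FP4 E2M1; signs are handled separately.
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray):
    """Quantize one 32-element block to MXFP4 (simplified round-to-nearest)."""
    assert block.size == 32, "OCP MX formats use 32-element blocks"
    # Shared E8M0 scale: a power of two chosen so the largest magnitude lands
    # near the top of the FP4 range (largest FP4 normal is 6.0 = 1.5 * 2^2).
    amax = np.abs(block).max()
    shared_exp = int(np.floor(np.log2(amax))) - 2 if amax > 0 else 0
    scale = 2.0 ** shared_exp
    # Round each scaled element to the nearest representable FP4 magnitude.
    scaled = block / scale
    idx = np.abs(np.abs(scaled)[:, None] - FP4_E2M1_GRID[None, :]).argmin(axis=1)
    dequantized = np.sign(scaled) * FP4_E2M1_GRID[idx] * scale
    return shared_exp, dequantized

rng = np.random.default_rng(0)
w = rng.standard_normal(32).astype(np.float32)
exp, w_hat = quantize_mxfp4_block(w)
print("shared exponent:", exp, "max abs error:", float(np.abs(w - w_hat).max()))

In the static weight mode this scale is fixed at quantization time, while the dynamic activation mode computes it per block at runtime.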

Quantization Scripts

cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py \
    --model_dir "/deepseek-ai/DeepSeek-V3-0324-BF16/" \
    --quant_scheme "w_mxfp4_a_mxfp4" \
    --quant_algo_config_file "llm_ptq/models/deepseekv2v3/autosmoothquant_config.json" \
    --num_calib_data 128 \
    --exclude_layers "$exclude_layers" \
    --multi_gpu true \
    --quant_algo "autosmoothquant" \
    --model_export "hf_format" \
    --output_dir "$output_dir"
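
As a quick sanity check after export (a hedged sketch; the path is hypothetical and this assumes the hf_format export writes a quantization_config entry into config.json), the exported quantization settings can be inspected with transformers:

from transformers import AutoConfig

# "$output_dir" from the command above; replace with the actual export path.
config = AutoConfig.from_pretrained("path/to/output_dir", trust_remote_code=True)
print(getattr(config, "quantization_config", "no quantization_config found"))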

Deployment

  • Backend: vLLM
  • Description: This model can be deployed efficiently using the vLLM backend.
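
A minimal offline-inference sketch with vLLM's Python API follows; the prompt is illustrative, and tensor_parallel_size=8, kv_cache_dtype="fp8", and gpu_memory_utilization=0.85 mirror the evaluation settings below rather than being required values.

from vllm import LLM, SamplingParams

# Offline-inference sketch; adjust parallelism and memory settings to the
# available MI350/MI355 GPUs.
llm = LLM(
    model="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",
    tensor_parallel_size=8,
    kv_cache_dtype="fp8",            # matches the FP8 KV-cache quantization
    gpu_memory_utilization=0.85,
)
outputs = llm.generate(
    ["Explain MXFP4 quantization in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)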

Evaluation

  • Tasks:
    • Wikitext
    • GSM8K
  • Framework: lm-evaluation-harness
  • Engine: vLLM

Accuracy

  • wikitext-ppl: 3.33074593544006

Reproduction Command

Wikitext:  
  lm_eval \
    --model vllm \
    --model_args pretrained="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",gpu_memory_utilization=0.85,tensor_parallel_size=8,kv_cache_dtype='fp8' \
    --tasks wikitext \
    --fewshot_as_multiturn \
    --apply_chat_template \
    --num_fewshot 5 \
    --batch_size auto

GSM8K:
  lm_eval \
    --model vllm \
    --model_args pretrained="amd/DeepSeek-V3-0324-WMXFP4-AMXFP4-MoE-Quant-ASQ",gpu_memory_utilization=0.85,tensor_parallel_size=8,kv_cache_dtype='fp8' \
    --tasks gsm8k_llama \
    --fewshot_as_multiturn \
    --apply_chat_template \
    --num_fewshot 8 \
    --batch_size auto

License

Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.