DeepSeek-R1-0528-quantized.w4a16
Model Overview
- Model Architecture: DeepseekV3ForCausalLM
- Input: Text
- Output: Text
- Model Optimizations:
  - Activation quantization: None
  - Weight quantization: INT4
- Release Date: 05/30/2025
- Version: 1.0
- Model Developers: Red Hat (Neural Magic)
Model Optimizations
This model was obtained by quantizing the weights of DeepSeek-R1-0528 to the INT4 data type. This optimization reduces the number of bits used to represent each weight from 8 to 4, reducing GPU memory and disk size requirements by approximately 50%.
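The card does not state the exact quantization recipe used. As a minimal sketch only, assuming a GPTQ-style W4A16 recipe in llm-compressor and a placeholder calibration dataset (neither is confirmed by this card), a checkpoint of this kind could be produced roughly as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

base_model_id = "deepseek-ai/DeepSeek-R1-0528"

# Assumed recipe: INT4 weights, 16-bit activations, lm_head left unquantized
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

oneshot(
    model=model,
    dataset="open_platypus",       # placeholder calibration set; actual data not stated in this card
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

# Save the compressed checkpoint
model.save_pretrained("DeepSeek-R1-0528-quantized.w4a16", save_compressed=True)
tokenizer.save_pretrained("DeepSeek-R1-0528-quantized.w4a16")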
Deployment
This model can be deployed efficiently using the vLLM backend, as shown in the example below.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "RedHatAI/DeepSeek-R1-0528-quantized.w4a16"
number_gpus = 8

# Sampling parameters for generation
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)

# Format the request with the model's chat template before generation
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
outputs = llm.generate(prompt, sampling_params)

generated_text = outputs[0].outputs[0].text
print(generated_text)
vLLM also supports OpenAI-compatible serving. See the documentation for more details.
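For illustration only (the endpoint, port, and API key below are vLLM server defaults, not values specified by this card), a served instance can be queried with the OpenAI Python client:

# Assumes the model was started separately as an OpenAI-compatible server, e.g.:
#   vllm serve RedHatAI/DeepSeek-R1-0528-quantized.w4a16 --tensor-parallel-size 8
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # default local vLLM endpoint
response = client.chat.completions.create(
    model="RedHatAI/DeepSeek-R1-0528-quantized.w4a16",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=256,
)
print(response.choices[0].message.content)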
Evaluation
The model was evaluated on popular reasoning tasks (AIME 2024, MATH-500, GPQA-Diamond) via LightEval.
For reasoning evaluations, we estimate pass@1 based on 10 runs with different seeds, using temperature=0.6, top_p=0.95, and max_new_tokens=65536.
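As a sketch of how such a pass@1 estimate works (the function and numbers below are illustrative, not values from the actual evaluation), each question is attempted once per seed and the per-question success rates are averaged:

# Hypothetical pass@1 estimator: with n runs per question, pass@1 is the
# per-question fraction of correct runs, averaged over all questions.
def estimate_pass_at_1(correct_counts_per_question, n_runs=10):
    rates = [c / n_runs for c in correct_counts_per_question]
    return sum(rates) / len(rates)

# Illustrative only: 3 questions answered correctly in 9, 7, and 10 of 10 runs
print(estimate_pass_at_1([9, 7, 10]))  # ~0.867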
Accuracy
| Benchmark | Recovery (%) | deepseek/DeepSeek-R1-0528 | RedHatAI/DeepSeek-R1-0528-quantized.w4a16 (this model) |
|---|---|---|---|
| AIME 2024 pass@1 | 98.50 | 88.66 | 87.33 |
| MATH-500 pass@1 | 99.88 | 97.52 | 97.40 |
| GPQA Diamond pass@1 | 101.21 | 79.65 | 80.61 |
| Reasoning Average Score | 99.82 | 88.61 | 88.45 |
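Recovery is the quantized model's score expressed as a percentage of the baseline score; a quick check against the AIME 2024 row:

# Recovery (%) = quantized score / baseline score * 100
def recovery(quantized_score, baseline_score):
    return 100.0 * quantized_score / baseline_score

print(round(recovery(87.33, 88.66), 2))  # AIME 2024: 98.5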