Update README.md
README.md
CHANGED
@@ -26,10 +26,11 @@ Preliminary trials show that converting the entire model to pure Int4 (AWQ/GPTQ)

Variant Overview

-| Variant | Characteristics
-|
-| **
-| **
+| Variant     | Characteristics                                                                        | File Size | Recommended Scenario                                                                        |
+|-------------|----------------------------------------------------------------------------------------|-----------|---------------------------------------------------------------------------------------------|
+| **Lite**    | Only the most critical layers upgraded to Int8; size close to pure Int4                | 355 GB    | Resource-constrained, lightweight server deployments                                       |
+| **Compact** | More layers upgraded to Int8, for higher output quality than Lite                      | 414 GB    | VRAM-sufficient deployments focused on answer quality (e.g., 8 × A100)                     |
+| **Medium**  | Compact plus fully-Int8 attention layers; high quality with reduced long-context loss  | 445 GB    | VRAM-rich deployments needing both top answer quality and high concurrency (e.g., 8 × H20) |

Choose the variant that best matches your hardware and quality requirements.
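To relate the file sizes above to the hardware named in the "Recommended Scenario" column, here is a minimal back-of-the-envelope sketch. The per-GPU capacities (80 GB for A100, 96 GB for H20), the 0.9 usable-memory fraction, and the `fits` helper are illustrative assumptions, not anything defined by this repository.

```python
# Back-of-the-envelope check (illustrative only): do a variant's weight files fit in the
# aggregate VRAM of a node, leaving some headroom for KV cache and activations?
# The per-GPU capacities and the 0.9 usable fraction are assumptions, not repo facts.

VARIANT_SIZE_GB = {"Lite": 355, "Compact": 414, "Medium": 445}  # file sizes from the table

def fits(variant: str, num_gpus: int, gb_per_gpu: float, usable_fraction: float = 0.9) -> bool:
    """True if the variant's weight files fit within the usable aggregate GPU memory."""
    usable_gb = num_gpus * gb_per_gpu * usable_fraction
    return VARIANT_SIZE_GB[variant] <= usable_gb

if __name__ == "__main__":
    nodes = {"8 x A100 80GB": (8, 80), "8 x H20 96GB": (8, 96)}  # assumed capacities
    for variant in VARIANT_SIZE_GB:
        summary = ", ".join(f"{name}: {fits(variant, n, gb)}" for name, (n, gb) in nodes.items())
        print(f"{variant}: {summary}")
```

Weights are only a lower bound: KV cache and activations need additional headroom, and that headroom grows with context length and concurrency.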