tclf90 committed
Commit af03f5e · verified · 1 Parent(s): 99603dd

Update README.md

Files changed (1)
  1. README.md +5 -4
README.md CHANGED
@@ -26,10 +26,11 @@ Preliminary trials show that converting the entire model to pure Int4 (AWQ/GPTQ)
 
 Variant Overview
 
-| Variant     | Characteristics                                                         | File Size | Recommended Scenario                                     |
-|-------------|-------------------------------------------------------------------------|-----------|----------------------------------------------------------|
-| **Compact** | More Int8 layers, higher fidelity                                       | 414 GB    | Ample GPU memory & strict quality needs (e.g., 8 × A100) |
-| **Lite**    | Only the most critical layers upgraded to Int8; size close to pure Int4 | 355 GB    | Resource-constrained, lightweight server deployments     |
+| Variant     | Characteristics                                                                       | File Size | Recommended Scenario                                                                       |
+|-------------|---------------------------------------------------------------------------------------|-----------|--------------------------------------------------------------------------------------------|
+| **Lite**    | Only the most critical layers upgraded to Int8; size close to pure Int4               | 355 GB    | Resource-constrained, lightweight server deployments                                       |
+| **Compact** | More Int8 layers, relatively higher output quality                                    | 414 GB    | VRAM-sufficient deployments focused on answer quality (e.g., 8 × A100)                     |
+| **Medium**  | Compact plus fully-Int8 attention layers; high quality with reduced long-context loss | 445 GB    | VRAM-rich deployments needing both top answer quality and high concurrency (e.g., 8 × H20) |
 
 Choose the variant that best matches your hardware and quality requirements.