Phi-4 Reasoning Quantized
Model Description
This is an int8-quantized version of Phi-4 Reasoning, produced with torchao to reduce memory footprint and accelerate inference. The quantization uses int8 weights with dynamically quantized int8 activations, preserving task performance while enabling efficient deployment on consumer and edge hardware.
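A workflow along these lines can be sketched with torchao's one-line quantization entry point (a hypothetical minimal example: the toy model below stands in for the real checkpoint, and the `int8_dynamic_activation_int8_weight` API name may differ across torchao releases, which also expose config-object variants):

```python
# Sketch only: requires torch and torchao; gracefully skipped if unavailable.
try:
    import torch
    import torch.nn as nn
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    # Toy stand-in for the real model; the actual checkpoint would be
    # loaded via a framework such as transformers (not shown here).
    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

    # In-place quantization: int8 weights, dynamic int8 activations.
    quantize_(model, int8_dynamic_activation_int8_weight())

    # The quantized model is called like any other nn.Module.
    out = model(torch.randn(2, 64))
    ok = tuple(out.shape) == (2, 8)
except ImportError:
    ok = None  # torch/torchao not installed in this environment
```

After `quantize_`, the linear layers hold int8 weight tensors and quantize activations on the fly at each forward pass, which is what "dynamic" refers to in the activation precision above.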
Quantization Details
- Method: torchao quantization
- Weight Precision: int8
- Activation Precision: int8 dynamic
- Technique: Symmetric mapping
- Impact: Roughly halves weight storage relative to bfloat16, with minimal loss in reasoning, coding, and general instruction-following capability.
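The symmetric mapping above can be illustrated in plain Python (a minimal per-tensor sketch for intuition, not torchao's actual implementation): the scale is `max|x| / 127`, so zero maps exactly to zero and the int8 range is used symmetrically.

```python
def quantize_symmetric_int8(values):
    """Per-tensor symmetric int8 quantization: q = round(x / s),
    with s = max|x| / 127. Assumes at least one nonzero value."""
    scale = max(abs(v) for v in values) / 127
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.6, -1.0, 0.25]
q, s = quantize_symmetric_int8(weights)
approx = dequantize(q, s)  # close to the originals, within one scale step
```

Because the mapping is symmetric around zero, no zero-point offset needs to be stored per tensor, only the scale.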
Intended Use
- Fast inference in production environments with limited VRAM
- Research on int8 quantization deployment performance
- Suitable tasks: general reasoning, chain-of-thought problem solving, code generation, and long-context workloads.
Limitations
- Slight performance degradation compared to the full-precision (bfloat16) model
- English-centric training data; may underperform in other languages or nuanced tasks
- Further finetuning or quantization-aware calibration can enhance task-specific performance.