Phi-4 Reasoning Quantized


πŸš€ Model Description

This is an int8 quantized version of Phi-4 Reasoning, produced with torchao for a reduced memory footprint and faster inference. The quantization applies int8 weights with dynamically quantized int8 activations, preserving most task performance while enabling efficient deployment on consumer and edge hardware.


Quantization Details

  • Method: torchao quantization (a minimal sketch follows this list)
  • Base model: microsoft/phi-4
  • Weight Precision: int8
  • Activation Precision: int8 dynamic
  • Technique: Symmetric mapping
  • Impact: Significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capability
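
The exact quantization script is not published on this card, but the sketch below shows how such a checkpoint could be produced with torchao. It assumes a recent torchao release that exports `quantize_` and `int8_dynamic_activation_int8_weight` (config names vary across torchao versions) and uses `microsoft/phi-4` from the model tree as the source checkpoint; treat it as an illustration, not the recipe actually used here.

```python
# Sketch: int8 weights + dynamic int8 activations with torchao.
# Assumes a recent torchao release; config names may differ by version.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4",            # source checkpoint from the model tree
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Swap linear layers in place for int8-weight / dynamic-int8-activation
# versions; symmetric mapping is torchao's default for this config.
quantize_(model, int8_dynamic_activation_int8_weight())

# Quantized tensors need torchao-aware serialization, so safetensors is
# disabled when saving (behavior depends on the transformers version).
model.save_pretrained("quantized-Phi-4-reasoning", safe_serialization=False)
```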

🎯 Intended Use

  • Fast inference in production environments with limited VRAM
  • Research on the deployment performance of int8 quantization
  • Tasks: general reasoning, chain-of-thought prompting, code generation, and long-context workloads (a usage sketch follows this list)
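
A minimal inference sketch with the transformers generate API is shown below. The repo id comes from this page; the prompt and generation parameters are illustrative, and loading assumes a transformers build that can deserialize torchao checkpoints.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AINovice2005/quantized-Phi-4-reasoning"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",   # place weights on available GPU/CPU
    torch_dtype="auto",
)

prompt = "Briefly explain why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```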

⚠️ Limitations

  • Slight degradation in performance compared to the full-precision (bfloat16) model
  • English-centric training data; may underperform in other languages or on nuanced tasks
  • Further fine-tuning or quantization-aware calibration can improve task-specific performance
