Phi-4 Reasoning Quantized
Model Description
This is an int8-quantized version of Phi-4 Reasoning, produced with torchao to reduce memory footprint and accelerate inference. The quantization uses int8 weights with dynamically quantized int8 activations, preserving task performance while enabling efficient deployment on consumer and edge hardware.
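A workflow along these lines can be sketched with torchao's one-line quantization entry point (a hypothetical minimal example: the toy model below stands in for the real checkpoint, and the `int8_dynamic_activation_int8_weight` API name may differ across torchao releases, which also expose config-object variants):

```python
# Sketch only: requires torch and torchao; gracefully skipped if unavailable.
try:
    import torch
    import torch.nn as nn
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    # Toy stand-in for the real model; the actual checkpoint would be
    # loaded via a framework such as transformers (not shown here).
    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

    # In-place quantization: int8 weights, dynamic int8 activations.
    quantize_(model, int8_dynamic_activation_int8_weight())

    # The quantized model is called like any other nn.Module.
    out = model(torch.randn(2, 64))
    ok = tuple(out.shape) == (2, 8)
except ImportError:
    ok = None  # torch/torchao not installed in this environment
```

After `quantize_`, the linear layers hold int8 weight tensors and quantize activations on the fly at each forward pass, which is what "dynamic" refers to in the activation precision above.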
Quantization Details
- Method: torchao quantization
- Weight Precision: int8
- Activation Precision: int8 dynamic
- Technique: Symmetric mapping
- Impact: Roughly halves weight storage relative to bfloat16, with minimal loss in reasoning, coding, and general instruction-following capability.
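The symmetric mapping above can be illustrated in plain Python (a minimal per-tensor sketch for intuition, not torchao's actual implementation): the scale is `max|x| / 127`, so zero maps exactly to zero and the int8 range is used symmetrically.

```python
def quantize_symmetric_int8(values):
    """Per-tensor symmetric int8 quantization: q = round(x / s),
    with s = max|x| / 127. Assumes at least one nonzero value."""
    scale = max(abs(v) for v in values) / 127
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.6, -1.0, 0.25]
q, s = quantize_symmetric_int8(weights)
approx = dequantize(q, s)  # close to the originals, within one scale step
```

Because the mapping is symmetric around zero, no zero-point offset needs to be stored per tensor, only the scale.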
Intended Use
- Fast inference in production environments with limited VRAM
- Research on int8 quantization deployment performance
- Suitable tasks: general reasoning, chain-of-thought problem solving, code generation, and long-context workloads.
Limitations
- Slight performance degradation compared to the full-precision (bfloat16) model
- English-centric training data; may underperform in other languages or nuanced tasks
- Further finetuning or quantization-aware calibration can enhance task-specific performance.