phi-3.5-onnx-qnn
phi-3.5-onnx-qnn is an ONNX QNN int4 quantized version of Microsoft Phi-3.5-mini-instruct, providing a small fast NPU inference implementation, optimized for NPU deployment on Windows ARM64 AI PCs with Snapdragon Elite X NPU processors.
Model Description
- Developed by: microsoft
- Model type: phi3
- Parameters: 3.8 billion
- Model Parent: microsoft/Phi-3.5-mini-instruct
- Language(s) (NLP): English
- License: Apache 2.0
- Uses: Chat, general-purpose LLM
- Quantization: int4
Model Card Contact
- Downloads last month
- 12
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support
Model tree for llmware/phi-3.5-onnx-qnn
Base model
microsoft/Phi-3.5-mini-instruct