[microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) converted to OpenVINO with INT8 weight compression.

Quick start, to download this model and run chat inference:

```
pip install huggingface-hub[cli] openvino-genai==2025.2
curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_chat_manual.py
huggingface-cli download helenai/Phi-3.5-mini-instruct-ov-int8 --local-dir Phi-3.5-mini-instruct-ov-int8
python llm_chat_manual.py Phi-3.5-mini-instruct-ov-int8 CPU
```

In the last line, change `CPU` to `GPU` to run on an Intel GPU.

Model export command (to export this model yourself):

```
pip install --upgrade optimum-intel[openvino]
optimum-cli export openvino -m microsoft/Phi-3.5-mini-instruct --weight-format int8 Phi-3.5-mini-instruct-ov-int8
```

This model was exported with the following versions (from [modelinfo.py](https://github.com/helena-intel/snippets/blob/main/model_info/modelinfo.py)):

```
openvino_version      : 2025.2.0-19140-c01cd93e24d-releases/2025/2
nncf_version          : 2.16.0
optimum_intel_version : 1.23.1
optimum_version       : 1.25.3
pytorch_version       : 2.7.1
transformers_version  : 4.51.3
```
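
If you prefer to call the openvino-genai API directly instead of using the downloaded script, the sketch below shows minimal chat inference with `openvino_genai.LLMPipeline`. It assumes the model was already downloaded to `Phi-3.5-mini-instruct-ov-int8` as in the quick start above; it is an illustration, not the contents of `llm_chat_manual.py`.

```
import openvino_genai

# Load the INT8 OpenVINO model; change "CPU" to "GPU" to run on an Intel GPU.
pipe = openvino_genai.LLMPipeline("Phi-3.5-mini-instruct-ov-int8", "CPU")

# start_chat()/finish_chat() keep the conversation history between turns,
# so the second prompt is answered in the context of the first.
pipe.start_chat()
print(pipe.generate("Why is the sky blue?", max_new_tokens=200))
print(pipe.generate("Explain that to a five year old.", max_new_tokens=200))
pipe.finish_chat()
```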