---
license: apache-2.0
title: PyTorch Weights-only-Quantization (WoQ)
sdk: gradio
emoji: 📉
colorFrom: red
colorTo: pink
pinned: false
short_description: Inference scripts for pytorch weights-only-quantization
---

# PyTorch Weights-only Quantization (WoQ)

Inference scripts for PyTorch weights-only quantization.

## TEQ: a trainable equivalent transformation that preserves FP32 precision in weight-only quantization

### Install

```
conda create -n teq-inference python=3.10
conda activate teq-inference
conda install -c conda-forge gcc
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
```

### Usage

```
python teq_inference.py \
    --base <base_model_id> \
    --model_dir <quantized_model_dir> \
    --weights_file quantized_weight.pt \
    --config_file qconfig.json \
    --prompt "Tell me a joke" \
    --device cpu
```

For example:

```
python teq_inference.py \
    --base meta-llama/Llama-3.2-1B \
    --model_dir ./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym \
    --weights_file quantized_weight.pt \
    --config_file qconfig.json \
    --prompt "Tell me a joke" \
    --device cpu
```
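
The directory name in the example, `meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym`, encodes the weight-only quantization scheme: int4 weights, group size 128, asymmetric quantization. As a minimal sketch of what group-wise asymmetric dequantization looks like in plain PyTorch, the snippet below reconstructs FP32 weights from int4-range values, per-group scales, and zero points. The tensor names, shapes, and layout here are illustrative assumptions; they are not tied to the actual storage format of `quantized_weight.pt` or the keys in `qconfig.json`.

```python
import torch

def dequantize_int4_asym(qweight: torch.Tensor,
                         scales: torch.Tensor,
                         zero_points: torch.Tensor,
                         group_size: int = 128) -> torch.Tensor:
    """Recover FP32 weights from group-wise asymmetric int4 storage (illustrative layout).

    qweight:     (out_features, in_features) integer values in [0, 15]
    scales:      (out_features, in_features // group_size) FP scales
    zero_points: (out_features, in_features // group_size) integer zero points
    """
    out_features, in_features = qweight.shape
    # Split the input dimension into groups of `group_size` columns
    q = qweight.reshape(out_features, in_features // group_size, group_size).float()
    # Asymmetric dequantization: w = (q - zero_point) * scale, applied per group
    w = (q - zero_points.unsqueeze(-1).float()) * scales.unsqueeze(-1)
    return w.reshape(out_features, in_features)

if __name__ == "__main__":
    out_f, in_f, gs = 4, 256, 128
    qw = torch.randint(0, 16, (out_f, in_f))
    scales = torch.rand(out_f, in_f // gs) * 0.01
    zeros = torch.randint(0, 16, (out_f, in_f // gs))
    w = dequantize_int4_asym(qw, scales, zeros, gs)
    print(w.shape)  # torch.Size([4, 256])
```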