---
license: apache-2.0
title: PyTorch Weights-only-Quantization (WoQ)
sdk: gradio
emoji: π
colorFrom: red
colorTo: pink
pinned: false
short_description: Inference scripts for PyTorch weights-only quantization
---
# PyTorch Weights-only-Quantization (WoQ)
Inference scripts for PyTorch weights-only quantization (WoQ).
## TEQ: a trainable equivalent transformation that preserves FP32 precision in weight-only quantization
### Install
```shell
conda create -n teq-inference python=3.10
conda activate teq-inference
conda install -c conda-forge gcc
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
```
### Usage
```shell
python teq_inference.py --base <base_model> --model_dir <path-to-woq-TEQ-quantized-model> --weights_file quantized_weight.pt --config_file qconfig.json --prompt "Tell me a joke" --device cpu
```
For example:
```shell
python teq_inference.py --base meta-llama/Llama-3.2-1B --model_dir ./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym --weights_file quantized_weight.pt --config_file qconfig.json --prompt "Tell me a joke" --device cpu
```
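The script's source isn't shown here, but its command line can be reconstructed from the usage above. The sketch below is a hypothetical `argparse` setup mirroring those flags; the defaults and help strings are assumptions, not taken from `teq_inference.py` itself.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the teq_inference.py CLI; flag names
    # come from the usage line above, defaults and choices are assumptions.
    parser = argparse.ArgumentParser(description="WoQ TEQ inference")
    parser.add_argument("--base", required=True,
                        help="base model id, e.g. meta-llama/Llama-3.2-1B")
    parser.add_argument("--model_dir", required=True,
                        help="path to the WoQ TEQ-quantized model directory")
    parser.add_argument("--weights_file", default="quantized_weight.pt",
                        help="quantized weights file inside model_dir")
    parser.add_argument("--config_file", default="qconfig.json",
                        help="quantization config file inside model_dir")
    parser.add_argument("--prompt", default="Tell me a joke",
                        help="text prompt to run inference on")
    parser.add_argument("--device", default="cpu", choices=["cpu", "cuda"],
                        help="inference device")
    return parser

# Parse the example invocation from the README.
args = build_parser().parse_args([
    "--base", "meta-llama/Llama-3.2-1B",
    "--model_dir", "./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym",
])
print(args.base, args.device)
```

Note that `--weights_file` and `--config_file` are resolved relative to `--model_dir`, so only the two required flags need to change between models.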