---
license: apache-2.0
title: PyTorch Weights-only-Quantization (WoQ)
sdk: gradio
emoji: πŸ“‰
colorFrom: red
colorTo: pink
pinned: false
short_description: Inference scripts for PyTorch weights-only quantization
---
# PyTorch Weights-only-Quantization (WoQ)

Inference scripts for PyTorch weights-only quantization (WoQ).

## TEQ: a trainable equivalent transformation that preserves FP32 precision in weight-only quantization

### Install

```shell
conda create -n teq-inference python=3.10

conda activate teq-inference

conda install -c conda-forge gcc

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

pip install -r requirements.txt
```

### Usage

```shell
python teq_inference.py --base <base_model> --model_dir <path-to-woq-TEQ-quantized-model> --weights_file quantized_weight.pt --config_file qconfig.json --prompt "Tell me a joke" --device cpu
```

For example:

```shell
python teq_inference.py --base meta-llama/Llama-3.2-1B --model_dir ./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym --weights_file quantized_weight.pt --config_file qconfig.json --prompt "Tell me a joke" --device cpu
```
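The flags in the invocation above map naturally onto a standard `argparse` interface. The sketch below shows one plausible way `teq_inference.py` could declare them; the flag names are taken from the command line shown above, while the defaults and help strings are assumptions for illustration.

```python
import argparse

def build_parser():
    # Flag names mirror the CLI invocation in the Usage section above;
    # defaults and choices here are illustrative assumptions.
    p = argparse.ArgumentParser(
        description="Run inference with a TEQ weight-only-quantized model"
    )
    p.add_argument("--base", required=True,
                   help="Hugging Face ID of the FP32 base model")
    p.add_argument("--model_dir", required=True,
                   help="Directory containing the WoQ TEQ-quantized artifacts")
    p.add_argument("--weights_file", default="quantized_weight.pt",
                   help="Quantized state dict inside model_dir")
    p.add_argument("--config_file", default="qconfig.json",
                   help="Quantization config inside model_dir")
    p.add_argument("--prompt", default="Tell me a joke",
                   help="Text prompt to generate from")
    p.add_argument("--device", default="cpu", choices=["cpu", "cuda"],
                   help="Inference device")
    return p

# Parsing the example invocation with only the required flags:
args = build_parser().parse_args([
    "--base", "meta-llama/Llama-3.2-1B",
    "--model_dir", "./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym",
])
print(args.device)  # cpu
```

Keeping `--weights_file` and `--config_file` as defaults means the common case only needs `--base` and `--model_dir`, matching the directory layout produced by the quantization step.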