---
license: apache-2.0
title: PyTorch Weights-only-Quantization (WoQ)
sdk: gradio
emoji: 📉
colorFrom: red
colorTo: pink
pinned: false
short_description: Inference scripts for PyTorch weights-only quantization
---
# PyTorch Weights-only-Quantization (WoQ)

Inference scripts for PyTorch models quantized with weights-only quantization (WoQ).

## TEQ: a trainable equivalent transformation that preserves FP32 precision in weight-only quantization

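### Install

```
# Create and activate a dedicated conda environment (Python 3.10)
conda create -n teq-inference python=3.10
conda activate teq-inference

# Install gcc from conda-forge
conda install -c conda-forge gcc

# Install the CPU builds of PyTorch from the PyTorch wheel index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install the remaining dependencies
pip install -r requirements.txt
```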

### Usage

```
python teq_inference.py \
  --base <base_model> \
  --model_dir <path-to-woq-TEQ-quantized-model> \
  --weights_file quantized_weight.pt \
  --config_file qconfig.json \
  --prompt "Tell me a joke" \
  --device cpu
```

For example, with `meta-llama/Llama-3.2-1B` as the base model and a TEQ-quantized checkpoint in a local directory:

```
python teq_inference.py \
  --base meta-llama/Llama-3.2-1B \
  --model_dir ./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym \
  --weights_file quantized_weight.pt \
  --config_file qconfig.json \
  --prompt "Tell me a joke" \
  --device cpu
```
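
Before launching inference, it can help to confirm that the quantized model directory contains the two files named on the command line. The snippet below is a minimal sketch, not part of this repository: `check_teq_dir` is a hypothetical helper, and it assumes that `--weights_file` and `--config_file` are resolved relative to `--model_dir`, which the README does not state explicitly.

```shell
# Hypothetical helper (not shipped with this repository).
# Assumption: the weights and config files live directly inside the model directory.
# Usage: check_teq_dir <model_dir> [weights_file] [config_file]
check_teq_dir() {
  dir="$1"
  weights="${2:-quantized_weight.pt}"   # default taken from the usage example
  config="${3:-qconfig.json}"           # default taken from the usage example
  status=0
  for f in "$weights" "$config"; do
    if [ ! -f "$dir/$f" ]; then
      # Report each missing file on stderr and remember the failure
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_teq_dir ./meta-llama_Llama-3.2-1B-TEQ-int4-gs128-asym` before `python teq_inference.py ...` returns a non-zero status and lists the missing files if the directory is incomplete, so the two commands can be chained with `&&`.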