---
license: apache-2.0
base_model:
- Qwen/Qwen3-Embedding-0.6B
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
---
# Qwen3-Embedding-0.6B-W4A16-G128
GPTQ-quantized [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B), using [THUIR/T2Ranking](https://huggingface.co/datasets/THUIR/T2Ranking) and [m-a-p/COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA) as the calibration set.
## What's the benefit?
VRAM usage: `3228M` -> `2124M`
## What's the cost?
An estimated `~5%` drop in accuracy; further evaluation is on the way...
## How to use it?
Run `pip install compressed-tensors optimum` plus `auto-gptq` or `gptqmodel`, then follow [the official usage guide](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B#transformers-usage).
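If you go the plain `transformers` route from that guide, Qwen3-Embedding expects last-token pooling over the final hidden states, followed by L2 normalization, before computing similarities. A minimal sketch of that pooling step, assuming right- or left-padded batches (the tensors below are dummies standing in for real model output, not weights from this repo):

```python
import torch
import torch.nn.functional as F

def last_token_pool(last_hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # Pool the hidden state of the final non-padding token per sequence.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        # With left padding, the last position is always a real token.
        return last_hidden_states[:, -1]
    # With right padding, index each row at its last real token.
    seq_lens = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.shape[0])
    return last_hidden_states[batch_idx, seq_lens]

# Dummy model output: batch of 2, sequence length 4, hidden size 8.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0],   # right-padded: last real token at index 2
                     [1, 1, 1, 1]])  # full sequence: last token at index 3
pooled = last_token_pool(hidden, mask)
emb = F.normalize(pooled, p=2, dim=1)  # unit vectors, ready for cosine similarity
```

With real model output you would pass `outputs.last_hidden_state` and the tokenizer's `attention_mask` instead of the dummy tensors.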