---
license: apache-2.0
base_model:
- Qwen/Qwen3-Embedding-0.6B
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
---
# Qwen3-Embedding-0.6B-W4A16-G128
GPTQ-quantized [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B), using [THUIR/T2Ranking](https://huggingface.co/datasets/THUIR/T2Ranking) and [m-a-p/COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA) as the calibration set.
## What's the benefit?
VRAM usage: `3228M` -> `2124M`
## What's the cost?
An estimated `~5%` drop in accuracy; further evaluation is on the way...
## How to use it?
Run `pip install compressed-tensors optimum` plus `auto-gptq` or `gptqmodel`, then follow [the official usage guide](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B#transformers-usage).
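If you go the plain `transformers` route from that guide, Qwen3-Embedding expects last-token pooling over the final hidden states, followed by L2 normalization, before computing similarities. A minimal sketch of that pooling step, assuming right- or left-padded batches (the tensors below are dummies standing in for real model output, not weights from this repo):

```python
import torch
import torch.nn.functional as F

def last_token_pool(last_hidden_states: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # Pool the hidden state of the final non-padding token per sequence.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        # With left padding, the last position is always a real token.
        return last_hidden_states[:, -1]
    # With right padding, index each row at its last real token.
    seq_lens = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.shape[0])
    return last_hidden_states[batch_idx, seq_lens]

# Dummy model output: batch of 2, sequence length 4, hidden size 8.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0],   # right-padded: last real token at index 2
                     [1, 1, 1, 1]])  # full sequence: last token at index 3
pooled = last_token_pool(hidden, mask)
emb = F.normalize(pooled, p=2, dim=1)  # unit vectors, ready for cosine similarity
```

With real model output you would pass `outputs.last_hidden_state` and the tokenizer's `attention_mask` instead of the dummy tensors.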