---
license: apache-2.0
base_model:
- Qwen/Qwen3-Embedding-0.6B
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
---

# Qwen3-Embedding-0.6B-W4A16-G128

A GPTQ-quantized version of [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B), using [THUIR/T2Ranking](https://huggingface.co/datasets/THUIR/T2Ranking) and [m-a-p/COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA) as the calibration set.

## What's the benefit?

VRAM usage drops from `3228 MiB` to `2124 MiB`.

## What's the cost?

An estimated `~5%` accuracy loss; a more thorough evaluation is on the way.

## How to use it?

Install the dependencies (`pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`), then follow [the official usage guide](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B#transformers-usage).
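
As a minimal sketch of what that usage looks like with `sentence-transformers`: the repo id below is a placeholder (point it at wherever this quantized checkpoint is hosted), and the `prompt_name="query"` convention follows the base model's usage guide. The cosine helper is plain Python for illustration; in practice you would use `model.similarity`.

```python
def cosine(a, b):
    """Plain-Python cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    from sentence_transformers import SentenceTransformer

    # Placeholder repo id — substitute the actual location of this model.
    model = SentenceTransformer("<user>/Qwen3-Embedding-0.6B-W4A16-G128")

    docs = [
        "The capital of China is Beijing.",
        "Gravity is the force by which a planet draws objects toward it.",
    ]
    # Queries use the "query" prompt; documents are encoded as-is.
    query_emb = model.encode("What is the capital of China?", prompt_name="query")
    for doc, doc_emb in zip(docs, model.encode(docs)):
        print(f"{cosine(query_emb, doc_emb):.3f}  {doc}")
```

The first document should score noticeably higher than the second for this query.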