---
license: apache-2.0
base_model:
- Qwen/Qwen3-Embedding-0.6B
tags:
- transformers
- sentence-transformers
- sentence-similarity
- feature-extraction
---

# Qwen3-Embedding-0.6B-W4A16-G128

GPTQ quantized [https://huggingface.co/Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B), using [THUIR/T2Ranking](https://huggingface.co/datasets/THUIR/T2Ranking) and [m-a-p/COIG-CQIA](https://huggingface.co/datasets/m-a-p/COIG-CQIA) as the calibration set.

## What's the benefit?

VRAM usage: `3228M` -> `2124M`

## What's the cost?

Roughly `~5%` accuracy loss; further evaluation is on the way.

## How to use it?

`pip install compressed-tensors optimum` and `auto-gptq` / `gptqmodel`, then follow [the official usage guide](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B#transformers-usage).
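
For reference, here is a minimal sketch of the `transformers` path from that guide, assuming the quantization backend above is installed and a GPU is available; the repo id below is a placeholder, swap in the full `<namespace>/<repo>` path of this repository.

```python
# Minimal sketch based on the official Qwen3-Embedding transformers usage.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "Qwen3-Embedding-0.6B-W4A16-G128"  # placeholder, use the full "<namespace>/<repo>" path

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
# device_map="auto" needs `accelerate`; the GPTQ kernels expect a GPU.
model = AutoModel.from_pretrained(model_id, device_map="auto")

def last_token_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Take the hidden state of each sequence's last non-padding token."""
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

texts = [
    "What is the capital of China?",
    "The capital of China is Beijing.",
]
batch = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**batch)

embeddings = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=-1)  # L2-normalize for cosine similarity
print(embeddings @ embeddings.T)                   # pairwise similarity matrix
```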