Qwen3-Reranker-4B-F32-GGUF

Built on the Qwen3 series, this family provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). The series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundation models, and achieves significant advances across multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining. This repository provides GGUF quantizations of Qwen3-Reranker-4B for use with llama.cpp-compatible runtimes.

Model Files

| Filename | Size | Format | Description |
| --- | --- | --- | --- |
| Qwen3-Reranker-4B.BF16.gguf | 8.05 GB | BF16 | Brain floating point (16-bit) |
| Qwen3-Reranker-4B.F16.gguf | 8.05 GB | F16 | Half-precision (16-bit) floating point |
| Qwen3-Reranker-4B.F32.gguf | 16.1 GB | F32 | Full-precision (32-bit) floating point |
| Qwen3-Reranker-4B.Q2_K.gguf | 1.67 GB | Q2_K | 2-bit K-quant |
| Qwen3-Reranker-4B.Q3_K_L.gguf | 2.24 GB | Q3_K_L | 3-bit K-quant (large) |
| Qwen3-Reranker-4B.Q3_K_M.gguf | 2.08 GB | Q3_K_M | 3-bit K-quant (medium) |
| Qwen3-Reranker-4B.Q3_K_S.gguf | 1.89 GB | Q3_K_S | 3-bit K-quant (small) |
| Qwen3-Reranker-4B.Q4_K_M.gguf | 2.5 GB | Q4_K_M | 4-bit K-quant (medium) |
| Qwen3-Reranker-4B.Q4_K_S.gguf | 2.38 GB | Q4_K_S | 4-bit K-quant (small) |
| Qwen3-Reranker-4B.Q5_K_M.gguf | 2.89 GB | Q5_K_M | 5-bit K-quant (medium) |
| Qwen3-Reranker-4B.Q5_K_S.gguf | 2.82 GB | Q5_K_S | 5-bit K-quant (small) |
| Qwen3-Reranker-4B.Q6_K.gguf | 3.31 GB | Q6_K | 6-bit K-quant |
| Qwen3-Reranker-4B.Q8_0.gguf | 4.28 GB | Q8_0 | 8-bit quantization |
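
A single quant can be fetched with huggingface_hub (a minimal sketch; the repo id is the one shown in the model tree at the end of this card):

```python
from huggingface_hub import hf_hub_download

# Download one quant file from this repo; pick any filename from the table above.
model_path = hf_hub_download(
    repo_id="prithivMLmods/Qwen3-Reranker-4B-F32-GGUF",
    filename="Qwen3-Reranker-4B.Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF
```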

Recommended Usage for Reranking Tasks

  • Q4_K_M or Q5_K_M: Optimal balance for most reranking applications (see the serving sketch after this list)
  • Q6_K or Q8_0: Higher precision for critical ranking accuracy
  • Q3_K_M: Good performance with reduced memory footprint
  • F16 or BF16: Maximum reranking precision, requires more VRAM
  • F32: Highest precision for research and benchmarking
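
As a sketch of how one of these files can be served for reranking: recent llama.cpp builds expose a rerank endpoint via llama-server. The server invocation, port, and response fields below are assumptions about a typical llama.cpp setup rather than something this card specifies, so verify them against your llama.cpp version:

```python
import requests

# Assumes a llama.cpp server was started with reranking enabled, e.g.:
#   llama-server -m Qwen3-Reranker-4B.Q4_K_M.gguf --reranking --port 8080
# (exact flag and endpoint names depend on your llama.cpp version, and the
#  GGUF must have been converted with reranking/pooling support)

query = "What is the capital of France?"
documents = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]

resp = requests.post(
    "http://localhost:8080/v1/rerank",  # assumed local endpoint
    json={"query": query, "documents": documents},
    timeout=120,
)
resp.raise_for_status()

# Each result pairs an input-document index with a relevance score;
# sort by score to obtain the reranked order.
results = sorted(resp.json()["results"],
                 key=lambda r: r["relevance_score"], reverse=True)
for r in results:
    print(f"{r['relevance_score']:8.3f}  {documents[r['index']]}")
```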

Quants Usage

(sorted by size, not necessarily by quality; IQ-quants are often preferable to similarly sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better): [quant comparison graph not included]
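
One rough, self-contained way to compare the files above is effective bits per weight, computed from each file size and the 4.02B parameter count in the metadata below (a back-of-the-envelope sketch; it overshoots the nominal bit-width because some tensors, such as embeddings, are stored at higher precision):

```python
# Approximate bits per weight for selected quants, using file sizes from the
# table above and the 4.02B parameter count from this card's metadata.
N_PARAMS = 4.02e9

sizes_gb = {
    "Q2_K": 1.67, "Q3_K_M": 2.08, "Q4_K_M": 2.50,
    "Q5_K_M": 2.89, "Q6_K": 3.31, "Q8_0": 4.28, "F16": 8.05,
}

for name, gb in sizes_gb.items():
    # 1 GB = 1e9 bytes here; ignores GGUF metadata overhead
    bpw = gb * 1e9 * 8 / N_PARAMS
    print(f"{name:7s} ~{bpw:.2f} bits/weight")
```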

Model size: 4.02B params
Architecture: qwen3
Format: GGUF

Model tree for prithivMLmods/Qwen3-Reranker-4B-F32-GGUF

Base model: Qwen/Qwen3-4B-Base
This repo is one of 8 quantized models derived from that base.