Qwen3-Reranker-4B-F32-GGUF

Built on the Qwen3 series, this family provides a comprehensive range of text embedding and reranking models in various sizes (0.6B, 4B, and 8B). The series inherits the exceptional multilingual capabilities, long-text understanding, and reasoning skills of its foundation models, and achieves significant advances across multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining. This repository provides GGUF quantizations of Qwen3-Reranker-4B for use with llama.cpp-compatible runtimes.

Model Files

| Filename | Size | Format | Description |
| --- | --- | --- | --- |
| Qwen3-Reranker-4B.BF16.gguf | 8.05 GB | BF16 | Brain floating point (16-bit) |
| Qwen3-Reranker-4B.F16.gguf | 8.05 GB | F16 | Half-precision (16-bit) floating point |
| Qwen3-Reranker-4B.F32.gguf | 16.1 GB | F32 | Full-precision (32-bit) floating point |
| Qwen3-Reranker-4B.Q2_K.gguf | 1.67 GB | Q2_K | 2-bit K-quant |
| Qwen3-Reranker-4B.Q3_K_L.gguf | 2.24 GB | Q3_K_L | 3-bit K-quant (large) |
| Qwen3-Reranker-4B.Q3_K_M.gguf | 2.08 GB | Q3_K_M | 3-bit K-quant (medium) |
| Qwen3-Reranker-4B.Q3_K_S.gguf | 1.89 GB | Q3_K_S | 3-bit K-quant (small) |
| Qwen3-Reranker-4B.Q4_K_M.gguf | 2.5 GB | Q4_K_M | 4-bit K-quant (medium) |
| Qwen3-Reranker-4B.Q4_K_S.gguf | 2.38 GB | Q4_K_S | 4-bit K-quant (small) |
| Qwen3-Reranker-4B.Q5_K_M.gguf | 2.89 GB | Q5_K_M | 5-bit K-quant (medium) |
| Qwen3-Reranker-4B.Q5_K_S.gguf | 2.82 GB | Q5_K_S | 5-bit K-quant (small) |
| Qwen3-Reranker-4B.Q6_K.gguf | 3.31 GB | Q6_K | 6-bit K-quant |
| Qwen3-Reranker-4B.Q8_0.gguf | 4.28 GB | Q8_0 | 8-bit quantization |
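
A single quant can be fetched with huggingface_hub (a minimal sketch; the repo id is the one shown in the model tree at the end of this card):

```python
from huggingface_hub import hf_hub_download

# Download one quant file from this repo; pick any filename from the table above.
model_path = hf_hub_download(
    repo_id="prithivMLmods/Qwen3-Reranker-4B-F32-GGUF",
    filename="Qwen3-Reranker-4B.Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF
```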

Recommended Usage for Reranking Tasks

  • Q4_K_M or Q5_K_M: Optimal balance for most reranking applications (see the serving sketch after this list)
  • Q6_K or Q8_0: Higher precision for critical ranking accuracy
  • Q3_K_M: Good performance with reduced memory footprint
  • F16 or BF16: Maximum reranking precision, requires more VRAM
  • F32: Highest precision for research and benchmarking
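
As a sketch of how one of these files can be served for reranking: recent llama.cpp builds expose a rerank endpoint via llama-server. The server invocation, port, and response fields below are assumptions about a typical llama.cpp setup rather than something this card specifies, so verify them against your llama.cpp version:

```python
import requests

# Assumes a llama.cpp server was started with reranking enabled, e.g.:
#   llama-server -m Qwen3-Reranker-4B.Q4_K_M.gguf --reranking --port 8080
# (exact flag and endpoint names depend on your llama.cpp version, and the
#  GGUF must have been converted with reranking/pooling support)

query = "What is the capital of France?"
documents = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower was completed in 1889.",
    "Berlin is the capital of Germany.",
]

resp = requests.post(
    "http://localhost:8080/v1/rerank",  # assumed local endpoint
    json={"query": query, "documents": documents},
    timeout=120,
)
resp.raise_for_status()

# Each result pairs an input-document index with a relevance score;
# sort by score to obtain the reranked order.
results = sorted(resp.json()["results"],
                 key=lambda r: r["relevance_score"], reverse=True)
for r in results:
    print(f"{r['relevance_score']:8.3f}  {documents[r['index']]}")
```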

Quants Usage

(sorted by size, not necessarily by quality; IQ-quants are often preferable to similarly sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better): [quant comparison graph not included]
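
One rough, self-contained way to compare the files above is effective bits per weight, computed from each file size and the 4.02B parameter count in the metadata below (a back-of-the-envelope sketch; it overshoots the nominal bit-width because some tensors, such as embeddings, are stored at higher precision):

```python
# Approximate bits per weight for selected quants, using file sizes from the
# table above and the 4.02B parameter count from this card's metadata.
N_PARAMS = 4.02e9

sizes_gb = {
    "Q2_K": 1.67, "Q3_K_M": 2.08, "Q4_K_M": 2.50,
    "Q5_K_M": 2.89, "Q6_K": 3.31, "Q8_0": 4.28, "F16": 8.05,
}

for name, gb in sizes_gb.items():
    # 1 GB = 1e9 bytes here; ignores GGUF metadata overhead
    bpw = gb * 1e9 * 8 / N_PARAMS
    print(f"{name:7s} ~{bpw:.2f} bits/weight")
```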

Model size: 4.02B params
Architecture: qwen3
Format: GGUF

Model tree for prithivMLmods/Qwen3-Reranker-4B-F32-GGUF

Base model: Qwen/Qwen3-4B-Base
This repo is one of 8 quantized models derived from that base.