---
language:
- en
license: apache-2.0
tags:
- reranker
- cross-encoder
- sequence-classification
- vllm
base_model: Qwen/Qwen3-Reranker-4B
pipeline_tag: text-classification
---

# Qwen3-Reranker-4B-seq-cls-vllm-fixed

This is a fixed version of the Qwen3-Reranker-4B model, converted to sequence classification format and optimized for use with vLLM.

## Model Description

This model is a pre-converted version of [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) that:

- Has been converted from the CausalLM to the SequenceClassification architecture
- Includes the configuration required for vLLM compatibility
- Reduces the size of the classification head by a factor of roughly 75,000
- Performs roughly 150,000x fewer operations per token than scoring with the full LM head

## Key Improvements

The original converted model ([tomaarsen/Qwen3-Reranker-4B-seq-cls](https://huggingface.co/tomaarsen/Qwen3-Reranker-4B-seq-cls)) was missing critical vLLM configuration attributes. This version adds:

```json
{
  "classifier_from_token": ["no", "yes"],
  "method": "from_2_way_softmax",
  "use_pad_token": false,
  "is_original_qwen3_reranker": false
}
```

These attributes are essential for vLLM to handle the pre-converted weights correctly.
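The idea behind `from_2_way_softmax` can be sketched in a few lines of NumPy. This is a toy illustration, not the actual conversion script: the dimensions are shrunk for readability and the token ids are hypothetical. The multi-row LM head is collapsed into a single classification row, `W[yes] - W[no]`, so the one remaining logit equals `yes_logit - no_logit`, and its sigmoid equals the softmax probability of "yes" over the {"yes", "no"} pair.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 1000, 64             # toy sizes (real LM head: 151,936 rows)
lm_head = rng.standard_normal((vocab, hidden))
yes_id, no_id = 9, 2                 # hypothetical token ids for "yes"/"no"

# Collapse the LM head to a single-row classification head
cls_weight = lm_head[yes_id] - lm_head[no_id]

h = rng.standard_normal(hidden)      # final-token hidden state
yes_logit = lm_head[yes_id] @ h
no_logit = lm_head[no_id] @ h
score = cls_weight @ h               # one logit instead of `vocab` logits

# sigmoid(yes - no) is exactly softmax(yes) over the two-way pair
p_softmax = np.exp(yes_logit) / (np.exp(yes_logit) + np.exp(no_logit))
p_sigmoid = 1.0 / (1.0 + np.exp(-score))
assert np.isclose(p_softmax, p_sigmoid)
```

Because the equivalence is exact, the converted model produces the same relevance scores as the original two-token formulation while computing a single logit per forward pass.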
## Usage with vLLM

```bash
vllm serve danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed \
    --task score \
    --served-model-name qwen3-reranker-4b \
    --disable-log-requests
```

### Python Example

```python
from vllm import LLM

llm = LLM(
    model="danielchalef/Qwen3-Reranker-4B-seq-cls-vllm-fixed",
    task="score",
)

queries = ["What is the capital of France?"]
documents = ["Paris is the capital of France."]

outputs = llm.score(queries, documents)
scores = [output.outputs.score for output in outputs]
print(scores)
```

## Performance

This model scores identically to the original Qwen3-Reranker-4B when used with the configuration above, while providing significant efficiency improvements:

- **Memory**: ~600 MB → ~8 KB for the classification head
- **Compute**: 151,936 logits → 1 logit per forward pass
- **Speed**: Faster inference due to the reduced computation

## Technical Details

- **Architecture**: Qwen3ForSequenceClassification
- **Base Model**: Qwen/Qwen3-Reranker-4B
- **Conversion Method**: from_2_way_softmax (yes_logit - no_logit)
- **Model Size**: 4B parameters
- **Task**: Reranking/Scoring

## Citation

If you use this model, please cite the original Qwen3-Reranker:

```bibtex
@misc{qwen3reranker2024,
  title={Qwen3-Reranker},
  author={Qwen Team},
  year={2024},
  publisher={Hugging Face}
}
```

## License

Apache 2.0 (inherited from the base model)