---
language: multilingual
license: mit
tags:
- onnx
- optimum
- text-embedding
- onnxruntime
- opset19
- sentence-similarity
- gpu
- optimized
datasets:
- mmarco
pipeline_tag: sentence-similarity
---

# gte-multilingual-reranker-base-onnx-op19-opt-gpu

This model is an ONNX export of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base), produced with ONNX opset 19 and graph-optimized for GPU inference.

## Model Details

- **Framework**: ONNX Runtime
- **ONNX Opset**: 19
- **Task**: sentence-similarity
- **Target Device**: GPU
- **Optimized**: Yes
- **Original Model**: [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base)
- **Exported On**: 2025-03-31
- **Author**: Converted by [Jaro](https://www.linkedin.com/in/jaroai/)

## Environment and Package Versions

| Package | Version |
| --- | --- |
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |

### Applied Optimizations

| Optimization | Setting |
| --- | --- |
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers-Specific Optimizations | Enabled |
| GELU Fusion | Enabled |
| Layer Norm Fusion | Enabled |
| Attention Fusion | Enabled |
| Skip Layer Norm Fusion | Enabled |
| GELU Approximation | Enabled |

## Usage

The underlying model is a cross-encoder reranker: it takes a query-passage pair and returns a relevance score, with higher logits indicating higher relevance.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the exported model and tokenizer from the directory containing the ONNX files
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# Prepare query-passage pairs
pairs = [["what is panda?", "The giant panda is a bear species endemic to China."]]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Run inference; one relevance score per pair
outputs = model(**inputs)
scores = outputs.logits.view(-1)
print(scores)
```

## Export Process

This model was exported to ONNX format with the Hugging Face Optimum library, using opset 19. Graph optimization targeting GPU devices was applied during export.

## Performance

ONNX Runtime models generally offer lower inference latency than their native PyTorch counterparts, especially in production deployments.
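For reference, the settings in the Applied Optimizations table map onto Optimum's `OptimizationConfig`. The sketch below shows how a comparable export and optimization could be reproduced; it is illustrative rather than the exact script used for this model, and it assumes the custom GTE architecture exports cleanly with `trust_remote_code=True` (the `onnx` output directory name is arbitrary):

```python
from optimum.exporters.onnx import main_export
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# 1. Export the base model to ONNX with opset 19 (trust_remote_code is
#    assumed to be required for the custom GTE architecture)
main_export(
    "Alibaba-NLP/gte-multilingual-reranker-base",
    output="onnx",
    task="text-classification",
    opset=19,
    trust_remote_code=True,
)

# 2. Apply the graph optimizations from the table above: extended level,
#    GPU target, no FP16, transformers-specific fusions, GELU approximation
model = ORTModelForSequenceClassification.from_pretrained("onnx")
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = OptimizationConfig(
    optimization_level=2,  # ONNX Runtime's "extended" graph optimization level
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,
    enable_gelu_approximation=True,
)
optimizer.optimize(save_dir="onnx", optimization_config=optimization_config)
```

Leaving `fp16=False`, as in the table, keeps the optimized graph in FP32, so outputs should match the unoptimized export up to numerical noise.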
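The speed-up is workload-dependent, so it is worth measuring on your own hardware. A minimal wall-clock comparison against the PyTorch baseline might look like the sketch below; the pairs, batch size, and iteration count are placeholders:

```python
import time

import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("onnx")
pairs = [["what is panda?", "The giant panda is a bear species endemic to China."]] * 8
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

def mean_latency(model, n_iters=50):
    """Average seconds per forward pass, after one warm-up call."""
    model(**inputs)  # warm-up
    start = time.perf_counter()
    for _ in range(n_iters):
        model(**inputs)
    return (time.perf_counter() - start) / n_iters

# On GPU, pass provider="CUDAExecutionProvider" and move `inputs` to the device
ort_model = ORTModelForSequenceClassification.from_pretrained("onnx")
pt_model = AutoModelForSequenceClassification.from_pretrained(
    "Alibaba-NLP/gte-multilingual-reranker-base", trust_remote_code=True
).eval()

with torch.inference_mode():
    print(f"ONNX Runtime: {mean_latency(ort_model) * 1000:.1f} ms/batch")
    print(f"PyTorch:      {mean_latency(pt_model) * 1000:.1f} ms/batch")
```

Actual gains depend on hardware, batch size, and sequence length, so treat any single number as indicative rather than definitive.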