---
language: multilingual
license: mit
tags:
  - onnx
  - optimum
  - text-embedding
  - onnxruntime
  - opset19
  - sentence-similarity
  - gpu
  - optimized
datasets:
  - mmarco
pipeline_tag: sentence-similarity
---

# gte-multilingual-reranker-base-onnx-op19-opt-gpu

This model is an ONNX export of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base), produced with ONNX opset 19.

## Model Details

### Environment and Package Versions

| Package         | Version       |
|-----------------|---------------|
| transformers    | 4.48.3        |
| optimum         | 1.24.0        |
| onnx            | 1.17.0        |
| onnxruntime     | 1.21.0        |
| torch           | 2.5.1         |
| numpy           | 1.26.4        |
| huggingface_hub | 0.28.1        |
| python          | 3.12.9        |
| system          | Darwin 24.3.0 |

### Applied Optimizations

| Optimization                                 | Setting  |
|----------------------------------------------|----------|
| Graph Optimization Level                     | Extended |
| Optimize for GPU                             | Yes      |
| Use FP16                                     | No       |
| Transformers Specific Optimizations Enabled  | Yes      |
| Gelu Fusion Enabled                          | Yes      |
| Layer Norm Fusion Enabled                    | Yes      |
| Attention Fusion Enabled                     | Yes      |
| Skip Layer Norm Fusion Enabled               | Yes      |
| Gelu Approximation Enabled                   | Yes      |
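
These settings map naturally onto Optimum's `OptimizationConfig`. The sketch below shows how a comparable optimization pass could be reproduced; the API calls are real Optimum APIs, but the exact configuration used for this artifact is an assumption inferred from the table above.

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Load an (unoptimized) ONNX export from the local "onnx" directory
model = ORTModelForSequenceClassification.from_pretrained("onnx")
optimizer = ORTOptimizer.from_pretrained(model)

# Assumed configuration, reconstructed from the table above
config = OptimizationConfig(
    optimization_level=2,                              # "extended" graph optimizations
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,   # enables the fusion passes listed above
    enable_gelu_approximation=True,
)

# Write the optimized graph to a new directory
optimizer.optimize(optimization_config=config, save_dir="onnx-optimized")
```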

## Usage

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the exported model and tokenizer from the local "onnx" directory
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# A reranker scores a query-document pair rather than a single text
inputs = tokenizer("Your query here", "Your document here", return_tensors="pt")

# Run inference; the logits carry the relevance score
outputs = model(**inputs)
```
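
In practice, reranking means scoring a batch of query-document pairs and sorting by score. A minimal sketch, assuming the classification head emits a single relevance logit per pair (typical for cross-encoder rerankers) and using made-up example pairs:

```python
import torch
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# Hypothetical query and candidate documents
query = "what is the capital of France?"
docs = [
    "Paris is the capital and largest city of France.",
    "Berlin is the capital of Germany.",
]

# Tokenize all query-document pairs as one batch
inputs = tokenizer([[query, d] for d in docs], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.view(-1)  # one relevance logit per pair (assumed)

# Rank documents from most to least relevant
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```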

## Export Process

This model was exported to ONNX format with the Hugging Face Optimum library using opset 19; graph optimizations targeting GPU execution were applied during export.
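
The exact invocation used for this export is not recorded. One plausible way to reproduce it is Optimum's `main_export`; treat the task name and flags below as assumptions:

```python
from optimum.exporters.onnx import main_export

# Hedged sketch: re-export the base model to ONNX with opset 19
main_export(
    "Alibaba-NLP/gte-multilingual-reranker-base",
    output="onnx",
    task="text-classification",  # a reranker is a sequence-classification model
    opset=19,
    trust_remote_code=True,      # the base model ships custom modeling code
)
```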

## Performance

ONNX Runtime models generally offer lower inference latency than eager-mode PyTorch, especially in production deployments, though the actual speedup depends on hardware, batch size, and sequence length.
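
Because the gains vary by setup, it is worth measuring on your own hardware. A minimal timing sketch (the warm-up and loop counts are arbitrary choices, not from the original export):

```python
import time
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")
inputs = tokenizer("example query", "example document", return_tensors="pt")

# Warm-up runs so one-time initialization does not skew the timing
for _ in range(5):
    model(**inputs)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    model(**inputs)
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```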