# Granite Embedding English R2 – INT8 (ONNX)
This is the INT8-quantized ONNX version of [ibm-granite/granite-embedding-english-r2](https://huggingface.co/ibm-granite/granite-embedding-english-r2). It is optimized to run efficiently on CPU using 🤗 Optimum with ONNX Runtime.
- Embedding dimension: 768
- Precision: INT8 (dynamic quantization; see the sketch below)
- Backend: ONNX Runtime
- Use case: text embeddings, semantic search, clustering, retrieval
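For reference, dynamic INT8 quantization of this kind can be done with Optimum's `ORTQuantizer`. The sketch below is illustrative, not necessarily the exact recipe used for this repo; the directory names and the AVX512-VNNI target are assumptions.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the FP32 base model to ONNX (directory names are illustrative)
model = ORTModelForFeatureExtraction.from_pretrained(
    "ibm-granite/granite-embedding-english-r2", export=True
)
model.save_pretrained("granite-onnx-fp32")

# Dynamic quantization: weights are stored as INT8 ahead of time,
# activations are quantized on the fly at inference
quantizer = ORTQuantizer.from_pretrained("granite-onnx-fp32")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="granite-onnx-int8", quantization_config=qconfig)
```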
## Installation

```bash
pip install -U transformers "optimum[onnxruntime]"
```
## Usage

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

repo_id = "yasserrmd/granite-embedding-r2-onnx"

# Load tokenizer + ONNX model
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = ORTModelForFeatureExtraction.from_pretrained(repo_id)

# Encode sentences
inputs = tokenizer(["Hello world", "مرحبا"], padding=True, return_tensors="pt")
outputs = model(**inputs)

# Mean pooling over real tokens only (padding is masked out)
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)  # (2, 768)
```
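Since semantic search is a listed use case, a typical next step is to compare embeddings with cosine similarity. A minimal sketch reusing `tokenizer` and `model` from above; the sentences are made up for illustration:

```python
import torch.nn.functional as F

sentences = [
    "What is the capital of France?",   # query
    "Paris is the capital of France.",  # relevant
    "The Nile is a river in Africa.",   # unrelated
]
enc = tokenizer(sentences, padding=True, return_tensors="pt")
out = model(**enc)

# Same masked mean pooling as above
mask = enc["attention_mask"].unsqueeze(-1)
embs = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity of the query (row 0) against the candidates
scores = F.cosine_similarity(embs[0:1], embs[1:])
print(scores)  # the relevant sentence should score higher than the unrelated one
```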
## Notes

- Dynamic INT8 quantization shrinks the model and speeds up CPU inference, with only a small accuracy trade-off versus FP32.
- The pooling strategy above is mean pooling; CLS pooling or max pooling can be swapped in as needed (see the sketch after these notes).
- Works seamlessly with the Hugging Face Hub and `optimum.onnxruntime`.
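A minimal sketch of the alternative pooling strategies mentioned above, reusing `inputs` and `outputs` from the usage example:

```python
# CLS pooling: take the hidden state of the first token
cls_emb = outputs.last_hidden_state[:, 0]

# Max pooling: per-dimension maximum over real (non-padding) tokens
mask = inputs["attention_mask"].unsqueeze(-1).bool()
hidden = outputs.last_hidden_state.masked_fill(~mask, float("-inf"))
max_emb = hidden.max(dim=1).values

print(cls_emb.shape, max_emb.shape)  # both (2, 768)
```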