# Intent Classification ONNX Quantized
Quantized ONNX version of rbojja/intent-classification-small (a BAAI/bge-small-en-v1.5 fine-tune) for fast inference.
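For reference, checkpoints like this are typically produced with dynamic INT8 quantization in 🤗 Optimum. The exact recipe used here isn't documented, so the sketch below is only illustrative; the source model name and the `avx512_vnni` target are assumptions.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the float checkpoint to ONNX (assumption: this quantized model
# was derived from rbojja/intent-classification-small in a similar way)
model = ORTModelForFeatureExtraction.from_pretrained(
    "rbojja/intent-classification-small", export=True
)

# Dynamic INT8 quantization; avx512_vnni is one common CPU target
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(
    save_dir="intent-classification-onnx-quantized",
    quantization_config=qconfig,
)
```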
## Usage
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state: (batch, seq_len, hidden)
```
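`last_hidden_state` holds per-token embeddings. To reduce them to a single sentence vector for intent matching, one option is CLS pooling with L2 normalization, the convention of the BAAI/bge-small-en-v1.5 backbone; whether this checkpoint keeps that convention is an assumption.

```python
import torch

# CLS pooling: use the first token's hidden state as the sentence embedding
# (assumption: the bge-style CLS pooling + L2 normalization carries over)
embedding = outputs.last_hidden_state[:, 0]
embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 384]) for a bge-small backbone
```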
## Performance
- ~4x smaller model size
- 2-4x faster inference (see the latency sketch below)
- Minimal accuracy loss
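These figures depend on hardware and sequence length. A quick way to check the speedup on your own machine is to time the quantized model against a float ONNX export of the source checkpoint; this sketch is illustrative, not the benchmark behind the numbers above, and assumes rbojja/intent-classification-small exports cleanly with `export=True`.

```python
import time
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

def mean_latency_ms(model, inputs, runs=100):
    model(**inputs)  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    return (time.perf_counter() - start) / runs * 1e3

tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")
inputs = tokenizer("I want to book a flight", return_tensors="pt")

quantized = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
# Assumption: the float source model exports to ONNX without custom code
baseline = ORTModelForFeatureExtraction.from_pretrained("rbojja/intent-classification-small", export=True)

print(f"float ONNX : {mean_latency_ms(baseline, inputs):.2f} ms")
print(f"quantized  : {mean_latency_ms(quantized, inputs):.2f} ms")
```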
## Model tree for pythn/intent-classification-onnx-quantized

- Base model: [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- Fine-tuned checkpoint: [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) (quantized in this repo)