# Intent Classification ONNX Quantized

A quantized ONNX export of the intent-classification model, for fast, lightweight inference with ONNX Runtime via Optimum.

## Usage

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state: (batch, seq_len, hidden)
```
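
The feature-extraction head returns per-token hidden states rather than intent labels, so a common next step is to pool them into a single sentence embedding. Below is a minimal sketch using attention-mask mean pooling; the pooling strategy and the variable names (`mask`, `summed`, `embedding`) are illustrative assumptions, since the card does not specify the intended pooling.

```python
import torch

# Mean-pool token embeddings, ignoring padding positions.
# Assumes torch tensors, which Optimum returns for return_tensors="pt".
mask = inputs["attention_mask"].unsqueeze(-1).float()        # (batch, seq_len, 1)
summed = (outputs.last_hidden_state * mask).sum(dim=1)       # (batch, hidden)
embedding = summed / mask.sum(dim=1).clamp(min=1e-9)         # (batch, hidden)
print(embedding.shape)
```

The resulting embedding can then feed a downstream intent classifier, e.g. a lightweight linear head or nearest-neighbor lookup over labeled examples.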

## Performance

- ~4x smaller model size than the FP32 original
- 2-4x faster inference
- Minimal accuracy loss (see the latency sketch below to verify on your own hardware)
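
Speedup figures like these depend heavily on hardware, batch size, and sequence length, so it is worth measuring locally. A minimal latency sketch follows; the batch size, repeat count, and example text are arbitrary choices, and a real comparison would run the same loop against the unquantized FP32 baseline (not named on this card):

```python
import time

texts = ["I want to book a flight"] * 32
batch = tokenizer(texts, padding=True, return_tensors="pt")

model(**batch)  # warm-up run so session initialization isn't timed
start = time.perf_counter()
for _ in range(20):
    model(**batch)
elapsed = time.perf_counter() - start
print(f"avg batch latency: {elapsed / 20 * 1000:.1f} ms")
```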