---
library_name: optimum
tags:
- onnx
- quantized
- int8
- intent-classification
base_model: rbojja/intent-classification-small
---

# Intent Classification ONNX Quantized

An int8-quantized ONNX export of [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) for fast inference with ONNX Runtime via `optimum`.

## Usage

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# The model is exported as a feature extractor, so outputs.last_hidden_state
# holds per-token embeddings; pool them (e.g. mean pooling) to get a
# sentence-level vector for intent matching.
embedding = outputs.last_hidden_state.mean(dim=1)
```

## Performance

- ~4x smaller on disk (int8 vs. fp32 weights)
- 2-4x faster inference, typical for dynamic int8 on CPU
- Minimal accuracy loss
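
## Quantization

For reference, an int8 checkpoint like this is typically produced with `optimum`'s dynamic quantization. A minimal sketch, assuming the base model exports cleanly and an AVX512-VNNI-capable target CPU (the quantization config and output directory here are illustrative, not a record of how this exact checkpoint was built):

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the fp32 base model to ONNX.
model = ORTModelForFeatureExtraction.from_pretrained(
    "rbojja/intent-classification-small", export=True
)

# Dynamic quantization: weights are stored as int8 (~4x smaller than fp32),
# activations are quantized on the fly at inference time.
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(
    save_dir="intent-classification-onnx-quantized",  # hypothetical output dir
    quantization_config=qconfig,
)
```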