# Intent Classification ONNX Quantized
Quantized ONNX version of rbojja/intent-classification-small (a BAAI/bge-small-en-v1.5 fine-tune) for fast inference.
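For reference, checkpoints like this are typically produced with dynamic INT8 quantization in 🤗 Optimum. The exact recipe used here isn't documented, so the sketch below is only illustrative; the source model name and the `avx512_vnni` target are assumptions.

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the float checkpoint to ONNX (assumption: this quantized model
# was derived from rbojja/intent-classification-small in a similar way)
model = ORTModelForFeatureExtraction.from_pretrained(
    "rbojja/intent-classification-small", export=True
)

# Dynamic INT8 quantization; avx512_vnni is one common CPU target
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(
    save_dir="intent-classification-onnx-quantized",
    quantization_config=qconfig,
)
```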
## Usage
```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Load the quantized ONNX model and its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")

text = "I want to book a flight"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # outputs.last_hidden_state: (batch, seq_len, hidden)
```
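`last_hidden_state` holds per-token embeddings. To reduce them to a single sentence vector for intent matching, one option is CLS pooling with L2 normalization, the convention of the BAAI/bge-small-en-v1.5 backbone; whether this checkpoint keeps that convention is an assumption.

```python
import torch

# CLS pooling: use the first token's hidden state as the sentence embedding
# (assumption: the bge-style CLS pooling + L2 normalization carries over)
embedding = outputs.last_hidden_state[:, 0]
embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 384]) for a bge-small backbone
```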
## Performance
- ~4x smaller model size
- 2-4x faster inference (see the latency sketch below)
- Minimal accuracy loss
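These figures depend on hardware and sequence length. A quick way to check the speedup on your own machine is to time the quantized model against a float ONNX export of the source checkpoint; this sketch is illustrative, not the benchmark behind the numbers above, and assumes rbojja/intent-classification-small exports cleanly with `export=True`.

```python
import time
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

def mean_latency_ms(model, inputs, runs=100):
    model(**inputs)  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        model(**inputs)
    return (time.perf_counter() - start) / runs * 1e3

tokenizer = AutoTokenizer.from_pretrained("pythn/intent-classification-onnx-quantized")
inputs = tokenizer("I want to book a flight", return_tensors="pt")

quantized = ORTModelForFeatureExtraction.from_pretrained("pythn/intent-classification-onnx-quantized")
# Assumption: the float source model exports to ONNX without custom code
baseline = ORTModelForFeatureExtraction.from_pretrained("rbojja/intent-classification-small", export=True)

print(f"float ONNX : {mean_latency_ms(baseline, inputs):.2f} ms")
print(f"quantized  : {mean_latency_ms(quantized, inputs):.2f} ms")
```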
## Model tree for pythn/intent-classification-onnx-quantized

- Base model: [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- Fine-tuned checkpoint: [rbojja/intent-classification-small](https://huggingface.co/rbojja/intent-classification-small) (quantized in this repo)