🧠 Unified Multilingual Distiluse Text Embedder (ONNX + Tokenizer Merged)

This is a highly optimized, quantized, and fully standalone model for generating sentence embeddings from multilingual text, including Ukrainian, English, Polish, and more.

Built upon distiluse-base-multilingual-cased-v2, the model has been:

  • 🔁 Merged with its tokenizer into a single ONNX file
  • ⚙️ Extended with a custom preprocessing layer
  • ⚡ Quantized to INT8 and ARM64-ready
  • 🧪 Extensively tested across real-world NLP tasks
  • 🛠️ Bug-fixed relative to the original sentence-transformers quantized export, which produced inaccurate cosine similarities

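For intuition, INT8 quantization stores each weight as an 8-bit integer plus a floating-point scale. The sketch below illustrates the standard symmetric per-tensor scheme in plain NumPy; it is not the author's actual pipeline (that likely used ONNX Runtime's quantization tooling), just the underlying math:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Toy weight tensor standing in for a real layer
w = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error stays below one quantization step (the scale)
print(np.max(np.abs(w - w_hat)))
```

This halves-and-halves the storage (FP32 → INT8 is a 4x reduction for weights) at the cost of a small, bounded rounding error per weight, which is why careful post-quantization testing (as listed above) matters.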
🚀 Key Features

  • 🧩 Single-file architecture: no external tokenizer, vocab files, or transformers library needed.
  • ⚡ 93% faster inference on mobile compared to the original model.
  • 🗣️ Multilingual: robust across many languages, including low-resource ones.
  • 🧠 Output = pure embeddings: pass a string, get a 768-dim vector. That's it.
  • 🛠️ Ready for production: small, fast, accurate, and easy to integrate.
  • 📱 Ideal for edge-AI, mobile, and offline scenarios.

🤖 Author: @vlad-m-dev
Built for edge-AI / phone / tablet offline use.
Telegram: https://t.me/dwight_schrute_engineer


🐍 Python Example

import numpy as np
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the custom ops library so the in-graph tokenizer can run
sess_options = ort.SessionOptions()
sess_options.register_custom_ops_library(get_library_path())

session = ort.InferenceSession(
    'model.onnx',
    sess_options=sess_options,
    providers=['CPUExecutionProvider']
)

# The model accepts raw strings; tokenization happens inside the graph
input_feed = {"text": np.asarray(['something..'])}
outputs = session.run(None, input_feed)
embedding = outputs[0]
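Since the fix over the original quantized export specifically concerned cosine similarity, it is worth sanity-checking similarities between embeddings yourself. A minimal helper (the toy vectors below are stand-ins; real inputs would be the embeddings returned by session.run above):

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors for illustration
v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([1.0, 0.0, 1.0])
v3 = np.array([0.0, 1.0, 0.0])
print(cosine_similarity(v1, v2))  # identical vectors -> 1.0
print(cosine_similarity(v1, v3))  # orthogonal vectors -> 0.0
```

Semantically similar sentences should score noticeably higher than unrelated ones; if they do not, the model file or preprocessing is likely at fault.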

🟨 JavaScript Example

// Using onnxruntime-node (onnxruntime-web works the same way)
const { InferenceSession, Tensor } = require('onnxruntime-node');

const session = await InferenceSession.create(EMBEDDING_FULL_MODEL_PATH);
const inputTensor = new Tensor('string', ['something..'], [1]);
const feeds = { text: inputTensor };
const outputMap = await session.run(feeds);
const embedding = outputMap.text_embedding.data;
Model repository: vlad-m-dev/distiluse-base-multilingual-v2-merged-onnx