vlad-m-dev committed on
Commit 5275b57 · verified · 1 Parent(s): b8d6a72

Update README.md

Files changed (1): README.md (+46 −1)
README.md CHANGED
@@ -52,4 +52,49 @@ tags:
  - real-time
  - fasttext
  - distiluse
- ---
+
+ ---
+
+ # 🧠 Unified Multilingual MiniLM Text Embedder (ONNX + Tokenizer Merged)
+
+ This is a highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.
+
+ Built upon `distiluse-base-multilingual-cased-v2`, the model has been:
+
+ - 🔁 **Merged with its tokenizer** into a single ONNX file
+ - ⚙️ **Extended with a custom preprocessing layer**
+ - ⚡ **Quantized to INT8** and ARM64-ready
+ - 🧪 **Extensively tested across real-world NLP tasks**
+ - 🛠️ **Bug-fixed** relative to the original `sentence-transformers` quantized version, which produced inaccurate cosine similarity
+
+ ---
+
+ ## 🚀 Key Features
+
+ - 🧩 **Single-file architecture**: no external tokenizer, vocab file, or `transformers` library required.
+ - ⚡ **93% faster inference** on mobile compared to the original model.
+ - 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
+ - 🧠 **Output = pure embeddings**: pass a string, get a 384-dim vector. That's it.
+ - 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
+ - 📱 **Ideal for edge-AI, mobile, and offline scenarios.**
+
+ ---
+
+ ## 🐍 Python Example
+ ```python
+ import numpy as np
+ import onnxruntime as ort
+ from onnxruntime_extensions import get_library_path
+
+ # Register the custom-ops library so the merged tokenizer node can run
+ sess_options = ort.SessionOptions()
+ sess_options.register_custom_ops_library(get_library_path())
+
+ session = ort.InferenceSession(
+     'model.onnx',
+     sess_options=sess_options,
+     providers=['CPUExecutionProvider']
+ )
+
+ # The model accepts raw strings; tokenization happens inside the ONNX graph
+ input_feed = {"text": np.asarray(['something..'])}
+ outputs = session.run(None, input_feed)
+ embedding = outputs[0]  # shape: [1, 384]
+ ```
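
Since the model card highlights the cosine-similarity fix, here is a minimal sketch of how the resulting embeddings would typically be compared. The `cosine` helper is hypothetical (not part of the model), and the two vectors below are synthetic stand-ins for rows of `outputs[0]`; in practice you would feed a batch such as `np.asarray(['first text', 'second text'])` and compare the rows of the returned `[2, 384]` matrix the same way.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for two 384-dim embeddings (hypothetical data, not model output)
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(384)
emb_b = emb_a + 0.1 * rng.standard_normal(384)  # a slightly perturbed copy

print(cosine(emb_a, emb_a))  # identical vectors give similarity 1.0
print(cosine(emb_a, emb_b))  # slightly perturbed copy stays close to 1.0
```

A sanity check like this (similar texts scoring near 1.0, unrelated texts much lower) is exactly where the original quantized version reportedly misbehaved.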