vlad-m-dev committed on
Commit 5275b57 · verified · 1 Parent(s): b8d6a72

Update README.md

Files changed (1): README.md (+46 −1)
README.md CHANGED
@@ -52,4 +52,49 @@ tags:
  - real-time
  - fasttext
  - distiluse
- ---
+
+ ---
+
+ # 🧠 Unified Multilingual MiniLM Text Embedder (ONNX + Tokenizer Merged)
+
+ This is a highly optimized, quantized, and fully standalone model for **generating sentence embeddings** from **multilingual text**, including Ukrainian, English, Polish, and more.
+
+ Built upon `distiluse-base-multilingual-cased-v2`, the model has been:
+
+ - 🔁 **Merged with its tokenizer** into a single ONNX file
+ - ⚙️ **Extended with a custom preprocessing layer**
+ - ⚡ **Quantized to INT8** and ARM64-ready
+ - 🧪 **Extensively tested across real-world NLP tasks**
+ - 🛠️ **Bug-fixed** relative to the original `sentence-transformers` quantized version, which produced inaccurate cosine similarity
+
+ ---
+
+ ## 🚀 Key Features
+
+ - 🧩 **Single-file architecture**: no external tokenizer, vocab file, or `transformers` library required.
+ - ⚡ **93% faster inference** on mobile compared to the original model.
+ - 🗣️ **Multilingual**: robust across many languages, including low-resource ones.
+ - 🧠 **Output = pure embeddings**: pass a string, get a 384-dim vector. That's it.
+ - 🛠️ **Ready for production**: small, fast, accurate, and easy to integrate.
+ - 📱 **Ideal for edge-AI, mobile, and offline scenarios.**
+
+ ---
+
+ ## 🐍 Python Example
+ ```python
+ import numpy as np
+ import onnxruntime as ort
+ from onnxruntime_extensions import get_library_path
+
+ # Register the custom-ops library so the merged tokenizer node can run
+ sess_options = ort.SessionOptions()
+ sess_options.register_custom_ops_library(get_library_path())
+
+ session = ort.InferenceSession(
+     'model.onnx',
+     sess_options=sess_options,
+     providers=['CPUExecutionProvider']
+ )
+
+ # The model accepts raw strings; tokenization happens inside the ONNX graph
+ input_feed = {"text": np.asarray(['something..'])}
+ outputs = session.run(None, input_feed)
+ embedding = outputs[0]  # shape: [1, 384]
+ ```
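
Since the model card highlights the cosine-similarity fix, here is a minimal sketch of how the resulting embeddings would typically be compared. The `cosine` helper is hypothetical (not part of the model), and the two vectors below are synthetic stand-ins for rows of `outputs[0]`; in practice you would feed a batch such as `np.asarray(['first text', 'second text'])` and compare the rows of the returned `[2, 384]` matrix the same way.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-ins for two 384-dim embeddings (hypothetical data, not model output)
rng = np.random.default_rng(0)
emb_a = rng.standard_normal(384)
emb_b = emb_a + 0.1 * rng.standard_normal(384)  # a slightly perturbed copy

print(cosine(emb_a, emb_a))  # identical vectors give similarity 1.0
print(cosine(emb_a, emb_b))  # slightly perturbed copy stays close to 1.0
```

A sanity check like this (similar texts scoring near 1.0, unrelated texts much lower) is exactly where the original quantized version reportedly misbehaved.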