yasserrmd committed on
Commit 44f4692 · verified · 1 Parent(s): 10b72b0

Create README.md

Files changed (1)
  1. README.md +69 -0
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ library_name: optimum.onnxruntime
+ tags:
+ - onnx
+ - int8
+ - quantization
+ - embeddings
+ - cpu
+ pipeline_tag: feature-extraction
+ license: apache-2.0
+ base_model: ibm-granite/granite-embedding-english-r2
+ ---
+
+ # Granite Embedding English R2 — INT8 (ONNX)
+
+ This is the **INT8-quantized ONNX version** of [`ibm-granite/granite-embedding-english-r2`](https://huggingface.co/ibm-granite/granite-embedding-english-r2).
+ It is optimized to run efficiently on **CPU** using [🤗 Optimum](https://huggingface.co/docs/optimum) with ONNX Runtime.
+
+ - **Embedding dimension:** 768
+ - **Precision:** INT8 (dynamic quantization)
+ - **Backend:** ONNX Runtime
+ - **Use case:** text embeddings, semantic search, clustering, retrieval
+
+ ---
+
+ ## 📥 Installation
+
+ ```bash
+ pip install -U transformers "optimum[onnxruntime]"
+ ```
+
+ ---
+
+ ## 🚀 Usage
+
+ ```python
+ from transformers import AutoTokenizer
+ from optimum.onnxruntime import ORTModelForFeatureExtraction
+
+ repo_id = "yasserrmd/granite-embedding-r2-onnx"
+
+ # Load tokenizer + ONNX model
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = ORTModelForFeatureExtraction.from_pretrained(repo_id)
+
+ # Encode sentences
+ inputs = tokenizer(["Hello world", "Welcome"], padding=True, return_tensors="pt")
+ outputs = model(**inputs)
+
+ # Apply mean pooling over real tokens only (the attention mask zeroes out padding)
+ mask = inputs["attention_mask"].unsqueeze(-1)
+ embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
+ print(embeddings.shape)  # torch.Size([2, 768])
+ ```
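For the semantic search use case, sentence embeddings are usually compared with cosine similarity. The sketch below is illustrative only: it uses toy 4-dimensional NumPy vectors in place of real 768-dimensional embeddings, and `cosine_similarity_matrix` is a hypothetical helper name, not part of Optimum or Transformers. It assumes the model output has already been converted to a NumPy array (for example with `embeddings.detach().numpy()`).

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Return a (num_queries, num_docs) matrix of cosine similarities."""
    # L2-normalize each row so the dot product equals the cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return q @ d.T


# Toy vectors standing in for real document and query embeddings (assumption).
doc_embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
])
query_embedding = np.array([[0.9, 0.1, 0.0, 0.0]])

scores = cosine_similarity_matrix(query_embedding, doc_embeddings)
ranking = np.argsort(-scores[0])  # document indices, best match first
print(ranking.tolist())
```

Normalizing once and storing the unit vectors is a common design choice, since ranking then reduces to a single matrix product per query.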
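+
+ ---
+
+ ## ✅ Notes
+
+ * Dynamic INT8 quantization reduces model size and typically speeds up CPU inference, usually at a small cost in accuracy; evaluate on your own data if quality is critical.
+ * The usage example applies **mean pooling**; you can switch to CLS pooling or max pooling as needed, and it is worth checking the base model card for the pooling strategy it was trained with.
+ * The model loads directly from the **Hugging Face Hub** with `optimum.onnxruntime`.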
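The pooling options mentioned in the notes can be sketched as follows. This is a minimal NumPy illustration on a toy batch, not code from this repository: `pool` is a hypothetical helper, and the CLS branch assumes right-padding so that position 0 is always a real token.

```python
import numpy as np


def pool(hidden: np.ndarray, mask: np.ndarray, strategy: str = "mean") -> np.ndarray:
    """Pool (batch, seq_len, dim) token vectors into (batch, dim) sentence vectors.

    `mask` is the (batch, seq_len) attention mask: 1 for real tokens, 0 for padding.
    """
    m = mask[..., None].astype(hidden.dtype)  # (batch, seq_len, 1)
    if strategy == "mean":
        # Average over real tokens only.
        return (hidden * m).sum(axis=1) / m.sum(axis=1)
    if strategy == "cls":
        # Take the first token's vector (assumes right-padding).
        return hidden[:, 0]
    if strategy == "max":
        # Push padded positions to -inf so they never win the max.
        return np.where(m > 0, hidden, -np.inf).max(axis=1)
    raise ValueError(f"unknown pooling strategy: {strategy}")


# Toy batch: 2 sentences, 3 token positions, 2 dimensions.
hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[1.0, 1.0], [3.0, 5.0], [9.0, 9.0]],  # last position is padding
])
mask = np.array([[1, 1, 1], [1, 1, 0]])

for name in ("mean", "cls", "max"):
    print(name, pool(hidden, mask, name).tolist())
```

With the tensors from the usage example, a call might look like `pool(outputs.last_hidden_state.numpy(), inputs["attention_mask"].numpy(), "cls")`, assuming PyTorch tensors were returned.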
+
+ ---
+
+ ## 📚 References
+
+ * [Original Granite Embedding English R2](https://huggingface.co/ibm-granite/granite-embedding-english-r2)
+ * [Optimum ONNX Runtime docs](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)
+