Upload folder using huggingface_hub

Browse files

Files changed (4) hide show

README.md +129 -0
config.json +25 -0
model-quant.onnx +3 -0
model.onnx +3 -0

README.md ADDED Viewed

	@@ -0,0 +1,129 @@

+---
+library_name: sentence-transformers
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- sentence-similarity
+- quantized
+- onnx
+- clustering
+model-index:
+- name: sentence-transformers/all-MiniLM-L6-v2-quantized
+  results:
+  - task:
+      type: semantic-similarity
+      name: Semantic Similarity
+    dataset:
+      type: semantic-similarity
+      name: Semantic Similarity
+    metrics:
+    - type: similarity
+      value: 0.95+
+      name: Cosine Similarity (vs Original)
+---
+# Quantized SentenceTransformer: all-MiniLM-L6-v2
+This is a quantized version of the popular [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model, optimized for production deployment.
+## Model Details
+- **Base Model**: sentence-transformers/all-MiniLM-L6-v2
+- **Quantization**: INT8 dynamic quantization using ONNX Runtime
+- **Size Reduction**: ~75% smaller than the original model
+- **Performance**: 95%+ similarity to original model embeddings
+- **Format**: ONNX
+## Files
+- `model-quant.onnx`: Quantized INT8 model (recommended for production)
+- `model.onnx`: Original FP32 ONNX model
+## Usage
+### With ONNX Runtime (Recommended)
+```python
+import onnxruntime as ort
+import numpy as np
+from transformers import AutoTokenizer
+# Load the quantized model
+session = ort.InferenceSession("model-quant.onnx")
+tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
+def encode_text(text):
+    # Tokenize
+    inputs = tokenizer(text, return_tensors="np", padding=True, truncation=True, max_length=512)
+    # Run inference
+    outputs = session.run(None, {
+        "input_ids": inputs["input_ids"],
+        "attention_mask": inputs["attention_mask"]
+    })
+    # Apply mean pooling
+    last_hidden_state = outputs[0]
+    attention_mask_expanded = np.expand_dims(inputs["attention_mask"], -1)
+    attention_mask_expanded = np.broadcast_to(attention_mask_expanded, last_hidden_state.shape)
+    masked_embeddings = last_hidden_state * attention_mask_expanded
+    summed = np.sum(masked_embeddings, axis=1)
+    summed_mask = np.sum(attention_mask_expanded, axis=1)
+    embedding = summed / np.maximum(summed_mask, 1e-9)
+    return embedding[0]
+# Example usage
+text = "I love this product!"
+embedding = encode_text(text)
+print(f"Embedding shape: {embedding.shape}")
+```
+### With SentenceTransformers (Original)
+For comparison with the original model:
+```python
+from sentence_transformers import SentenceTransformer
+model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
+embedding = model.encode("I love this product!")
+```
+## Performance Comparison
+| Model | Size | Inference Speed | Memory Usage | Similarity to Original |
+|-------|------|----------------|--------------|----------------------|
+| Original | ~90MB | 1.0x | 1.0x | 100% |
+| Quantized | ~23MB | 1.2-1.5x | 0.6x | 95%+ |
+## Use Cases
+- **Text Clustering**: Group similar texts together
+- **Semantic Search**: Find semantically similar documents
+- **Recommendation Systems**: Content-based recommendations
+- **Duplicate Detection**: Find near-duplicate texts
+## Technical Details
+- **Embedding Dimension**: 384
+- **Max Sequence Length**: 512 tokens
+- **Quantization Method**: Dynamic INT8 quantization
+- **Framework**: ONNX Runtime
+## Citation
+If you use this model, please cite the original work:
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "http://arxiv.org/abs/1908.10084",
+}
+```

config.json ADDED Viewed

	@@ -0,0 +1,25 @@

+{
+  "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 384,
+  "initializer_range": 0.02,
+  "intermediate_size": 1536,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 6,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.21.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}

model-quant.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6adafce0ae0bfaa3efb575628f4aaf625df8e7ff63d4592c39998b0a85eaa1fa
+size 22931635

model.onnx ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d43542072f44bc4ece1945b4b613222fb8870ef670cbd973d850c0f8dcbe49f4
+size 90422640