JustJaro committed · Commit c82efa8 · verified · 1 Parent(s): 3108a82

Upload ONNX model with opset 14

Files changed (3)
  1. README.md +11 -16
  2. optimization_report.json +2 -2
  3. upload_info.json +6 -0
README.md CHANGED
@@ -4,12 +4,10 @@ license: mit
 tags:
 - onnx
 - optimum
-- quantized
-- none
 - text-embedding
 - onnxruntime
 - opset14
-- text-classification
+- sentence-similarity
 - gpu
 - optimized
 datasets:
@@ -17,20 +15,19 @@ datasets:
 pipeline_tag: sentence-similarity
 ---
 
-# gte-multilingual-reranker-base-onnx-op14-opt-gpu-fpnone-quantized
+# gte-multilingual-reranker-base-onnx-op14-opt-gpu
 
-This model is a quantized ONNX version of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) using ONNX opset 14.
+This model is an ONNX version of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) using ONNX opset 14.
 
 ## Model Details
 
-- **Quantization Type**: FPnone
+- **Framework**: ONNX Runtime
 - **ONNX Opset**: 14
-- **Task**: text-classification
+- **Task**: sentence-similarity
 - **Target Device**: GPU
 - **Optimized**: Yes
-- **Framework**: ONNX Runtime
 - **Original Model**: [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base)
-- **Quantized On**: 2025-03-27
+- **Exported On**: 2025-03-27
 
 ## Environment and Package Versions
 
@@ -80,14 +77,12 @@ inputs = tokenizer(text, return_tensors="pt")
 outputs = model(**inputs)
 ```
 
-## Quantization Process
+## Export Process
 
-This model was quantized using ONNX Runtime with none quantization.
-The quantization was performed using the Optimum library from Hugging Face with opset 14.
+This model was exported to ONNX format using the Optimum library from Hugging Face with opset 14.
 Graph optimization was applied during export, targeting GPU devices.
 
-
-## Performance Comparison
-
-Quantized models generally offer better inference speed with a slight trade-off in accuracy.
-This FPnone quantized model should provide significantly faster inference than the original model.
+## Performance
+
+ONNX Runtime models generally offer better inference speed compared to native PyTorch models,
+especially when deployed to production environments.
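The new README's "Export Process" (Optimum, opset 14, GPU-targeted graph optimization) could be reproduced with an Optimum CLI invocation along these lines. This is a hedged sketch: the exact optimization level, device flag, and output directory are assumptions, not recorded in this commit.

```shell
# Sketch: export the original model to ONNX with opset 14 and apply
# GPU-targeted graph optimizations (O4 is Optimum's GPU-only level).
# Note: the export *task* is text-classification (a reranker is a
# sequence-classification model); the repo's pipeline tag is sentence-similarity.
optimum-cli export onnx \
  --model Alibaba-NLP/gte-multilingual-reranker-base \
  --task text-classification \
  --opset 14 \
  --optimize O4 \
  --device cuda \
  gte-multilingual-reranker-base-onnx-op14-opt-gpu/
```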
 
 
optimization_report.json CHANGED
@@ -12,7 +12,7 @@
   },
   "failed_optimizations": {},
   "model_name": "Alibaba-NLP/gte-multilingual-reranker-base",
-  "task": "text-classification",
+  "task": "sentence-similarity",
   "target_device": "GPU",
-  "timestamp": "2025-03-27T13:48:31.719934"
+  "timestamp": "2025-03-27T17:40:48.059814"
 }
upload_info.json ADDED
@@ -0,0 +1,6 @@
+{
+  "hf_repo": "JustJaro/gte-multilingual-reranker-base-onnx-op14-opt-gpu",
+  "upload_date": "2025-03-27T13:50:59.930292",
+  "upload_success": true,
+  "model_url": "https://huggingface.co/JustJaro/gte-multilingual-reranker-base-onnx-op14-opt-gpu"
+}
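Since the commit relabels the task to sentence-similarity, downstream code typically turns the reranker's raw logit for a query–document pair into a [0, 1] relevance score. A minimal sketch of that post-processing step (the helper name and the use of a sigmoid are illustrative assumptions, not part of the uploaded files):

```python
import math

def logit_to_score(logit: float) -> float:
    """Map a reranker's raw output logit to a [0, 1] relevance score."""
    return 1.0 / (1.0 + math.exp(-logit))

# A higher logit means the document is judged more relevant to the query;
# a logit of 0.0 maps to exactly 0.5.
scores = [logit_to_score(x) for x in (-2.0, 0.0, 3.5)]
```

Candidate documents can then be sorted by `scores` in descending order to produce the final ranking.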