license: mit
language:
- en
BGE-large-en-v1.5-rag-int8-static
A quantized version of BAAI/BGE-large-en-v1.5, produced with Intel® Neural Compressor and compatible with Optimum-Intel.
The model can be used through the Optimum-Intel API as a standalone model, or as an embedder or ranker module in a fastRAG RAG pipeline.
Technical details
Quantized using post-training static quantization.
Calibration set | qasper (with 100 random samples) |
Quantization tool | Optimum-Intel |
Backend | IPEX |
Original model | BAAI/BGE-large-en-v1.5 |
Instructions for reproducing the quantized model can be found here.
Evaluation - MTEB
Model performance on the Massive Text Embedding Benchmark (MTEB) retrieval and reranking tasks.
Task | INT8 | FP32 | % diff |
---|---|---|---|
Reranking | 0.5997 | 0.6003 | -0.108% |
Retrieval | 0.5346 | 0.5429 | -1.53% |
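The "% diff" column reads as the relative change of the INT8 score with respect to the FP32 baseline. The sketch below recomputes it under that assumption; `percent_diff` is a hypothetical helper, not part of any library. Because it starts from the rounded scores shown in the table, the last digits may differ slightly from the reported values, which were presumably computed from unrounded scores.

```python
def percent_diff(quantized: float, baseline: float) -> float:
    """Relative change of the quantized score versus the baseline, in percent."""
    return (quantized - baseline) / baseline * 100.0


# Rounded (INT8, FP32) scores copied from the table.
scores = {
    "Reranking": (0.5997, 0.6003),
    "Retrieval": (0.5346, 0.5429),
}

for task, (int8, fp32) in scores.items():
    # A negative value means the INT8 model scores below the FP32 model.
    print(f"{task}: {percent_diff(int8, fp32):+.3f}%")
```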
Usage
Using with Optimum-Intel
See the Optimum-Intel installation page for installation instructions, or run:
pip install -U "optimum[neural-compressor,ipex]" intel-extension-for-transformers
Loading a model:
from optimum.intel import IPEXModel
model = IPEXModel.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")
Running inference:
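import torch
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")
# Example input; replace with your own sentences
sentences = ["This is an example sentence."]
# Pad and truncate so that a batch of sentences forms a single tensor
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
    # get the vector of [CLS]
    embedded = outputs[0][:, 0]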
Using with a fastRAG RAG pipeline
To get started, install fastRAG as instructed here.
The example below loads the model into a ranker node that embeds and re-ranks all the documents it receives as node input in a pipeline.
from fastrag.rankers import QuantizedBiEncoderRanker
ranker = QuantizedBiEncoderRanker("Intel/bge-large-en-v1.5-rag-int8-static")
The ranker can then be plugged into a pipeline:
from haystack import Pipeline
# `retriever` is assumed to be a retriever node that you have already created
p = Pipeline()
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])
See a more complete example notebook here.
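To make the "embeds and re-ranks" step more concrete, here is a minimal, library-free sketch of how a bi-encoder ranker can order documents: each document embedding is compared with the query embedding and the documents are sorted by score. This is not fastRAG's implementation. The use of cosine similarity, the helper names `l2_normalize` and `rerank`, and the toy 3-dimensional vectors are all assumptions made for illustration; real embeddings would come from the model as in the inference example.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rerank(
    query_emb: Sequence[float], doc_embs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) pairs, best match first."""
    q = l2_normalize(query_emb)
    scored = []
    for idx, emb in enumerate(doc_embs):
        d = l2_normalize(emb)
        # The dot product of two unit vectors is their cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for model outputs (hypothetical values).
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rerank(query, docs)
print([idx for idx, _ in ranking])
```

Normalizing first means the score depends only on the direction of each embedding, so a document vector with a larger magnitude does not rank higher on that basis alone.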