metadata

license: mit
language:
  - en

BGE-large-en-v1.5-rag-int8-static

An INT8 version of BAAI/BGE-large-en-v1.5, quantized with Intel® Neural Compressor and compatible with Optimum-Intel.

The model can be used through the Optimum-Intel API as a standalone model, or as an embedder or ranker module in a fastRAG RAG pipeline.

Technical details

Quantized using post-training static quantization.


Calibration set	qasper (100 random samples)
Quantization tool	Optimum-Intel
Backend	`IPEX`
Original model	BAAI/BGE-large-en-v1.5

Instructions for reproducing the quantized model can be found here.
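For intuition, the idea behind post-training static quantization can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the Intel Neural Compressor implementation: the symmetric per-tensor scheme, the function names, and the toy values are all assumptions made for the example. The key point it shows is that the scale is fixed once from calibration data and then reused unchanged at inference time.

```python
# Conceptual sketch of static quantization; NOT the Intel Neural Compressor code.
# Assumes a symmetric, per-tensor int8 scheme for simplicity.

def calibrate_scale(calibration_values):
    """Pick a fixed (static) scale from calibration data."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127 if max_abs > 0 else 1.0

def quantize(values, scale):
    """Map floats to int8 with the precomputed scale; out-of-range values are clipped."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in q_values]

# Calibration runs once, offline, on representative data (toy values here).
calibration_values = [-2.0, -0.5, 0.0, 0.75, 1.27]
scale = calibrate_scale(calibration_values)

# At inference time the same scale is reused for unseen inputs.
activations = [0.1, -0.5, 3.0]  # 3.0 lies outside the calibrated range and is clipped
q = quantize(activations, scale)
restored = dequantize(q, scale)
```

Because the range is frozen at calibration time, values outside it (such as `3.0` above) saturate, which is one reason the calibration set should resemble the data seen in production.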

Evaluation - MTEB

Model performance on the Massive Text Embedding Benchmark (MTEB) retrieval and reranking tasks.

	`INT8`	`FP32`	% diff
Reranking	0.5997	0.6003	-0.108%
Retrieval	0.5346	0.5429	-1.53%
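As a reading aid, the "% diff" column can be interpreted as the relative change of the INT8 score with respect to the FP32 baseline. The snippet below is a minimal sketch under that assumption; the helper name `relative_diff_percent` is hypothetical. Recomputing from the rounded scores in the table may differ in the last digits from the reported column, which was presumably derived from unrounded scores.

```python
def relative_diff_percent(int8_score, fp32_score):
    """Relative change of the INT8 score versus the FP32 baseline, in percent."""
    return (int8_score - fp32_score) / fp32_score * 100

# Scores copied from the table above: (INT8, FP32)
scores = {
    "Reranking": (0.5997, 0.6003),
    "Retrieval": (0.5346, 0.5429),
}

for task, (int8_score, fp32_score) in scores.items():
    print(f"{task}: {relative_diff_percent(int8_score, fp32_score):.2f}%")
```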

Usage

Using with Optimum-intel

See the Optimum-intel installation page for installation instructions, or run:

pip install -U "optimum[neural-compressor,ipex]" intel-extension-for-transformers

Loading a model:

from optimum.intel import IPEXModel

model = IPEXModel.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")

Running inference:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Intel/bge-large-en-v1.5-rag-int8-static")

# example input; replace with your own sentences
sentences = ["This is an example sentence."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    outputs = model(**inputs)
    # get the vector of [CLS]
    embedded = outputs[0][:, 0]
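The [CLS] vectors extracted above are typically compared with cosine similarity, which amounts to L2-normalizing each vector and taking a dot product. The following is a dependency-free sketch of that step; the helper names and the toy 3-dimensional vectors are illustrative assumptions standing in for real embeddings, which are much larger.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def cosine_similarity(a, b):
    """Cosine similarity as the dot product of the normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))

# Toy stand-ins for [CLS] embeddings (hypothetical values).
query = [1.0, 2.0, 2.0]
doc_a = [2.0, 4.0, 4.0]   # same direction as the query
doc_b = [2.0, -1.0, 0.0]  # orthogonal to the query

sim_a = cosine_similarity(query, doc_a)
sim_b = cosine_similarity(query, doc_b)
```

With PyTorch tensors, the same normalization is commonly written as `torch.nn.functional.normalize(embedded, p=2, dim=1)`, after which a matrix product gives all pairwise similarities.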

Using with a fastRAG RAG pipeline

Start by installing fastRAG as instructed here.

The example below loads the model into a ranker node that embeds and re-ranks all the documents it receives as input in a pipeline.

from fastrag.rankers import QuantizedBiEncoderRanker

ranker = QuantizedBiEncoderRanker("Intel/bge-large-en-v1.5-rag-int8-static")

Then plug the ranker into a pipeline:


from haystack import Pipeline

p = Pipeline()
# `retriever` is assumed to be an existing retriever node (not defined in this snippet)
p.add_node(component=retriever, name="retriever", inputs=["Query"])
p.add_node(component=ranker, name="ranker", inputs=["retriever"])
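Conceptually, a bi-encoder ranker node scores each retrieved document by comparing its embedding with the query embedding and returns the documents sorted by that score. The sketch below illustrates the idea in plain Python; it is not fastRAG's implementation, and the `rerank` helper, the `top_k` parameter, and the toy embeddings are hypothetical.

```python
def rerank(query_embedding, documents, top_k=None):
    """Sort (doc_id, embedding) pairs by dot-product score against the query, best first."""
    scored = [
        (doc_id, sum(q * d for q, d in zip(query_embedding, embedding)))
        for doc_id, embedding in documents
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k] if top_k is not None else scored

# Hypothetical toy embeddings, assumed to be unit-normalized already.
query_embedding = [1.0, 0.0]
documents = [
    ("doc_low", [0.0, 1.0]),
    ("doc_high", [1.0, 0.0]),
    ("doc_mid", [0.6, 0.8]),
]

ranked = rerank(query_embedding, documents, top_k=2)
```

In a real pipeline the embeddings would come from the quantized model rather than being hard-coded, and the retriever determines which documents reach the ranker in the first place.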

See a more complete example notebook here.