Here's a Python package that you can use to index, query, and rank your documents with SPLADE models from sentence-transformers.
splade-index: https://github.com/rasyosef/splade-index
SPLADE-Index⚡
SPLADE-Index is an ultrafast index for SPLADE sparse retrieval models, implemented in pure Python and powered by SciPy sparse matrices. It is built on top of the BM25s library.
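To give a feel for why sparse matrices suit this kind of retrieval, here is a minimal sketch of scoring documents against a query with a sparse dot product and then taking the top k. It is an illustration only, not the library's actual implementation: the toy 5-term vocabulary, the weights, and the `top_k` helper are all made up for this example.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: rows are documents, columns are term weights
# over a made-up 5-term vocabulary. Most entries are zero, which is
# what makes a sparse representation worthwhile.
doc_matrix = sparse.csr_matrix(np.array([
    [1.2, 0.0, 0.0, 0.8, 0.0],  # doc 0
    [0.0, 1.5, 0.0, 0.0, 0.3],  # doc 1
    [0.0, 0.0, 2.0, 0.0, 0.0],  # doc 2
]))

# A query encoded in the same vocabulary, shape (1, 5).
query_vec = sparse.csr_matrix(np.array([[0.5, 0.0, 0.0, 1.0, 0.2]]))


def top_k(doc_matrix, query_vec, k):
    """Score every document by dot product with the query, return the k best."""
    # Sparse matrix product gives one score per document.
    scores = (doc_matrix @ query_vec.T).toarray().ravel()
    # Sort by descending score and keep the first k document indices.
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores[order]


ids, scores = top_k(doc_matrix, query_vec, k=2)
print(ids, scores)
```

Only terms that are non-zero in both the query and a document contribute to that document's score, so the cost scales with the number of non-zero weights rather than with the vocabulary size.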
Installation
You can install splade-index with pip:
pip install splade-index
Recommended (but optional) dependencies:
pip install "jax[cpu]"
Quickstart
Here is a simple example of how to use splade-index:
from sentence_transformers import SparseEncoder
from splade_index import SPLADE

# Load a SPLADE model
model = SparseEncoder("rasyosef/splade-tiny")

# Define the corpus
corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
    "a fish is a creature that lives in water and swims",
]

# Create the retriever and index the corpus
retriever = SPLADE()
retriever.index(model=model, documents=corpus)

# Query the corpus and get the top-k results
queries = ["does the fish purr like a cat?"]
results = retriever.retrieve(queries, k=2)
doc_ids, result_docs, scores = results.doc_ids, results.documents, results.scores

# Print the ranked results for the first query
for i in range(doc_ids.shape[1]):
    doc_id, doc, score = doc_ids[0, i], result_docs[0, i], scores[0, i]
    print(f"Rank {i+1} (score: {score:.2f}) (doc_id: {doc_id}): {doc}")

# Save the index to a directory
retriever.save("animal_index_splade")

# Load the index again when you need it
import splade_index
reloaded_retriever = splade_index.SPLADE.load("animal_index_splade", model=model)
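The Quickstart loop prints results for the first query only. If you pass several queries at once, a small helper like the one below can walk over all of them. This is a sketch under one assumption: that `doc_ids`, `documents`, and `scores` are 2D arrays of shape (number of queries, k), which is how the Quickstart indexes them. `format_results` is a hypothetical helper, not part of splade-index, and the arrays here are stand-in data rather than real retrieval output.

```python
import numpy as np


def format_results(doc_ids, documents, scores):
    """Return one formatted line per (query, rank) pair."""
    lines = []
    n_queries, k = doc_ids.shape
    for q in range(n_queries):
        for i in range(k):
            lines.append(
                f"Query {q} rank {i + 1} (score: {scores[q, i]:.2f}) "
                f"(doc_id: {doc_ids[q, i]}): {documents[q, i]}"
            )
    return lines


# Stand-in arrays shaped like the Quickstart results (2 queries, k=2).
doc_ids = np.array([[3, 0], [1, 2]])
documents = np.array([["fish doc", "cat doc"], ["dog doc", "bird doc"]])
scores = np.array([[2.5, 1.75], [3.0, 0.5]])

lines = format_results(doc_ids, documents, scores)
for line in lines:
    print(line)
```

With real output you would call it as `format_results(results.doc_ids, results.documents, results.scores)`.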