AI & ML interests
Below you will find models tuned for sparse sentence / text embedding generation. They can be used with the sentence-transformers package and are the result of the small training examples from the package documentation.
SentenceTransformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings.
Install the Sentence Transformers library:

```
pip install -U sentence-transformers
```
The usage is as simple as:
```python
from sentence_transformers import SparseEncoder

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary-size dimensions

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[35.629,  9.154,  0.098],
#         [ 9.154, 27.478,  0.019],
#         [ 0.098,  0.019, 29.553]])

# 4. Check sparsity stats
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}")
# Sparsity: 99.84%
```
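Each dimension of a sparse embedding corresponds to a token in the model's vocabulary, so an embedding can be read as a weighted bag of tokens. As a minimal sketch of inspecting those weights, assuming the `decode` helper of the Sparse Encoder API behaves as in the package documentation (returning (token, weight) pairs for a single embedding):

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# Encode a single sentence; the result is one vocabulary-sized sparse vector
embedding = model.encode("The weather is lovely today.")

# Map the non-zero dimensions back to vocabulary tokens and keep the
# ten highest-weighted ones (assumed to return (token, weight) pairs)
for token, weight in model.decode(embedding, top_k=10):
    print(f"{token}: {weight:.2f}")
```

This token-level interpretability is one of the main draws of sparse embeddings over dense ones.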
Hugging Face makes it easy to collaboratively build and showcase your Sentence Transformers models! You can collaborate within your organization and upload and showcase your own models in your profile ❤️



To upload your SparseEncoder models to the Hugging Face Hub, log in with `huggingface-cli login` and use the `push_to_hub` method within the Sentence Transformers library:
```python
from sentence_transformers import SparseEncoder

# Load or train a model
model = SparseEncoder(...)

# Push to Hub
model.push_to_hub("my_new_model")
```
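After the upload, anyone can load the model straight from the Hub by its repository id. A minimal sketch, where `my-username/my_new_model` is a hypothetical placeholder for your own namespace and model name:

```python
from sentence_transformers import SparseEncoder

# "my-username/my_new_model" is a hypothetical placeholder repository id
model = SparseEncoder("my-username/my_new_model")
embeddings = model.encode(["A quick sanity-check sentence."])
print(embeddings.shape)
```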
Note that, for now, this organization only hosts example sparse-encoder models from the Sentence Transformers package, all of which can easily be reproduced with the training example scripts. See Sparse Encoder > Training Examples for those scripts and Sparse Encoder > Pretrained Models for the community pre-trained models, some of which also appear in the collections below (a short retrieval sketch follows the list):
- opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill
- opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill
- opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
- opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini
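The OpenSearch models above are document encoders designed for inference-free retrieval: queries are expanded by the tokenizer route alone, while documents pass through the full model. A minimal retrieval sketch, assuming these checkpoints load directly with SparseEncoder as the Pretrained Models page indicates, using the encode_query / encode_document split:

```python
from sentence_transformers import SparseEncoder

# Assumption: this OpenSearch checkpoint is loadable with SparseEncoder,
# as listed on the Sparse Encoder > Pretrained Models page
model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill")

queries = ["What is the weather in New York right now?"]
documents = [
    "Currently New York is rainy.",
    "The stadium hosts a game tonight.",
]

# Queries and documents are encoded through their respective routes
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

# Score query-document pairs with (dot-product) similarity
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
```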
Models (11)

- sparse-encoder/splade-robbert-dutch-base-v1
- sparse-encoder/example-inference-free-splade-distilbert-base-uncased-nq
- sparse-encoder/example-splade-distilbert-base-uncased-msmarco-mrl
- sparse-encoder/example-splade-co-condenser-marco-msmarco-mse-margin
- sparse-encoder/example-splade-cocondenser-ensembledistil-sts
- sparse-encoder/example-splade-cocondenser-ensembledistil-nli
- sparse-encoder/example-splade-distilbert-base-uncased-quora-duplicates
- sparse-encoder/example-splade-distilbert-base-uncased-nq
- sparse-encoder/example-splade-distilbert-base-uncased-gooaq