AI & ML interests
Below you will find models tuned for sparse sentence / text embedding generation. They can be used with the sentence-transformers package and are the result of the small training examples from the package documentation.
SentenceTransformers 🤗 is a Python framework for state-of-the-art sentence, text and image embeddings.
Install the Sentence Transformers library:

```
pip install -U sentence-transformers
```
The usage is as simple as:
```python
from sentence_transformers import SparseEncoder

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary-size dimensions

# 3. Calculate the embedding similarities
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[35.629,  9.154,  0.098],
#         [ 9.154, 27.478,  0.019],
#         [ 0.098,  0.019, 29.553]])

# 4. Check sparsity stats
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}")
# Sparsity: 99.84%
```
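Each dimension of a sparse embedding corresponds to a token in the model's vocabulary, so an embedding can be read as a weighted bag of tokens. As a minimal sketch of inspecting those weights, assuming the `decode` helper of the Sparse Encoder API behaves as in the package documentation (returning (token, weight) pairs for a single embedding):

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# Encode a single sentence; the result is one vocabulary-sized sparse vector
embedding = model.encode("The weather is lovely today.")

# Map the non-zero dimensions back to vocabulary tokens and keep the
# ten highest-weighted ones (assumed to return (token, weight) pairs)
for token, weight in model.decode(embedding, top_k=10):
    print(f"{token}: {weight:.2f}")
```

This token-level interpretability is one of the main draws of sparse embeddings over dense ones.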
Hugging Face makes it easy to collaboratively build and showcase your Sentence Transformers models! You can collaborate within your organization and upload and showcase your own models in your profile ❤️



To upload your SparseEncoder models to the Hugging Face Hub, log in with `huggingface-cli login` and use the `push_to_hub` method within the Sentence Transformers library:
```python
from sentence_transformers import SparseEncoder

# Load or train a model
model = SparseEncoder(...)

# Push to Hub
model.push_to_hub("my_new_model")
```
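After the upload, anyone can load the model straight from the Hub by its repository id. A minimal sketch, where `my-username/my_new_model` is a hypothetical placeholder for your own namespace and model name:

```python
from sentence_transformers import SparseEncoder

# "my-username/my_new_model" is a hypothetical placeholder repository id
model = SparseEncoder("my-username/my_new_model")
embeddings = model.encode(["A quick sanity-check sentence."])
print(embeddings.shape)
```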
Note that, for now, this organization only hosts example sparse-encoder models from the Sentence Transformers package, all of which can easily be reproduced with the training example scripts. See Sparse Encoder > Training Examples for those scripts and Sparse Encoder > Pretrained Models for the community pre-trained models, some of which also appear in the collections below (a short retrieval sketch follows the list):
- opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill
- opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill
- opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte
- opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini
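The OpenSearch models above are document encoders designed for inference-free retrieval: queries are expanded by the tokenizer route alone, while documents pass through the full model. A minimal retrieval sketch, assuming these checkpoints load directly with SparseEncoder as the Pretrained Models page indicates, using the encode_query / encode_document split:

```python
from sentence_transformers import SparseEncoder

# Assumption: this OpenSearch checkpoint is loadable with SparseEncoder,
# as listed on the Sparse Encoder > Pretrained Models page
model = SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill")

queries = ["What is the weather in New York right now?"]
documents = [
    "Currently New York is rainy.",
    "The stadium hosts a game tonight.",
]

# Queries and documents are encoded through their respective routes
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)

# Score query-document pairs with (dot-product) similarity
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
```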
Models (11)

- sparse-encoder/splade-robbert-dutch-base-v1
- sparse-encoder/example-inference-free-splade-distilbert-base-uncased-nq
- sparse-encoder/example-splade-distilbert-base-uncased-msmarco-mrl
- sparse-encoder/example-splade-co-condenser-marco-msmarco-mse-margin
- sparse-encoder/example-splade-cocondenser-ensembledistil-sts
- sparse-encoder/example-splade-cocondenser-ensembledistil-nli
- sparse-encoder/example-splade-distilbert-base-uncased-quora-duplicates
- sparse-encoder/example-splade-distilbert-base-uncased-nq
- sparse-encoder/example-splade-distilbert-base-uncased-gooaq