Yosef Worku Alemneh

rasyosef

AI & ML interests

Pretraining, Supervised Fine-Tuning, Direct Preference Optimization, Retrieval-Augmented Generation (RAG), Function Calling

Recent Activity

updated a model about 18 hours ago

rasyosef/splade-small

Feature Extraction • 0.0B • Updated about 18 hours ago

updated a collection about 19 hours ago

SPLADE-Tiny-MSMARCO

Collection

SPLADE sparse retrieval models based on BERT-Tiny (4M) and BERT-Mini (11M) distilled from a Cross-Encoder on the MSMARCO dataset • 3 items • Updated about 19 hours ago
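A SPLADE model turns text into a sparse, vocabulary-sized vector. Its masked-language-model head scores every vocabulary term at every token position, and those per-token scores are pooled into a single vector. The NumPy sketch below shows the commonly used max-pooling variant, w_j = max_i log(1 + ReLU(logit_ij)), applied to a toy logits array. It assumes these checkpoints use max pooling, which the collection description does not state. The helper name `splade_max_pool` is hypothetical, and the cross-encoder distillation step is not shown.

```python
import numpy as np

def splade_max_pool(mlm_logits: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pool per-token MLM logits into one sparse vocabulary-sized vector.

    mlm_logits: (seq_len, vocab_size) logits from an MLM head (toy data here).
    attention_mask: (seq_len,) with 1 for real tokens and 0 for padding.
    """
    # log(1 + ReLU(x)) keeps weights non-negative and dampens large logits
    weights = np.log1p(np.maximum(mlm_logits, 0.0))
    # Zero out padding positions so they cannot win the max
    weights = weights * attention_mask[:, None]
    # Max-pool over tokens: one weight per vocabulary term
    return weights.max(axis=0)

# Toy example: 3 token positions (the last is padding), 5-term vocabulary
logits = np.array([
    [2.0, -1.0, 0.0, 0.5, -3.0],
    [-0.5, 1.0, 0.0, 3.0, -2.0],
    [9.0, 9.0, 9.0, 9.0, 9.0],  # padding position, masked out below
])
mask = np.array([1, 1, 0])

vec = splade_max_pool(logits, mask)
print(vec.round(3))          # equals log1p([2, 1, 0, 3, 0])
print(np.nonzero(vec)[0])    # indices of the active vocabulary terms
```

Most entries of a real SPLADE vector are zero, which is what makes inverted-index or sparse-matrix retrieval practical.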

published a model about 19 hours ago

rasyosef/splade-small

Feature Extraction • 0.0B • Updated about 18 hours ago

updated a model 15 days ago

rasyosef/Llama-3.2-400M-Amharic

Text Generation • 0.4B • Updated 15 days ago • 101 • 3

updated 2 models 20 days ago

rasyosef/splade-mini

Feature Extraction • 0.0B • Updated 20 days ago • 246 • 2

rasyosef/splade-tiny

Feature Extraction • 0.0B • Updated 20 days ago • 190 • 2

commented on Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 22 days ago

Here's a Python package that you can use to index, query, and rank your documents with SPLADE models from sentence-transformers.

splade-index: https://github.com/rasyosef/splade-index

SPLADE-Index⚡

SPLADE-Index is an ultrafast index for SPLADE sparse retrieval models, implemented in pure Python and powered by SciPy sparse matrices. It is built on top of the BM25s library.
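The general idea behind a sparse-matrix index can be illustrated with SciPy alone. Each document's term weights become one row of a compressed sparse row (CSR) matrix, and a query is scored against every document with a single sparse matrix product. The sketch below uses made-up 6-term vectors instead of real model output. It is not SPLADE-Index's actual internals, only the sparse dot-product retrieval the description refers to.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: each row is one document's SPLADE vector over a
# 6-term vocabulary (real BERT-based vocabularies have about 30k terms).
doc_vectors = np.array([
    [0.0, 1.2, 0.0, 0.0, 0.4, 0.0],
    [0.9, 0.0, 0.0, 0.7, 0.0, 0.0],
    [0.0, 0.3, 1.5, 0.0, 0.0, 0.0],
])
# Store the corpus as a CSR matrix: only non-zero term weights are kept
index = sparse.csr_matrix(doc_vectors)

# A query is a sparse vector over the same vocabulary
query = sparse.csr_matrix(np.array([[0.0, 1.0, 0.5, 0.0, 0.0, 0.0]]))

# Relevance is the dot product with every document:
# (1, vocab) @ (vocab, n_docs) -> (1, n_docs)
scores = (query @ index.T).toarray().ravel()
ranking = np.argsort(-scores)

print(scores)    # roughly [1.2, 0.0, 1.05]
print(ranking)   # [0 2 1]
```

Because only overlapping non-zero terms contribute to each product, scoring cost grows with the number of non-zeros rather than with the full vocabulary size.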

Installation

You can install splade-index with pip:

pip install splade-index

Recommended (but optional) dependencies:

# To speed up the top-k selection process, you can install `jax`
pip install "jax[cpu]"
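Top-k selection means picking the k highest-scoring documents for each query from the full score matrix. The NumPy-only sketch below illustrates that step using `np.argpartition`, which avoids fully sorting every row. It is not the package's own code path, and the helper name `top_k` is hypothetical. JAX offers `jax.lax.top_k` for the same operation.

```python
import numpy as np

def top_k(scores: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return (indices, values) of the k largest scores per row, best first.

    scores: (n_queries, n_docs) array of relevance scores, with k <= n_docs.
    """
    # argpartition moves the k largest scores into the first k columns
    # (in arbitrary order) without sorting the whole row
    part = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    # Sort only the k selected candidates, in descending order
    order = np.argsort(-part_scores, axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(scores, idx, axis=1)

scores = np.array([
    [0.1, 2.5, 0.0, 1.7, 0.9],
    [3.0, 0.2, 0.4, 0.0, 2.2],
])
idx, vals = top_k(scores, k=2)
print(idx)    # [[1 3]
              #  [0 4]]
print(vals)
```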

Quickstart

Here is a simple example of how to use splade-index:

from sentence_transformers import SparseEncoder
from splade_index import SPLADE

# Download a SPLADE model from the 🤗 Hub
model = SparseEncoder("rasyosef/splade-tiny")

# Create your corpus here
corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
    "a fish is a creature that lives in water and swims",
]

# Create the SPLADE retriever and index the corpus
retriever = SPLADE()
retriever.index(model=model, documents=corpus)

# Query the corpus
queries = ["does the fish purr like a cat?"]

# Get top-k results as a tuple of (doc ids, documents, scores). All three are arrays of shape (n_queries, k).
results = retriever.retrieve(queries, k=2)
doc_ids, result_docs, scores = results.doc_ids, results.documents, results.scores

for i in range(doc_ids.shape[1]):
    doc_id, doc, score = doc_ids[0, i], result_docs[0, i], scores[0, i]
    print(f"Rank {i+1} (score: {score:.2f}) (doc_id: {doc_id}): {doc}")

# You can save the index to a directory
retriever.save("animal_index_splade")

# ...and load it when you need it
reloaded_retriever = SPLADE.load("animal_index_splade", model=model)
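Persisting a sparse index largely comes down to saving the document-term matrix. The sketch below shows one common way to do this with SciPy's `save_npz` and `load_npz`. It is an illustrative assumption, not necessarily the file layout that `retriever.save()` produces. A real index also has to keep the corpus text, since `retrieve` returns the documents themselves.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical sparse index: 3 documents x 6 vocabulary terms
index = sparse.csr_matrix(np.array([
    [0.0, 1.2, 0.0, 0.0, 0.4, 0.0],
    [0.9, 0.0, 0.0, 0.7, 0.0, 0.0],
    [0.0, 0.3, 1.5, 0.0, 0.0, 0.0],
]))

with tempfile.TemporaryDirectory() as index_dir:
    path = os.path.join(index_dir, "doc_vectors.npz")
    # save_npz stores the CSR data, indices, and indptr arrays in one file
    sparse.save_npz(path, index)
    # load_npz reads everything into memory, so the directory can be deleted
    reloaded = sparse.load_npz(path)

# The reloaded matrix has the same shape and the same non-zero entries
print(reloaded.shape, reloaded.nnz)
print((reloaded != index).nnz == 0)
```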