SauerkrautLM-Multi-ColBERT-15m

This model is an ultra-compact Late Interaction retriever that leverages:

  • Pretraining on over 8.2 billion tokens in a two-phase approach (4.6B multilingual + 3.6B English tokens)
  • Knowledge distillation from state-of-the-art reranker models during pretraining
  • Extreme compression to just 15M parameters, optimized for edge deployment and resource-constrained environments

🎯 Core Features and Innovations:

  • Two-Phase Pretraining Strategy:

    • Phase 1: 4,641,714,000 tokens of multilingual data covering 7 European languages
    • Phase 2: 3,620,166,317 tokens of high-quality English data for enhanced performance
    • Total: Over 8.2 billion tokens of pretrained knowledge
  • Advanced Knowledge Distillation: Learning from powerful reranker models throughout the pretraining process

  • Ultra-Efficient Architecture: With extreme parameter compression to just 15M, enabling deployment on edge devices and mobile platforms

💪 The Foundation Model: Tiny but Mighty

With 15 million parameters – that's less than 1/500th the size of some competing models – SauerkrautLM-Multi-ColBERT-15m represents the extreme frontier of efficient pretraining:

  • 500× smaller than 7B+ parameter models
  • 10× smaller than typical BERT models
  • Comparable to SBERT-scale encoders in size
  • Trained on 8.2 billion tokens

This ultra-compact architecture combined with pretraining creates a powerful foundation for downstream applications.

Model Overview

Model: VAGOsolutions/SauerkrautLM-Multi-ColBERT-15m
Type: Pretrained foundation model for Late Interaction retrieval
Architecture: PyLate / ColBERT (Late Interaction)
Languages: Multilingual (optimized for 7 European languages: German, English, Spanish, French, Italian, Dutch, Portuguese)
License: Apache 2.0
Model Size: 15M parameters
Training Data: 8.2B tokens (4.6B multilingual + 3.6B English)

Model Description

  • Model Type: PyLate model with innovative Late Interaction architecture
  • Document Length: 2048 tokens (4× longer than traditional BERT models)
  • Query Length: 256 tokens (optimized for complex, multi-part queries)
  • Output Dimensionality: 128 dimensions (efficient per-token vector representation)
  • Similarity Function: MaxSim (enables precise token-level matching; see the sketch after this list)
  • Training Method: Two-phase knowledge distillation from reranker models
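
To make the MaxSim similarity concrete, below is a minimal NumPy sketch of the scoring rule: for each query token, take the best-matching document token similarity, then sum over query tokens. The shapes are illustrative; PyLate computes this internally.

import numpy as np

def maxsim_score(query_embeddings: np.ndarray, document_embeddings: np.ndarray) -> float:
    # Token-level similarity matrix (dot product; ColBERT embeddings are typically L2-normalized)
    similarities = query_embeddings @ document_embeddings.T
    # For each query token, keep its best document token match, then sum over the query
    return float(similarities.max(axis=1).sum())

# Toy usage with random vectors standing in for real token embeddings
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(8, 128))       # e.g. 8 query tokens, 128-dim each
document_tokens = rng.normal(size=(120, 128))  # e.g. 120 document tokens, 128-dim each
print(maxsim_score(query_tokens, document_tokens))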

Architecture

ColBERT(
  (0): Transformer(CompressedModernBertModel)
  (1): Dense(384 -> 128 dim, no bias)
)
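
As a quick sanity check of the 384 -> 128 projection, you can encode a document and inspect the per-token output shape. This is a small sketch using the same PyLate API shown in the Usage section below; the exact return structure may vary by PyLate version.

from pylate import models

# Load the model; the final Dense layer projects each token embedding down to 128 dimensions
model = models.ColBERT(
    model_name_or_path="VAGOsolutions/SauerkrautLM-Multi-ColBERT-15m",
)

embeddings = model.encode(["a short example document"], is_query=False)
print(embeddings[0].shape)  # expected: (number_of_tokens, 128)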

🔬 Technical Innovations in Detail

Two-Phase Pretraining: Multilingual Foundation, then English Enhancement

Our 15M parameter model undergoes sophisticated two-phase pretraining:

Phase 1: Multilingual Foundation (4.6B tokens)

  • Data Volume: 4,641,714,000 tokens across 7 European languages
  • Languages: Balanced representation of German, English, Spanish, French, Italian, Dutch, and Portuguese
  • Objective: Build robust multilingual understanding and cross-lingual capabilities

Phase 2: English Enhancement (3.6B tokens)

  • Data Volume: 3,620,166,317 high-quality English tokens
  • Focus: Enhance English performance while maintaining multilingual capabilities
  • Result: State-of-the-art English retrieval without sacrificing other languages

Knowledge Distillation Throughout Pretraining

Unlike typical pretraining, we leverage continuous knowledge distillation:

  • Teacher Models: State-of-the-art reranker models guide the learning process
  • Distillation Objective: Learn optimal ranking patterns from the ground up (an illustrative sketch of such a distillation loss follows this list)
  • Efficiency Gain: Strong ranking quality with up to 500× fewer parameters than large retrieval models
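
The exact distillation recipe is not part of this card, but a common formulation is to match the student's score distribution over candidate documents to the teacher reranker's distribution via KL divergence. The sketch below is purely illustrative and uses placeholder scores:

import torch
import torch.nn.functional as F

def distillation_loss(student_scores: torch.Tensor,
                      teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    # student_scores: MaxSim scores from the ColBERT student for one query's candidate documents
    # teacher_scores: relevance scores from the reranker teacher for the same candidates
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="sum")

# Toy example: four candidate documents for a single query
student = torch.tensor([12.3, 9.1, 10.4, 7.8])
teacher = torch.tensor([8.0, 2.5, 5.0, 1.0])
print(distillation_loss(student, teacher))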

Ultra-Compact Design

SauerkrautLM-Multi-ColBERT-15m achieves extreme efficiency through:

  • Ultra-Compact Architecture (~15M params)
  • Deep-yet-slim BERT — 10 layers, hidden_size = 288
  • Many heads — 12 attention heads (24-dim each) for fine-grained reasoning in a narrow model
  • Edge-ready — small footprint optimized for mobile and IoT deployment

This architecture enables Late Interaction Retrieval on devices previously unable to run neural search models.


🔬 Benchmarks: Foundation Model Performance

Despite its microscopic size, SauerkrautLM-Multi-ColBERT-15m delivers impressive multilingual retrieval performance, demonstrating the power of massive pretraining at extreme compression ratios.

NanoBEIR Europe (multilingual retrieval)

Average nDCG@10 across seven European languages, showing strong multilingual capabilities from our two-phase pretraining:

| Language | nDCG@10 | Performance Notes |
|----------|---------|-------------------|
| en | 51.09 | Enhanced by Phase 2 English pretraining |
| de | 32.91 | Strong German-language performance |
| es | 35.34 | Robust Spanish-language capabilities |
| fr | 33.84 | Consistent cross-lingual transfer |
| it | 34.26 | Balanced multilingual representation |
| nl | 32.60 | Effective on lower-resource languages |
| pt | 33.81 | Maintains quality across language families |

Key Observations:

  • English Excellence: The two-phase training strategy yields exceptional English performance (51.09) while maintaining strong multilingual capabilities
  • Balanced Multilingual: Non-English languages show consistent performance (32-35 nDCG@10), demonstrating effective multilingual pretraining
  • Token Efficiency: With 8.2B training tokens on just 15M parameters, the model achieves remarkable data efficiency
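
For reference, nDCG@10, the metric reported in the table above, can be computed per query as in the following minimal sketch (binary relevance assumed; the benchmark numbers are averages over all queries in each language's subset):

import numpy as np

def ndcg_at_10(ranked_relevances, total_relevant):
    # ranked_relevances: binary relevance (0/1) of the returned documents, in rank order
    # total_relevant:    number of relevant documents that exist for this query
    gains = np.asarray(ranked_relevances[:10], dtype=float)
    dcg = float((gains / np.log2(np.arange(2, gains.size + 2))).sum())
    # Ideal DCG: all relevant documents ranked at the top
    ideal = np.ones(min(total_relevant, 10))
    idcg = float((ideal / np.log2(np.arange(2, ideal.size + 2))).sum())
    return dcg / idcg if idcg > 0 else 0.0

# Example: three relevant documents exist; two are retrieved, at ranks 1 and 4
print(ndcg_at_10([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], total_relevant=3))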

Why SauerkrautLM-Multi-ColBERT-15m Matters as a Foundation Model

  • Unprecedented Efficiency: 8.2 billion tokens of knowledge compressed into just 15M parameters
  • True Multilingual Foundation: Native support for 7 European languages from pretraining
  • Ready for Fine-tuning: Ideal base model for task-specific adaptations
  • Edge-First Design: Deploy powerful neural retrieval on smartphones, IoT devices, and embedded systems
  • Cost-Effective Scaling: Train specialized models without massive compute requirements

This pretrained model serves as an ideal foundation for:

  • Domain-specific retrieval systems
  • Multilingual search applications
  • Resource-constrained deployments
  • Rapid prototyping and experimentation

Real-World Applications

The combination of massive pretraining and extreme efficiency enables:

  1. Mobile-First Search: Deploy directly on smartphones without cloud dependencies
  2. Multilingual Products: Single model serving users across 7 languages
  3. Privacy-Preserving Search: On-device retrieval for sensitive documents
  4. IoT and Edge AI: Neural search on resource-constrained devices
  5. Rapid Deployment: Fine-tune for specific domains in hours, not days (see the sketch after this list)
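
As a rough illustration of the last point, here is a hedged fine-tuning sketch following the sentence-transformers-style training pattern from PyLate's documentation. The dataset, hyperparameters, and output path are placeholders, not the recipe used to train this model:

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from pylate import losses, models, utils

# Start from the pretrained foundation model
model = models.ColBERT(
    model_name_or_path="VAGOsolutions/SauerkrautLM-Multi-ColBERT-15m",
)

# Any (query, positive document) pair dataset works; this one is only a placeholder
train_dataset = load_dataset("sentence-transformers/natural-questions", split="train")

args = SentenceTransformerTrainingArguments(
    output_dir="sauerkrautlm-colbert-15m-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=3e-5,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.Contrastive(model=model),
    data_collator=utils.ColBERTCollator(model.tokenize),
)
trainer.train()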

📈 Summary: The Power of Efficient Pretraining

SauerkrautLM-Multi-ColBERT-15m demonstrates that strong pretraining can be achieved at extreme compression ratios. By training on 8.2 billion tokens across two phases, we've created a model that:

  • Packs unprecedented knowledge into just 15M parameters (547 tokens per parameter!)
  • Delivers strong multilingual performance across 7 European languages
  • Achieves exceptional English retrieval (51.09 nDCG@10) through targeted enhancement
  • Enables new deployment scenarios on edge devices and mobile platforms
  • Provides an ideal foundation for task-specific fine-tuning

This model proves that thoughtful pretraining strategies can create powerful, efficient foundation models for the age of edge computing.


PyLate

This is a PyLate model. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.

Indexing documents

First, load the ColBERT model and initialize the Voyager index, then encode and index your documents:

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="VAGOsolutions/SauerkrautLM-Multi-ColBERT-15m",
)

# Step 2: Initialize the Voyager index
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:

# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
)

Retrieving top-k documents for queries

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries and then retrieve the top-k documents to get the top matches ids and relevance scores:

# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Set to True to indicate that these are queries, not documents
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
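
To inspect the output, each entry in scores holds the ranked matches for the corresponding query; a minimal loop is enough to print them:

# One list of ranked matches (document ids with relevance scores) per query
for query_index, matches in enumerate(scores):
    print(f"Query {query_index}: {matches}")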

Reranking

If you only want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline without building an index, you can simply use the rank function and pass the queries and documents to rerank:

from pylate import rank, models

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

model = models.ColBERT(
    model_name_or_path="VAGOsolutions/SauerkrautLM-Multi-ColBERT-15m",
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)

Citation

BibTeX

SauerkrautLM-Multi-ColBERT-15m

@misc{SauerkrautLM-Multi-ColBERT-15m,
  title={SauerkrautLM-Multi-ColBERT-15m},
  author={David Golchinfar},
  url={https://huggingface.co/VAGOsolutions/SauerkrautLM-Multi-ColBERT-15m},
  year={2025}
}

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
  title = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks},
  author = {Reimers, Nils and Gurevych, Iryna},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  month = {11},
  year = {2019},
  publisher = {Association for Computational Linguistics},
  url = {https://arxiv.org/abs/1908.10084}
}

PyLate

@misc{PyLate,
  title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
  author={Chaffin, Antoine and Sourty, Raphaël},
  url={https://github.com/lightonai/pylate},
  year={2024}
}

Acknowledgements

We thank the PyLate team for providing the training framework that made this work possible.
