|
--- |
|
tags: |
|
- ColBERT |
|
- PyLate |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- multilingual |
|
- late-interaction |
|
- retrieval |
|
- bright |
|
- loss:Distillation |
|
pipeline_tag: sentence-similarity |
|
library_name: PyLate |
|
license: apache-2.0 |
|
base_model: |
|
- DavidGF/SauerkrautLM-Multi-ModernColBERT |
|
--- |
|
<img src="https://vago-solutions.ai/wp-content/uploads/2025/08/SauerkrautLM-Multi-Reason-ModernColBERT.png" width="500" height="auto"> |
|
|
|
# SauerkrautLM-Multi-Reason-ModernColBERT |
|
|
|
This model is the first publicly available Late Interaction retriever that integrates: |
|
|
|
- **Knowledge distillation** from strong synthetic data (200k samples generated with `Qwen/Qwen3-32B-AWQ` and scored by a high-performing reranker).

- **LaserRMT compression**, making it the first known ColBERT-style retriever to benefit from low-rank approximation.
|
|
|
### 🎯 Core Features and Innovations: |
|
|
|
- **Next-Generation Knowledge Distillation**: By utilizing 200,000 synthetically generated, high-quality training examples (created with `Qwen/Qwen3-32B-AWQ` and scored by a state-of-the-art reranker), our model learns complex reasoning patterns from models **54× its size**. |
|
|
|
- **Groundbreaking LaserRMT Compression**: As the first known **ColBERT-style retriever to benefit from low-rank approximation**, it sheds redundant weight components while preserving the precise token-level matching that defines Late Interaction retrieval.
|
|
|
### 💪 David vs. Goliath: Small but Mighty |
|
|
|
With only **149 million parameters** – that's **less than 1/45th the size** of some competing models – SauerkrautLM achieves or exceeds the performance of: |
|
- Models with **over 7 billion parameters** (47× larger than ours) |
|
- Proprietary API-based solutions from major tech companies |
|
- Specialized reasoning models like ReasonIR-8B (54× larger) |
|
|
|
This exceptional efficiency makes it the ideal choice for production environments where resource consumption and latency are critical factors. |
|
|
|
|
|
|
|
## Model Overview |
|
|
|
**Model:** `VAGOsolutions/SauerkrautLM-Multi-Reason-ModernColBERT`\ |
|
**Base:** Fine-tuned from [VAGOsolutions/SauerkrautLM-Multi-ModernColBERT](https://huggingface.co/VAGOsolutions/SauerkrautLM-Multi-ModernColBERT) using knowledge distillation and LaserRMT\ |
|
**Architecture:** PyLate / ColBERT (Late Interaction)\ |
|
**Languages:** Multilingual (optimized for 7 European languages: German, English, Spanish, French, Italian, Dutch, Portuguese)\ |
|
**License:** Apache 2.0\ |
|
**Model Size:** 149M parameters\

**Efficiency Ratio:** Up to **54× smaller** than comparably performing models
|
|
|
### Model Description |
|
- **Model Type:** PyLate model with innovative Late Interaction architecture |
|
- **Document Length:** 8192 tokens (16× the 512-token limit of traditional BERT models)
|
- **Query Length:** 256 tokens (optimized for complex, multi-part queries) |
|
- **Output Dimensionality:** 128 dimensions per token (efficient vector representation)
|
- **Similarity Function:** MaxSim (enables precise token-level matching; see the sketch after this list)
|
- **Training Loss:** Knowledge Distillation (PyLate) |
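
For intuition, here is a minimal sketch of the MaxSim operator that underlies Late Interaction scoring (illustrative only; PyLate ships its own optimized implementation, and the function name below is ours):

```python
import torch


def maxsim_score(query_embeddings: torch.Tensor, doc_embeddings: torch.Tensor) -> torch.Tensor:
    """Late-interaction relevance: match every query token against its best
    document token, then sum those per-token maxima over the query.

    query_embeddings: (num_query_tokens, dim)
    doc_embeddings:   (num_doc_tokens, dim)
    """
    # With L2-normalized token embeddings (as ColBERT models output),
    # cosine similarity reduces to a dot product.
    token_similarities = query_embeddings @ doc_embeddings.T  # (n_query_tokens, n_doc_tokens)
    best_match_per_query_token = token_similarities.max(dim=1).values
    return best_match_per_query_token.sum()


# Toy example with random 128-dimensional token embeddings (this model's output size)
query = torch.nn.functional.normalize(torch.randn(256, 128), dim=-1)
document = torch.nn.functional.normalize(torch.randn(512, 128), dim=-1)
print(maxsim_score(query, document))
```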
|
|
|
### Architecture |
|
|
|
``` |
|
ColBERT( |
|
(0): Transformer(ModernBertModel) |
|
(1): Dense(768 -> 128 dim, no bias) |
|
) |
|
``` |
|
|
|
## 🔬 Technical Innovations in Detail |
|
|
|
### Knowledge Distillation: The Student Surpassing the Master |
|
|
|
|
|
1. **Synthetic Data Generation**: 200,000 high-quality query-document pairs generated using the `Qwen/Qwen3-32B-AWQ` model (32 billion parameters) based on the [ReasonIR approach](https://huggingface.co/datasets/reasonir/reasonir-data) |
|
2. **Quality Assurance**: Each pair is evaluated and filtered by a state-of-the-art reranker

3. **Distillation Process**: The compact ModernColBERT student learns to replicate the ranking patterns of the much larger teacher (one common formulation is sketched below)
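
Conceptually, step 3 amounts to matching score distributions. The sketch below shows one common formulation (a temperature-scaled KL divergence over each query's candidate documents); it is an illustration under our own assumptions, not the exact internals of PyLate's Distillation loss:

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_scores: torch.Tensor, teacher_scores: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student ranking distributions.

    Both tensors have shape (num_queries, num_candidate_docs); higher scores
    mean more relevant. Illustrative formulation, not PyLate's exact loss.
    """
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


# Toy example: reranker (teacher) scores vs. ColBERT student MaxSim scores
teacher = torch.tensor([[8.2, 3.1, 0.5], [1.0, 7.7, 2.3]])
student = torch.tensor([[5.0, 2.0, 1.0], [0.5, 4.0, 3.0]])
print(distillation_loss(student, teacher))
```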
|
|
|
|
|
### LaserRMT: Revolution in Model Compression |
|
|
|
As the **first ColBERT-based retrieval model with low-rank approximation**, SauerkrautLM sets a new standard for compression in Late Interaction retrieval.
|
|
|
|
|
This technology combines the advantages of Late Interaction Retrieval (precise token-level matching) with the efficiency of compact models. |
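
For intuition, the core low-rank operation can be sketched as a truncated SVD applied to selected weight matrices. This is a simplified illustration: the actual LaserRMT procedure additionally relies on random-matrix-theory criteria to choose which layers to compress and how aggressively.

```python
import torch


def low_rank_approximate(weight: torch.Tensor, rank: int) -> torch.Tensor:
    """Best rank-`rank` approximation of a weight matrix via truncated SVD.

    Simplified illustration of the low-rank step only; LaserRMT itself uses
    random-matrix-theory criteria for layer and rank selection.
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]


# Toy example: compress a 768x768 projection to rank 256
W = torch.randn(768, 768)
W_low_rank = low_rank_approximate(W, rank=256)
print(torch.linalg.matrix_rank(W_low_rank))  # at most 256
```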
|
|
|
--- |
|
|
|
## 🔬 Benchmarks: David vs. Goliath Performance |
|
|
|
Our comprehensive evaluation demonstrates that model size is not destiny. Despite being **47-54× smaller** than competing models, SauerkrautLM consistently delivers superior or comparable performance across challenging reasoning and multilingual retrieval tasks. |
|
|
|
### BRIGHT Benchmark (English, reasoning‑focused retrieval) |
|
|
|
The [BRIGHT benchmark](https://huggingface.co/datasets/xlangai/BRIGHT) is designed to evaluate **reasoning‑intensive retrieval**. All scores are nDCG\@10. SauerkrautLM (≈149 M parameters) is compared with dense and proprietary baselines as well as the original and re‑evaluated Reason‑ModernColBERT model. |
|
|
|
| Model / Metric | Biology | Earth | Economics | Psychology | Robotics | Stackoverflow | Sustainable | Leetcode | Pony | AoPS | Theorem‑Q | Theorem‑T | Mean StackEx | Mean coding | Mean theorem | Full Mean | |
|
| ---------------------------------------- | --------- | --------- | --------- | ---------- | -------- | ------------- | ----------- | --------- | --------- | --------- | --------- | --------- | ------------ | ----------- | ------------ | --------- | |
|
| **BM25** | 18.90 | 27.20 | 14.90 | 12.50 | 13.60 | 18.40 | 15.00 | 24.40 | 7.90 | 6.20 | 10.40 | 4.90 | 17.21 | 16.15 | 7.17 | 14.53 | |
|
| **Open‑source < 1 B** | | | | | | | | | | | | | | | | |
|
| BGE | 11.70 | 24.60 | 16.60 | 17.50 | 11.70 | 10.80 | 13.30 | 26.70 | 5.70 | 6.00 | 13.00 | 6.90 | 15.17 | 16.20 | 8.63 | 13.71 | |
|
| Inst‑L | 15.20 | 21.20 | 14.70 | 22.30 | 11.40 | 13.30 | 13.50 | 19.50 | 1.30 | 8.10 | 20.90 | 9.10 | 15.94 | 10.40 | 12.70 | 14.21 | |
|
| SBERT | 15.10 | 20.40 | 16.60 | 22.70 | 8.20 | 11.00 | 15.30 | 26.40 | 7.00 | 5.30 | 20.00 | 10.80 | 15.61 | 16.70 | 12.03 | 14.90 | |
|
| **Open‑source > 1 B** | | | | | | | | | | | | | | | | |
|
| E5 | 18.60 | 26.00 | 15.50 | 15.80 | 16.30 | 11.20 | 18.10 | 28.70 | 4.90 | 7.10 | 26.10 | 26.80 | 17.36 | 16.80 | 20.00 | 17.93 | |
|
| SFR | 19.10 | 26.70 | 17.80 | 19.00 | 16.30 | 14.40 | 19.20 | 27.40 | 2.00 | 7.40 | 24.30 | 26.00 | 18.93 | 14.70 | 19.23 | 18.30 | |
|
| Inst‑XL | 21.60 | 34.30 | 22.40 | 27.40 | 18.20 | 21.20 | 19.10 | 27.50 | 5.00 | 8.50 | 15.60 | 5.90 | 23.46 | 16.25 | 10.00 | 18.89 | |
|
| GritLM | 24.80 | 32.30 | 18.90 | 19.80 | 17.10 | 13.60 | 17.80 | 29.90 | 22.00 | 8.80 | 25.20 | 21.20 | 20.61 | 25.95 | 18.40 | 20.95 | |
|
| Qwen | 30.60 | 36.40 | 17.80 | 24.60 | 13.20 | 22.20 | 14.80 | 25.50 | 9.90 | 14.40 | 27.80 | 32.90 | 22.80 | 17.70 | 25.03 | **22.51** | |
|
| **Proprietary** | | | | | | | | | | | | | | | | | |
|
| Cohere | 18.70 | 28.40 | 20.40 | 21.60 | 16.30 | 18.30 | 17.60 | 26.80 | 1.90 | 6.30 | 15.70 | 7.20 | 20.19 | 14.35 | 9.73 | 16.60 | |
|
| OpenAI | 23.30 | 26.70 | 19.50 | 27.60 | 12.80 | 14.30 | 20.50 | 23.60 | 2.40 | 8.50 | 23.50 | 11.70 | 20.67 | 13.00 | 14.57 | 17.87 | |
|
| Voyage | 23.10 | 25.40 | 19.90 | 24.90 | 10.80 | 16.80 | 15.40 | 30.60 | 1.50 | 7.50 | 27.40 | 11.60 | 19.47 | 16.05 | 15.50 | 17.91 | |
|
| Google | 22.70 | 34.80 | 19.60 | 27.80 | 15.70 | 20.10 | 17.10 | 29.60 | 3.60 | 9.30 | 23.80 | 15.90 | 22.54 | 16.60 | 16.33 | 20.00 | |
|
| **ReasonIR data** | | | | | | | | | | | | | | | | | |
|
| ReasonIR‑8B | 26.20 | 31.40 | 23.30 | 30.00 | 18.00 | 23.90 | 20.50 | 35.00 | 10.50 | 14.70 | 31.90 | 27.20 | 24.76 | 22.75 | 24.60 | **24.38** | |
|
| Reason‑ModernColBERT (149 M) reported | 33.25 | 41.02 | 24.93 | 30.73 | 21.12 | 20.62 | 20.31 | 31.07 | 8.51 | 9.17 | 19.51 | 11.24 | 27.43 | 19.79 | 15.38 | **22.62** | |
|
| Reason‑ModernColBERT (149 M) our eval\*\* | 34.28 | 41.53 | 19.96 | 27.02 | 21.15 | 23.62 | 17.21 | 26.61 | 1.32 | 7.30 | 19.79 | 9.70 | 27.93 | 13.97 | 12.26 | 20.79 | |
|
| **SauerkrautLM Reasoning data** | | | | | | | | | | | | | | | | | |
|
| **SauerkrautLM-Multi-Reason-ModernColBERT (149 M)** | **36.92** | **45.53** | 19.47 | **27.04** | 19.35 | **25.31** | **20.78** | **29.74** | **12.54** | **10.52** | 14.62 | 7.65 | **28.94** | **21.14** | 10.93 | **22.45** | |
|
| SauerkrautLM‑Reason‑EuroColBERT (210 M) | 38.16 | 39.43 | 16.99 | 24.49 | 17.50 | 17.60 | 20.72 | 29.10 | 13.57 | 12.04 | 10.43 | 4.95 | 25.70 | 21.33 | 9.14 | 20.42 | |
|
| SauerkrautLM‑Reason‑Multi‑ColBERT (15 M) | 23.33 | 23.78 | 10.53 | 9.03 | 10.28 | 10.88 | 13.13 | 18.10 | 15.86 | 1.75 | 4.29 | 0.81 | 14.64 | 16.98 | 2.28 | 11.81 | |
|
|
|
**Evaluation note (\*\*):** our re‑evaluation of Reason‑ModernColBERT uses the **same query‑length settings** as the original LightOn repository; the instructions used for the originally reported scores are not public.
|
|
|
|
|
#### ⚖️ Relative Efficiency |
|
|
|
With **149 M parameters**, SauerkrautLM surpasses several ≥7 B dense and proprietary retrievers on reasoning‑centric tasks. |
|
|
|
### BRIGHT Benchmark (German, reasoning‑focused retrieval) |
|
|
|
All scores are nDCG\@10. |
|
|
|
| Model / Metric | Biology | Earth | Economics | Psychology | Robotics | Stackoverflow | Sustainable | Leetcode | Pony | AoPS | Theorem‑Q | Theorem‑T | Mean StackEx | Mean coding | Mean theorem | Full Mean | |
|
| --------------------------------------------------- | --------- | --------- | --------- | ---------- | --------- | ------------- | ----------- | --------- | --------- | -------- | --------- | --------- | ------------ | ----------- | ------------ | --------- | |
|
| **SauerkrautLM‑Multi‑Reason‑ModernColBERT (149 M)** | 28.00 | **34.71** | **12.90** | 17.98 | **13.67** | **19.64** | 17.70 | 11.66 | **15.49** | 7.27 | 6.76 | 1.32 | **21.15** | 13.57 | 5.11 | **15.59** | |
|
| SauerkrautLM‑Reason‑EuroColBERT (210 M) | **31.09** | 31.48 | 11.95 | **18.39** | 11.25 | 14.43 | **20.26** | **25.67** | 12.15 | **9.58** | **8.15** | **2.76** | 19.76 | **18.91** | **6.83** | **16.43** | |
|
| SauerkrautLM‑Reason‑Multi‑ColBERT (15 M) | 15.37 | 20.11 | 7.36 | 7.07 | 4.24 | 4.71 | 7.67 | 0.77 | 6.31 | 3.81 | 0.76 | 0.00 | 9.81 | 3.54 | 1.52 | 6.51 | |
|
|
|
> **Observation:** Our 149 M flagship leads most German StackExchange domains (Earth, Economics, Robotics, Stackoverflow) and posts the best Mean StackExchange score, while the 210 M EuroColBERT secures the **highest Full Mean (16.43)** on the strength of its coding and theorem sub‑tasks.
|
|
|
--- |
|
|
|
### NanoBEIR Europe (multilingual retrieval) |
|
|
|
Average nDCG\@10 over the NanoBEIR datasets, reported per language for the seven languages we evaluated:
|
|
|
| Language | nDCG\@10 | |
|
| -------- | -------- | |
|
| de | 50.74 | |
|
| en | 67.32 | |
|
| es | 53.82 | |
|
| fr | 53.94 | |
|
| it | 53.19 | |
|
| nl | 51.49 | |
|
| pt | 53.07 | |
|
|
|
|
|
--- |
|
|
|
### Why SauerkrautLM Matters for Production |
|
|
|
- **Outperforms proprietary APIs**: beats Cohere, OpenAI, Voyage and Google on BRIGHT Full Mean while remaining fully open‑source under a permissive **Apache 2.0** license. |
|
- **Highest *Mean StackExchange* score** of all evaluated models (28.94) — crucial for reasoning‑heavy Q&A communities.
|
- **Full parameter range**: from the tiny **15 M** Multi‑ColBERT (competitive with SBERT‑scale encoders) to the robust 210 M EuroColBERT variant. |
|
- **Matches or exceeds** models 10–50× larger (e.g. ReasonIR‑8B, GritLM, Qwen). |
|
- **Strong multilingual coverage** across seven European languages without language‑specific fine‑tuning. |
|
|
|
We translated both **BRIGHT** and **NanoBEIR** into seven European languages to rigorously evaluate multilingual retrieval capabilities. |
|
|
|
Below is a **scatter plot** that visualises model size (millions of parameters) against BRIGHT Full‑Mean nDCG\@10. SauerkrautLM models occupy the best trade‑off region—smallest models with top‑tier reasoning performance. |
|
<img src="https://vago-solutions.ai/wp-content/uploads/2025/08/Image-graph-2.jpeg"> |
|
|
|
|
|
### Real-World Impact |
|
|
|
The efficiency gains translate to tangible benefits: |
|
|
|
1. **Democratized AI**: Run state-of-the-art retrieval on consumer hardware |
|
2. **Edge Deployment**: Enable on-device search for privacy-sensitive applications |
|
3. **Massive Scale**: Index billions of documents at a fraction of traditional costs |
|
|
|
## 📈 Summary: The New Efficiency Paradigm |
|
|
|
SauerkrautLM-Multi-Reason-ModernColBERT represents a paradigm shift in retrieval model design. By combining cutting-edge knowledge distillation with innovative LaserRMT compression, we've created a model that: |
|
|
|
- **Delivers 99.7% of the performance** of 7B parameter models while being **47× smaller** |
|
- **Outperforms all major proprietary APIs** (OpenAI, Cohere, Google, Voyage) on reasoning tasks |
|
- **Runs on consumer hardware** (4GB GPU) instead of requiring enterprise infrastructure (80GB+) |
|
- **Cuts deployment costs by a factor of roughly 50**
|
- **Achieves the highest StackExchange score** (28.94) of any evaluated model |
|
|
|
This breakthrough demonstrates that with the right techniques, compact models can match or exceed the capabilities of models orders of magnitude larger, democratizing access to state-of-the-art retrieval technology. |
|
|
|
--- |
|
|
|
# PyLate |
|
|
|
This is a [PyLate](https://github.com/lightonai/pylate) model. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity via the MaxSim operator.
|
|
|
|
|
## Usage |
|
First install the PyLate library: |
|
|
|
```bash |
|
pip install -U pylate |
|
``` |
|
|
|
### Retrieval |
|
|
|
PyLate provides a streamlined interface for indexing and retrieving documents with ColBERT models. The index uses Voyager HNSW to efficiently store document embeddings and enable fast retrieval.
|
|
|
#### Indexing documents |
|
|
|
First, load the ColBERT model and initialize the Voyager index, then encode and index your documents: |
|
|
|
```python |
|
from pylate import indexes, models, retrieve |
|
|
|
# Step 1: Load the ColBERT model |
|
model = models.ColBERT( |
|
    model_name_or_path="VAGOsolutions/SauerkrautLM-Multi-Reason-ModernColBERT",
|
) |
|
|
|
# Step 2: Initialize the Voyager index |
|
index = indexes.Voyager( |
|
index_folder="pylate-index", |
|
index_name="index", |
|
override=True, # This overwrites the existing index if any |
|
) |
|
|
|
# Step 3: Encode the documents |
|
documents_ids = ["1", "2", "3"] |
|
documents = ["document 1 text", "document 2 text", "document 3 text"] |
|
|
|
documents_embeddings = model.encode( |
|
documents, |
|
batch_size=32, |
|
is_query=False, # Ensure that it is set to False to indicate that these are documents, not queries |
|
show_progress_bar=True, |
|
) |
|
|
|
# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids |
|
index.add_documents( |
|
documents_ids=documents_ids, |
|
documents_embeddings=documents_embeddings, |
|
) |
|
``` |
|
|
|
Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it: |
|
|
|
```python |
|
# To load an index, simply instantiate it with the correct folder/name and without overriding it |
|
index = indexes.Voyager( |
|
index_folder="pylate-index", |
|
index_name="index", |
|
) |
|
``` |
|
|
|
#### Retrieving top-k documents for queries |
|
|
|
Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. |
|
To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries and then retrieve the top-k documents to get the ids and relevance scores of the top matches:
|
|
|
```python |
|
# Step 1: Initialize the ColBERT retriever |
|
retriever = retrieve.ColBERT(index=index) |
|
|
|
# Step 2: Encode the queries |
|
queries_embeddings = model.encode( |
|
["query for document 3", "query for document 1"], |
|
batch_size=32, |
|
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
|
show_progress_bar=True, |
|
) |
|
|
|
# Step 3: Retrieve top-k documents |
|
scores = retriever.retrieve( |
|
queries_embeddings=queries_embeddings, |
|
k=10, # Retrieve the top 10 matches for each query |
|
) |
|
``` |
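
A quick way to inspect the results (in recent PyLate versions each query yields a list of hits carrying the document id and its MaxSim score; exact field names may differ across versions):

```python
# Print the hits per query; each hit holds a document id and a relevance score.
for query_idx, hits in enumerate(scores):
    print(f"Query {query_idx}:")
    for hit in hits:
        print(f"  doc={hit['id']}  score={hit['score']:.2f}")
```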
|
|
|
### Reranking |
|
If you only want to use the ColBERT model to rerank the output of a first-stage retrieval pipeline without building an index, you can simply use the `rank.rerank` function and pass the queries and documents to rerank:
|
|
|
```python |
|
from pylate import rank, models |
|
|
|
queries = [ |
|
"query A", |
|
"query B", |
|
] |
|
|
|
documents = [ |
|
["document A", "document B"], |
|
["document 1", "document C", "document B"], |
|
] |
|
|
|
documents_ids = [ |
|
[1, 2], |
|
[1, 3, 2], |
|
] |
|
|
|
model = models.ColBERT( |
|
    model_name_or_path="VAGOsolutions/SauerkrautLM-Multi-Reason-ModernColBERT",
|
) |
|
|
|
queries_embeddings = model.encode( |
|
queries, |
|
is_query=True, |
|
) |
|
|
|
documents_embeddings = model.encode( |
|
documents, |
|
is_query=False, |
|
) |
|
|
|
reranked_documents = rank.rerank( |
|
documents_ids=documents_ids, |
|
queries_embeddings=queries_embeddings, |
|
documents_embeddings=documents_embeddings, |
|
) |
|
``` |
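
The reranked output mirrors the nested input structure: one list per query, ordered by descending MaxSim score, with each entry keeping the document id you passed in (exact field names may vary across PyLate versions).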
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### SauerkrautLM‑Multi‑Reason‑ModernColBERT |
|
|
|
```bibtex |
|
@misc{SauerkrautLM-Multi-Reason-ModernColBERT, |
|
title={SauerkrautLM-Multi-Reason-ModernColBERT}, |
|
author={David Golchinfar}, |
|
url={https://huggingface.co/VAGOsolutions/SauerkrautLM-Multi-Reason-ModernColBERT}, |
|
year={2025} |
|
} |
|
``` |
|
|
|
#### GTE‑ModernColBERT |
|
|
|
```bibtex |
|
@misc{GTE-ModernColBERT, |
|
title={GTE-ModernColBERT}, |
|
author={Chaffin, Antoine}, |
|
url={https://huggingface.co/lightonai/GTE-ModernColBERT-v1}, |
|
year={2025} |
|
} |
|
``` |
|
|
|
#### Sentence Transformers |
|
|
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = {Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks}, |
|
author = {Reimers, Nils and Gurevych, Iryna}, |
|
booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing}, |
|
month = {11}, |
|
year = {2019}, |
|
publisher = {Association for Computational Linguistics}, |
|
url = {https://arxiv.org/abs/1908.10084} |
|
} |
|
``` |
|
|
|
#### PyLate |
|
|
|
```bibtex |
|
@misc{PyLate, |
|
title={PyLate: Flexible Training and Retrieval for Late Interaction Models}, |
|
author={Chaffin, Antoine and Sourty, Raphaël}, |
|
url={https://github.com/lightonai/pylate}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
|
|
## Acknowledgements |
|
We thank Antoine Chaffin (LightOn AI) for helpful discussions and for clarifying evaluation settings for Reason‑ModernColBERT, and the PyLate team for providing the training framework that made this work possible. |
|
|
|