LettuceDetect: German Hallucination Detection Model

Model Name: lettucedect-210m-eurobert-de-v1
Organization: KRLabsOrg
GitHub: https://github.com/KRLabsOrg/LettuceDetect

Overview

LettuceDetect is a transformer-based model that detects hallucinations in context-answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on EuroBERT-210M, chosen for its extended context support (up to 8192 tokens) and strong multilingual capabilities. Long-context support matters because long, detailed documents often have to be processed in full to determine whether an answer is supported by the provided context.

This is our German base model, built on the EuroBERT-210M architecture.

Model Details

Architecture: EuroBERT-210M with extended context support (up to 8192 tokens)
Task: Token Classification / Hallucination Detection
Training Dataset: RAGTruth-DE (translated from the original RAGTruth dataset)
Language: German

How It Works

The model is trained to identify tokens in the German answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
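
The aggregation step described above can be illustrated with a minimal sketch. It is not the library's actual implementation: the function name `merge_token_predictions`, the input layout (character offsets, binary labels, probabilities per token), and the choice to take the maximum token probability as the span confidence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def merge_token_predictions(
    offsets: List[Tuple[int, int]],
    labels: List[int],
    probs: List[float],
    answer: str,
) -> List[Dict]:
    """Merge runs of consecutive tokens labeled 1 (unsupported) into spans.

    Hypothetical helper: offsets are (start, end) character positions of each
    answer token, labels are 0 (supported) or 1 (hallucinated).
    """
    spans: List[Dict] = []
    current = None
    for (start, end), label, prob in zip(offsets, labels, probs):
        if label == 1:
            if current is None:
                # Open a new span at the first flagged token.
                current = {"start": start, "end": end, "confidence": prob}
            else:
                # Extend the open span; keep the highest token probability
                # (an assumed aggregation rule, other choices are possible).
                current["end"] = end
                current["confidence"] = max(current["confidence"], prob)
        elif current is not None:
            # A supported token closes the open span.
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = answer[span["start"]:span["end"]]
    return spans


# Toy example with hand-written token offsets (not real tokenizer output).
answer = "Paris hat 69 Millionen Einwohner."
offsets = [(0, 5), (6, 9), (10, 12), (13, 22), (23, 32), (32, 33)]
labels = [0, 0, 1, 1, 0, 0]
probs = [0.02, 0.05, 0.91, 0.88, 0.10, 0.01]
print(merge_token_predictions(offsets, labels, probs, answer))
```

In this toy input the two adjacent flagged tokens collapse into a single span covering "69 Millionen".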

Usage

Installation

Install the lettucedetect package from PyPI:

pip install lettucedetect

Using the model

from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer", 
    model_path="KRLabsOrg/lettucedect-210m-eurobert-de-v1",
    lang="de",
    trust_remote_code=True
)

contexts = ["Frankreich ist ein Land in Europa. Die Hauptstadt von Frankreich ist Paris. Die Bevölkerung Frankreichs beträgt 67 Millionen."]
question = "Was ist die Hauptstadt von Frankreich? Wie groß ist die Bevölkerung Frankreichs?"
answer = "Die Hauptstadt von Frankreich ist Paris. Die Bevölkerung Frankreichs beträgt 69 Millionen."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)

# Predictions: [{'start': 41, 'end': 88, 'confidence': 0.9647353219985962, 'text': ' Die Bevölkerung Frankreichs beträgt 69 Millionen.'}]
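
To make such output easier to inspect, the returned spans can be marked inline in the answer. The sketch below assumes that `start` and `end` are character offsets into the answer string, as the example output suggests; `highlight_spans` and its bracket markers are hypothetical and not part of lettucedetect.

```python
from typing import Dict, List


def highlight_spans(
    answer: str,
    spans: List[Dict],
    open_mark: str = "[[",
    close_mark: str = "]]",
) -> str:
    """Wrap each predicted span of the answer in visible markers."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        # Clamp to the answer and skip anything already covered.
        start = max(span["start"], cursor)
        end = min(span["end"], len(answer))
        if start >= end:
            continue
        pieces.append(answer[cursor:start])
        pieces.append(open_mark + answer[start:end] + close_mark)
        cursor = end
    pieces.append(answer[cursor:])
    return "".join(pieces)


# Toy span in the same dictionary shape as the predictions above.
demo_answer = "Paris hat 69 Millionen Einwohner."
demo_spans = [{"start": 10, "end": 22, "confidence": 0.9}]
print(highlight_spans(demo_answer, demo_spans))
# Paris hat [[69 Millionen]] Einwohner.
```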

Performance

Results on Translated RAGTruth-DE

We evaluate our German models on translated versions of the RAGTruth dataset. The EuroBERT-210M German model achieves an F1 score of 66.70%, outperforming prompt-based methods such as GPT-4.1-mini (60.91%) by 5.79 percentage points.

Detailed metrics for the German models are shown in the table below:

Language	Model	Precision (%)	Recall (%)	F1 (%)	GPT-4.1-mini F1 (%)	Δ F1 (%)
German	EuroBERT-210M	66.70	66.70	66.70	60.91	+5.79
German	EuroBERT-610M	77.04	72.96	74.95	60.91	+14.04

While the 610M variant achieves higher performance, the 210M model offers a good balance between accuracy and computational efficiency, processing examples approximately 3× faster.
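
As a quick check on how the table columns relate, F1 is the harmonic mean of precision and recall, and Δ F1 is the difference from the GPT-4.1-mini baseline in percentage points. The sketch below recomputes both from the reported values; because the inputs are already rounded to two decimals, the recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


GPT_41_MINI_F1 = 60.91  # baseline F1 (%) from the table

# EuroBERT-610M: recomputed from rounded precision and recall.
f1_610m = f1_score(77.04, 72.96)
print(round(f1_610m, 2))  # 74.94, within rounding of the reported 74.95

# Delta F1 in percentage points, using the reported F1 values.
delta_210m = 66.70 - GPT_41_MINI_F1
delta_610m = 74.95 - GPT_41_MINI_F1
print(round(delta_210m, 2), round(delta_610m, 2))  # 5.79 14.04
```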

Manual Validation

We performed additional validation on a manually reviewed set of 300 examples covering all task types from the data (QA, summarization, data-to-text). The EuroBERT-210M German model maintained strong performance with an F1 score of 68.32% on this curated dataset.

Model	Precision (%)	Recall (%)	F1 (%)
EuroBERT-210M	68.32	68.32	68.32

Citing

If you use the model or the tool, please cite the following paper:

@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}
