LettuceDetect: French Hallucination Detection Model

Model Name: KRLabsOrg/lettucedect-610m-eurobert-fr-v1
Organization: KRLabsOrg
GitHub: https://github.com/KRLabsOrg/LettuceDetect

Overview

LettuceDetect is a transformer-based model for hallucination detection on context-answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on EuroBERT-610M, chosen for its extended context support (up to 8192 tokens) and strong multilingual capabilities. Long-context support matters because deciding whether an answer is supported by the provided context often requires processing long, detailed documents in full.

This is our large French model, built on the EuroBERT-610M architecture.

Model Details

Architecture: EuroBERT-610M with extended context support (up to 8192 tokens)
Task: Token Classification / Hallucination Detection
Training Dataset: RagTruth-FR (translated from the original RAGTruth dataset)
Language: French

How It Works

The model is trained to identify tokens in the French answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
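The aggregation step described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name `tokens_to_spans`, the whitespace "tokenizer", the label convention (1 = hallucinated), and the choice of the maximum token probability as the span confidence are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def tokens_to_spans(
    answer: str,
    offsets: List[Tuple[int, int]],
    labels: List[int],
    probs: List[float],
) -> List[Dict]:
    """Merge runs of consecutive tokens labelled 1 into character spans."""
    spans: List[Dict] = []
    current = None  # span being built, or None when outside a flagged run
    for (start, end), label, prob in zip(offsets, labels, probs):
        if label == 1:
            if current is None:
                current = {"start": start, "end": end, "confidence": prob}
            else:
                # Extend the open span to cover this token.
                current["end"] = end
                # Assumption: span confidence is the max token probability.
                current["confidence"] = max(current["confidence"], prob)
        elif current is not None:
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = answer[span["start"]:span["end"]]
    return spans


answer = "La capitale est Paris. La population est de 69 millions."
# Stand-in for a real tokenizer: whitespace tokens with character offsets.
offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", answer)]
# Hypothetical token-level predictions: only the last two tokens are flagged.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.01] * 8 + [0.97, 0.91]

spans = tokens_to_spans(answer, offsets, labels, probs)
print(spans)
```

Here the two flagged tokens are merged into a single span covering "69 millions.", which is the kind of span-level output a user would inspect.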

Usage

Installation

Install the 'lettucedetect' package with pip:

pip install lettucedetect

Using the model

from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer", 
    model_path="KRLabsOrg/lettucedect-610m-eurobert-fr-v1",
    lang="fr",
    trust_remote_code=True
)

contexts = ["La France est un pays d'Europe. La capitale de la France est Paris. La population de la France est de 67 millions."]
question = "Quelle est la capitale de la France? Quelle est la population de la France?"
answer = "La capitale de la France est Paris. La population de la France est de 69 millions."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Prédictions:", predictions)

# Prédictions: [{'start': 36, 'end': 81, 'confidence': 0.9726481437683105, 'text': ' La population de la France est de 69 millions.'}]
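Span predictions like the one above can be used to mark the flagged text in the answer. The following is a rough sketch, assuming each span is a dict with character offsets `start` and `end` into the answer and that `end` is exclusive; check this against the offsets your installed version returns. The helper `mark_spans` and the `[[ ]]` markers are hypothetical, and the span in the example is written by hand rather than produced by the model.

```python
from typing import Dict, List


def mark_spans(
    answer: str,
    spans: List[Dict],
    open_tag: str = "[[",
    close_tag: str = "]]",
) -> str:
    """Wrap each flagged character span of the answer in marker tags."""
    pieces: List[str] = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        # Clamp to the answer and skip anything already covered.
        start = max(span["start"], cursor)
        end = min(span["end"], len(answer))
        if start >= end:
            continue
        pieces.append(answer[cursor:start])
        pieces.append(open_tag + answer[start:end] + close_tag)
        cursor = end
    pieces.append(answer[cursor:])
    return "".join(pieces)


answer = "La capitale de la France est Paris. La population de la France est de 69 millions."
# Hand-written span for illustration (not model output); end is exclusive here.
example_spans = [{"start": answer.index("La population"), "end": len(answer)}]

marked = mark_spans(answer, example_spans)
print(marked)
```

Sorting and clamping keeps the helper safe if spans arrive out of order or overlap.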

Performance

Results on Translated RAGTruth-FR

We evaluate our French models on the translated RAGTruth-FR dataset. The EuroBERT-610M French model achieves an F1 score of 73.13%, outperforming the prompt-based GPT-4.1-mini baseline (62.37%) by 10.76 percentage points.

For detailed performance metrics, see the table below:

Language	Model	Precision (%)	Recall (%)	F1 (%)	GPT-4.1-mini F1 (%)	Δ F1 (pp)
French	EuroBERT-210M	58.86	74.34	65.70	62.37	+3.33
French	EuroBERT-610M	67.08	80.38	73.13	62.37	+10.76

The 610M model performs best, improving F1 by 7.43 percentage points over the 210M model (73.13% vs. 65.70%). It is particularly strong in recall: at 80.38%, it detects more hallucinations than the 210M model (74.34%).
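As a quick consistency check on the reported numbers, the sketch below recomputes F1 from the precision and recall values in the table, assuming F1 is the standard harmonic mean of the two, and derives the differences against GPT-4.1-mini. The helper `f1_score` is defined here for illustration and is not part of the library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) copied from the table above.
rows = {
    "EuroBERT-210M": (58.86, 74.34),
    "EuroBERT-610M": (67.08, 80.38),
}
gpt_f1 = 62.37  # GPT-4.1-mini F1 (%) from the table

results = {}
for name, (precision, recall) in rows.items():
    f1 = round(f1_score(precision, recall), 2)
    delta = round(f1 - gpt_f1, 2)
    results[name] = (f1, delta)
    print(f"{name}: F1 = {f1:.2f}%, delta vs GPT-4.1-mini = {delta:+.2f} pp")
```

The recomputed values match the F1 and Δ F1 columns of the table to two decimal places.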

Citing

If you use the model or the tool, please cite the following paper:

@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}