LettuceDetect: Spanish Hallucination Detection Model

Model Name: lettucedect-210m-eurobert-es-v1
Organization: KRLabsOrg
GitHub: https://github.com/KRLabsOrg/LettuceDetect

Overview

LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on EuroBERT-210M, chosen for its long context window (up to 8192 tokens) and strong multilingual capabilities. The long context matters because the model must read the full retrieved documents, which can be long and detailed, to decide whether an answer is supported by them.

This is our Spanish base model, built on the EuroBERT-210M architecture.

Model Details

  • Architecture: EuroBERT-210M with extended context support (up to 8192 tokens)
  • Task: Token Classification / Hallucination Detection
  • Training Dataset: RagTruth-ES (translated from the original RAGTruth dataset)
  • Language: Spanish

How It Works

The model is trained to identify tokens in the Spanish answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
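
The aggregation step can be illustrated with a minimal sketch. This is not the library's implementation: the `TokenPrediction` type, the `merge_tokens_into_spans` helper, the 0.5 threshold, and the choice of the maximum token probability as the span confidence are all assumptions made for illustration. It merges consecutive tokens flagged as unsupported into one span and reports it in the same shape as the span output shown under Usage.

```python
from dataclasses import dataclass


@dataclass
class TokenPrediction:
    """Hypothetical per-token output: character offsets plus a probability."""
    start: int   # character offset of the token in the answer (inclusive)
    end: int     # character offset of the token in the answer (exclusive)
    prob: float  # assumed probability that the token is unsupported


def merge_tokens_into_spans(answer, tokens, threshold=0.5):
    """Merge runs of consecutive flagged tokens into spans (illustrative only)."""
    spans = []
    current = None
    for tok in tokens:
        if tok.prob >= threshold:
            if current is None:
                # Open a new span at the first flagged token.
                current = {"start": tok.start, "end": tok.end, "confidence": tok.prob}
            else:
                # Extend the open span; keep the highest token probability (an assumption).
                current["end"] = tok.end
                current["confidence"] = max(current["confidence"], tok.prob)
        elif current is not None:
            # A supported token closes the open span.
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = answer[span["start"]:span["end"]]
    return spans


answer = "París tiene 69 millones."
tokens = [
    TokenPrediction(0, 5, 0.02),    # "París"
    TokenPrediction(6, 11, 0.10),   # "tiene"
    TokenPrediction(12, 14, 0.93),  # "69"
    TokenPrediction(15, 23, 0.88),  # "millones"
    TokenPrediction(23, 24, 0.20),  # "."
]
spans = merge_tokens_into_spans(answer, tokens)
print(spans)
# [{'start': 12, 'end': 23, 'confidence': 0.93, 'text': '69 millones'}]
```

The two flagged tokens are adjacent in the token sequence, so they collapse into a single span that also covers the whitespace between them.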

Usage

Installation

Install the 'lettucedetect' package with pip:

pip install lettucedetect

Using the model

from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer", 
    model_path="KRLabsOrg/lettucedect-210m-eurobert-es-v1",
    lang="es",
    trust_remote_code=True
)

contexts = ["Francia es un país de Europa. La capital de Francia es París. La población de Francia es de 67 millones."]
question = "¿Cuál es la capital de Francia? ¿Cuál es la población de Francia?"
answer = "La capital de Francia es París. La población de Francia es de 69 millones."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predicciones:", predictions)

# Predicciones: [{'start': 33, 'end': 76, 'confidence': 0.9215637326240539, 'text': ' La población de Francia es de 69 millones.'}]
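
A common next step is to mark the flagged spans inside the answer for display. The sketch below is an illustration, not part of the library: `highlight_spans` and the `[[ ]]` markers are hypothetical, and it assumes that `start` and `end` are character offsets into the answer string (worth checking against the output of your installed version). To stay self-consistent, the example computes its own offsets with `str.index` instead of reusing the numbers printed above.

```python
def highlight_spans(answer, predictions, open_mark="[[", close_mark="]]"):
    """Wrap each predicted span in markers (hypothetical helper)."""
    out = []
    cursor = 0
    for span in sorted(predictions, key=lambda s: s["start"]):
        # Clamp to the answer and skip anything already covered by an earlier span.
        start = max(span["start"], cursor)
        end = min(span["end"], len(answer))
        if start >= end:
            continue
        out.append(answer[cursor:start])
        out.append(open_mark + answer[start:end] + close_mark)
        cursor = end
    out.append(answer[cursor:])
    return "".join(out)


answer = "La capital de Francia es París. La población de Francia es de 69 millones."
flagged = " La población de Francia es de 69 millones."
start = answer.index(flagged)  # illustrative offset, computed here rather than predicted
predictions = [
    {"start": start, "end": start + len(flagged), "confidence": 0.92, "text": flagged}
]

highlighted = highlight_spans(answer, predictions)
print(highlighted)
# La capital de Francia es París.[[ La población de Francia es de 69 millones.]]
```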

Performance

Results on Translated RAGTruth-ES

We evaluate our Spanish models on the Spanish translation of the RAGTruth dataset. The EuroBERT-210M Spanish model achieves an F1 score of 71.38%, outperforming the prompt-based GPT-4.1-mini baseline (62.40%) by 8.98 percentage points.

For detailed performance metrics, see the table below:

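| Language | Model         | Precision (%) | Recall (%) | F1 (%) | GPT-4.1-mini F1 (%) | Δ F1 (pp) |
|----------|---------------|---------------|------------|--------|---------------------|-----------|
| Spanish  | EuroBERT-210M | 69.48         | 73.38      | 71.38  | 62.40               | +8.98     |
| Spanish  | EuroBERT-610M | 76.32         | 70.41      | 73.25  | 62.40               | +10.85    |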

While the 610M variant achieves a higher F1 score (73.25% vs. 71.38%), the 210M model offers a good balance between accuracy and computational efficiency, processing examples approximately 3× faster. It also has the higher recall of the two (73.38% vs. 70.41%), at the cost of lower precision.
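
As a sanity check on the reported numbers, the short sketch below recomputes F1 as the harmonic mean of precision and recall and derives the gap to GPT-4.1-mini. The `f1_score` helper is written here for illustration. That the evaluation computes F1 exactly this way is an assumption, although this standard formula does reproduce the reported values to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) as reported for the Spanish models.
results = {
    "EuroBERT-210M": {"precision": 69.48, "recall": 73.38},
    "EuroBERT-610M": {"precision": 76.32, "recall": 70.41},
}
GPT_41_MINI_F1 = 62.40  # reported baseline F1 (%)

for name, r in results.items():
    f1 = f1_score(r["precision"], r["recall"])
    print(f"{name}: F1 = {f1:.2f}, delta = {f1 - GPT_41_MINI_F1:+.2f}")
# EuroBERT-210M: F1 = 71.38, delta = +8.98
# EuroBERT-610M: F1 = 73.25, delta = +10.85
```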

Citing

If you use the model or the tool, please cite the following paper:

@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications}, 
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125}, 
}