---
license: mit
language:
- es
base_model:
- EuroBERT/EuroBERT-610m
pipeline_tag: token-classification
tags:
- token classification
- hallucination detection
- transformers
- question answer
datasets:
- KRLabsOrg/ragtruth-es-translated
---

# LettuceDetect: Spanish Hallucination Detection Model

*LettuceDetect logo*

**Model Name:** lettucedetect-610m-eurobert-es-v1
**Organization:** KRLabsOrg
**GitHub:** https://github.com/KRLabsOrg/LettuceDetect

## Overview

LettuceDetect is a transformer-based model for hallucination detection on context and answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on **EuroBERT-610M**, chosen for its extended context support (up to **8192 tokens**) and strong multilingual capabilities. Long-context support is critical in RAG settings, where detailed and extensive documents must be processed to determine accurately whether an answer is supported by the provided context.

**This is our large Spanish model, built on the EuroBERT-610M architecture.**

## Model Details

- **Architecture:** EuroBERT-610M with extended context support (up to 8192 tokens)
- **Task:** Token Classification / Hallucination Detection
- **Training Dataset:** RAGTruth-ES (translated from the original RAGTruth dataset)
- **Language:** Spanish

## How It Works

The model is trained to identify tokens in the Spanish answer text that are not supported by the given context. During inference, it returns token-level predictions, which are then aggregated into character-level spans (a minimal illustration of this aggregation step is given in the *Span Aggregation Example* section below). This lets users see exactly which parts of the answer are considered hallucinated.

## Usage

### Installation

Install the `lettucedetect` package:

```bash
pip install lettucedetect
```

### Using the model

```python
from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-610m-eurobert-es-v1",
    lang="es",
    trust_remote_code=True,
)

contexts = ["Francia es un país de Europa. La capital de Francia es París. La población de Francia es de 67 millones."]
question = "¿Cuál es la capital de Francia? ¿Cuál es la población de Francia?"
answer = "La capital de Francia es París. La población de Francia es de 69 millones."

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)
# Predictions: [{'start': 33, 'end': 76, 'confidence': 0.9598274827003479, 'text': ' La población de Francia es de 69 millones.'}]
```

## Performance

**Results on Translated RAGTruth-ES**

We evaluate our Spanish models on translated versions of the [RAGTruth](https://aclanthology.org/2024.acl-long.585/) dataset. The EuroBERT-610M Spanish model achieves an F1 score of 73.25%, outperforming the prompt-based GPT-4.1-mini baseline (62.40%) by 10.85 percentage points. Detailed metrics are given in the table below:

| Language | Model         | Precision (%) | Recall (%) | F1 (%)    | GPT-4.1-mini F1 (%) | Δ F1 (pp)  |
|----------|---------------|---------------|------------|-----------|---------------------|------------|
| Spanish  | EuroBERT-210M | 69.48         | 73.38      | 71.38     | 62.40               | +8.98      |
| Spanish  | EuroBERT-610M | **76.32**     | 70.41      | **73.25** | 62.40               | **+10.85** |

The 610M model offers the best overall performance, improving F1 by nearly 2 percentage points over the 210M model. It particularly excels in precision (76.32%), which reduces false-positive hallucination detections.
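For readers who want to sanity-check the table, F1 is the harmonic mean of precision and recall; the following few lines of Python (an illustrative check, not part of the library) reproduce the reported F1 scores from the precision and recall columns:

```python
# F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(69.48, 73.38), 2))  # 71.38 -- EuroBERT-210M
print(round(f1(76.32, 70.41), 2))  # 73.25 -- EuroBERT-610M
```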
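## Span Aggregation Example

The sketch below illustrates the token-to-span aggregation described in *How It Works*. It is a simplified stand-in, not the library's internal implementation: the `aggregate_spans` helper, the token boundaries, and the probabilities are all hypothetical, and real predictions are made over subword tokens rather than the coarse chunks shown here.

```python
# Hypothetical illustration of merging token-level hallucination labels
# into character-level spans. Not the library's actual code.

def aggregate_spans(tokens, answer):
    """Merge runs of consecutive hallucinated tokens into character spans."""
    spans, current = [], None
    for tok in tokens:
        if tok["hallucinated"]:
            if current is None:  # open a new span
                current = {"start": tok["start"], "end": tok["end"], "probs": [tok["prob"]]}
            else:  # extend the open span
                current["end"] = tok["end"]
                current["probs"].append(tok["prob"])
        elif current is not None:  # a supported token closes the open span
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [
        {
            "start": s["start"],
            "end": s["end"],
            "confidence": sum(s["probs"]) / len(s["probs"]),  # mean token probability
            "text": answer[s["start"]:s["end"]],
        }
        for s in spans
    ]

answer = "La capital de Francia es París. La población de Francia es de 69 millones."
# Made-up token predictions with character offsets into `answer`:
tokens = [
    {"start": 0,  "end": 31, "hallucinated": False, "prob": 0.02},  # "La capital ... París."
    {"start": 31, "end": 61, "hallucinated": True,  "prob": 0.97},  # " La población ... de"
    {"start": 61, "end": 74, "hallucinated": True,  "prob": 0.95},  # " 69 millones."
]
print(aggregate_spans(tokens, answer))
# -> one span covering " La población de Francia es de 69 millones." with confidence ≈ 0.96
```

In practice you never need to do this yourself: `detector.predict(..., output_format="spans")` returns the already-aggregated spans, as shown in the usage example above.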
## Citing

If you use the model or the tool, please cite the following paper:

```bibtex
@misc{Kovacs:2025,
      title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
      author={Ádám Kovács and Gábor Recski},
      year={2025},
      eprint={2502.17125},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.17125},
}
```