LettuceDetect: Chinese Hallucination Detection Model
Model Name: lettucedect-610m-eurobert-cn-v1
Organization: KRLabsOrg
GitHub: https://github.com/KRLabsOrg/LettuceDetect
Overview
LettuceDetect is a transformer-based model for hallucination detection on context-answer pairs, designed for multilingual Retrieval-Augmented Generation (RAG) applications. This model is built on EuroBERT-610M, chosen for its extended context support (up to 8192 tokens) and strong multilingual capabilities. Long-context support matters because the model must process long, detailed documents to determine whether an answer is supported by the provided context.
This is the larger of our two Chinese models, built on the EuroBERT-610M architecture.
Model Details
- Architecture: EuroBERT-610M with extended context support (up to 8192 tokens)
- Task: Token Classification / Hallucination Detection
- Training Dataset: RAGTruth-CN (translated from the original RAGTruth dataset)
- Language: Chinese
How It Works
The model is trained to identify tokens in the Chinese answer text that are not supported by the given context. During inference, the model returns token-level predictions which are then aggregated into spans. This allows users to see exactly which parts of the answer are considered hallucinated.
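The aggregation step described above can be illustrated with a minimal sketch. It is not the library's actual implementation: the function name `merge_token_predictions`, the input layout (character offsets, binary labels, per-token probabilities), and the choice of the maximum token probability as span confidence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def merge_token_predictions(
    answer: str,
    offsets: List[Tuple[int, int]],
    labels: List[int],
    probs: List[float],
) -> List[Dict]:
    """Merge runs of consecutive tokens labeled 1 into character spans.

    Hypothetical helper: offsets are (start, end) character positions in
    `answer` with an exclusive end; label 1 means "not supported by context".
    """
    spans: List[Dict] = []
    current = None  # span being built, or None when outside a flagged run
    for (start, end), label, prob in zip(offsets, labels, probs):
        if label == 1:
            if current is None:
                current = {"start": start, "end": end, "confidence": prob}
            else:
                current["end"] = end
                # Assumption: span confidence is the max token probability.
                current["confidence"] = max(current["confidence"], prob)
        elif current is not None:
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = answer[span["start"]:span["end"]]
    return spans


# Made-up tokenization (one token per word) and made-up probabilities.
answer = "The wall is 50,000 km long."
offsets = [(0, 3), (4, 8), (9, 11), (12, 18), (19, 21), (22, 26), (26, 27)]
labels = [0, 0, 0, 1, 1, 0, 0]
probs = [0.02, 0.01, 0.05, 0.93, 0.88, 0.10, 0.03]
spans = merge_token_predictions(answer, offsets, labels, probs)
print(spans)
```

Here the two adjacent flagged tokens are merged into a single span covering "50,000 km", which is the kind of span-level result a user would inspect.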
Usage
Installation
Install the 'lettucedetect' package with pip:
pip install lettucedetect
Using the model
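from lettucedetect.models.inference import HallucinationDetector

# For a transformer-based approach:
detector = HallucinationDetector(
    method="transformer",
    model_path="KRLabsOrg/lettucedect-610m-eurobert-cn-v1",
    lang="cn",
    trust_remote_code=True,
)

# Context: "The Great Wall is a great defensive work of ancient China, more than
# 21,000 km long. Its construction began in the 7th century BC and spanned many dynasties."
contexts = ["长城是中国古代的伟大防御工程,全长超过21,000公里。它的建造始于公元前7世纪,历经多个朝代。"]
# Question: "How long is the Great Wall? When was it built?"
question = "长城有多长?它是什么时候建造的?"
# Answer: "The Great Wall is about 50,000 km long. Its construction began in the
# 3rd century BC, only during the Qin dynasty."
answer = "长城全长约50,000公里。它的建造始于公元前3世纪,仅在秦朝时期。"

# Get span-level predictions indicating which parts of the answer are considered hallucinated.
predictions = detector.predict(context=contexts, question=question, answer=answer, output_format="spans")
print("Predictions:", predictions)
# Example output:
# Predictions: [{'start': 4, 'end': 16, 'confidence': 0.89, 'text': '全长约50,000公里'}, {'start': 20, 'end': 41, 'confidence': 0.91, 'text': '建造始于公元前3世纪,仅在秦朝时期'}]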
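To make span-level output easier to review, the predicted spans can be marked inline in the answer. The sketch below is a hypothetical post-processing helper, not part of lettucedetect. It assumes each prediction is a dict with 'start' and 'end' keys that are character offsets into the answer, with an exclusive end; that convention should be checked against real output before relying on it. The English answer and the span are made up for illustration.

```python
from typing import Dict, List


def mark_spans(
    answer: str,
    spans: List[Dict],
    open_tag: str = "[[",
    close_tag: str = "]]",
) -> str:
    """Wrap each predicted span of `answer` in marker strings.

    Hypothetical helper: assumes 'start'/'end' are character offsets
    into `answer` and that 'end' is exclusive.
    """
    out = []
    cursor = 0  # position in `answer` up to which text has been emitted
    for span in sorted(spans, key=lambda s: s["start"]):
        start = max(span["start"], cursor)  # skip overlap with a previous span
        end = min(span["end"], len(answer))
        if start >= end:
            continue
        out.append(answer[cursor:start])
        out.append(open_tag + answer[start:end] + close_tag)
        cursor = end
    out.append(answer[cursor:])
    return "".join(out)


# Made-up example data in the same dict shape as the span output.
answer = "The Great Wall is about 50,000 km long."
spans = [{"start": 24, "end": 33, "confidence": 0.9, "text": "50,000 km"}]
marked = mark_spans(answer, spans)
print(marked)
```

The markers default to double square brackets so the result stays readable in a terminal; HTML tags or ANSI color codes could be passed instead.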
Performance
Results on Translated RAGTruth-CN
We evaluate our Chinese models on a translated version of the RAGTruth dataset. The EuroBERT-610M Chinese model achieves an F1 score of 77.27%, 17.04 points higher than the prompt-based GPT-4.1-mini baseline (60.23%).
Detailed metrics for both Chinese model sizes are shown in the table below:
Language | Model | Precision (%) | Recall (%) | F1 (%) | GPT-4.1-mini F1 (%) | Δ F1 (%) |
---|---|---|---|---|---|---|
Chinese | EuroBERT-210M | 75.46 | 73.38 | 74.41 | 60.23 | +14.18 |
Chinese | EuroBERT-610M | 78.90 | 75.72 | 77.27 | 60.23 | +17.04 |
The 610M variant improves F1 by 2.86 points over the 210M variant (77.27 vs. 74.41) and by 17.04 points over the GPT-4.1-mini baseline, the largest improvement among all languages. It requires more computational resources than the 210M variant in exchange for better hallucination detection.
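The relationship between the table's columns can be checked with a short calculation. This sketch assumes the reported F1 is the standard harmonic mean of the listed precision and recall. Because it starts from values already rounded to two decimals, the recomputed F1 can differ from the table in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) copied from the table above.
rows = {
    "EuroBERT-210M": (75.46, 73.38),
    "EuroBERT-610M": (78.90, 75.72),
}
baseline_f1 = 60.23  # GPT-4.1-mini F1 (%) from the table

for name, (precision, recall) in rows.items():
    f1 = f1_score(precision, recall)
    delta = f1 - baseline_f1  # difference in percentage points
    print(f"{name}: F1 = {f1:.2f}, delta vs. baseline = {delta:+.2f}")
```

Note that the delta column is a difference in percentage points, not a relative percentage change.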
Citing
If you use the model or the tool, please cite the following paper:
@misc{Kovacs:2025,
  title={LettuceDetect: A Hallucination Detection Framework for RAG Applications},
  author={Ádám Kovács and Gábor Recski},
  year={2025},
  eprint={2502.17125},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.17125},
}