Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
Abstract
Using carefully crafted hallucinations as negatives in a curriculum learning approach within the DPO alignment procedure significantly enhances LLMs' hallucination detection abilities.
Aligning large language models (LLMs) to accurately detect hallucinations remains a significant challenge because of the sophisticated nature of hallucinated text. Recognizing that hallucinated samples typically exhibit higher deceptive quality than traditional negative samples, we use these carefully engineered hallucinations as negative examples in the DPO alignment procedure. Our method incorporates a curriculum learning strategy that gradually transitions training from easier samples, identified by the greatest reduction in probability scores from independent fact-checking models, to progressively harder ones. This structured difficulty scaling ensures stable and incremental learning. Experimental evaluation demonstrates that our HaluCheck models, trained with this curriculum DPO approach and high-quality negative samples, significantly improve performance across various metrics, achieving gains of up to 24% on difficult benchmarks such as MedHallu and HaluEval. Additionally, HaluCheck models demonstrate robustness in zero-shot settings, significantly outperforming larger state-of-the-art models across various benchmarks.
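To make the recipe described in the abstract concrete, below is a minimal sketch (not the authors' released code) of the two ingredients it names: ordering synthetic-hallucination preference pairs from easy to hard by the drop they induce in an external fact-checker's score, and optimizing the standard DPO objective over those pairs. The `PreferencePair` record, the `prob_drop` field, the number of stages, and `beta = 0.1` are illustrative assumptions; only the easy-to-hard ordering and the DPO loss form follow from the abstract.

```python
import math
from dataclasses import dataclass
from typing import List

import torch
import torch.nn.functional as F


# Hypothetical record for one preference pair (names are assumptions, not the paper's schema).
@dataclass
class PreferencePair:
    prompt: str        # question plus supporting context
    chosen: str        # faithful answer / correct hallucination verdict
    rejected: str      # carefully engineered hallucination used as the negative
    prob_drop: float   # reduction in an independent fact-checker's truthfulness score
                       # caused by the hallucination; a larger drop = an easier negative


def curriculum_order(pairs: List[PreferencePair], num_stages: int = 3) -> List[List[PreferencePair]]:
    """Sort pairs easy-to-hard (largest fact-checker probability drop first)
    and split them into sequential curriculum stages."""
    ordered = sorted(pairs, key=lambda p: p.prob_drop, reverse=True)
    stage_size = math.ceil(len(ordered) / num_stages)
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the faithful answer
    over the synthetic hallucination, relative to a frozen reference model."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()


# Sketch of the training schedule (compute_logps / make_batches are placeholders):
# for stage in curriculum_order(all_pairs):          # easiest negatives first
#     for batch in make_batches(stage):
#         loss = dpo_loss(*compute_logps(policy, reference, batch))
#         loss.backward(); optimizer.step(); optimizer.zero_grad()
```

In practice the same ordering can be fed to an off-the-shelf DPO trainer; the sketch only isolates the curriculum construction and the loss so the difficulty-scaling idea is explicit.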
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Osiris: A Lightweight Open-Source Hallucination Detection System (2025)
- Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models (2025)
- Evaluating Evaluation Metrics - The Mirage of Hallucination Detection (2025)
- Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards (2025)
- Finetune-RAG: Fine-Tuning Language Models to Resist Hallucination in Retrieval-Augmented Generation (2025)
- Robust Hallucination Detection in LLMs via Adaptive Token Selection (2025)
- Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation (2025)