Hallucinations Can Improve Large Language Models in Drug Discovery
Abstract
Researchers have raised concerns about hallucinations in Large Language Models (LLMs), yet their potential in areas where creativity is vital, such as drug discovery, merits exploration. In this paper, we hypothesize that hallucinations can improve LLMs in drug discovery. To test this hypothesis, we use LLMs to describe the SMILES string of molecules in natural language and then incorporate these descriptions as part of the prompt to address specific tasks in drug discovery. Evaluating seven LLMs on five classification tasks, we confirm the hypothesis: LLMs can achieve better performance with text containing hallucinations. Notably, Llama-3.1-8B achieves an 18.35% gain in ROC-AUC compared to the baseline without hallucination. Furthermore, hallucinations generated by GPT-4o provide the most consistent improvements across models. Additionally, we conduct empirical analyses and a case study to investigate key factors affecting performance and the underlying reasons. Our research sheds light on the potential use of hallucinations for LLMs and offers new perspectives for future research leveraging LLMs in drug discovery.
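In essence, the pipeline is two prompting passes: first ask an LLM for a natural-language description of a molecule's SMILES string (a description that may contain hallucinations), then prepend that description to the prompt for the downstream classification task. Below is a minimal sketch of that idea, assuming the OpenAI Python client, GPT-4o as the description generator, and an illustrative yes/no property question; the model choice, prompt wording, and task phrasing here are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of the two-pass prompting pipeline described in the abstract.
# Assumptions: OpenAI Python client, GPT-4o, and an illustrative yes/no
# property question; not the paper's exact prompts or evaluation protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_molecule(smiles: str) -> str:
    """Pass 1: ask the LLM for a free-form natural-language description
    of the SMILES string. The output may contain hallucinated details."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Describe the following molecule in natural language: {smiles}",
        }],
    )
    return resp.choices[0].message.content


def classify_with_description(smiles: str, question: str) -> str:
    """Pass 2: include the (possibly hallucinated) description in the
    prompt for the downstream classification task."""
    description = describe_molecule(smiles)
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"SMILES: {smiles}\n"
                f"Description: {description}\n"
                f"{question} Answer Yes or No."
            ),
        }],
    )
    return resp.choices[0].message.content


# Illustrative usage: aspirin, with a hypothetical property question.
print(classify_with_description(
    "CC(=O)Oc1ccccc1C(=O)O",
    "Can this molecule penetrate the blood-brain barrier?",
))
```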
Community
Is hallucination always bad? 🤔
We tested 7 large language models (LLMs) and discovered that hallucinations can actually enhance LLM performance in drug discovery! 💊✨
Check out our findings to learn how hallucinations might not always be a drawback—but an advantage in certain applications! 🚀
Contrary to popular belief, hallucinations will not go away; rather, structural hallucinations will always persist. I call it the gradient mismatch between the formal and informal world. This is the first line of research I've seen that recognizes what hallucinations are in actuality: computational imagination. Like human imagination, it must be filtered and "grounded". Great work guys. These are important insights.
Thank you!
Excellent findings!
It's also a pleasure to read the paper, very well written 😀
Did you mean Mistral 8B? Throughout the paper you refer to a Ministral model, which I think might be a typo.
Thanks for checking out our paper! Just to clarify, “Ministral” is not a typo. The full name is mistralai/Ministral-8B-Instruct-2410: https://huggingface.co/mistralai/Ministral-8B-Instruct-2410.
We've actually listed all the model names and links in Appendix A, Table 2. :)