arxiv:2501.13824

Hallucinations Can Improve Large Language Models in Drug Discovery

Published on Jan 23 · Submitted by shuzyuan on Jan 24

Abstract

Researchers have raised concerns about hallucinations in Large Language Models (LLMs), yet their potential in areas where creativity is vital, such as drug discovery, merits exploration. In this paper, we hypothesize that hallucinations can improve LLMs in drug discovery. To verify this hypothesis, we use LLMs to describe the SMILES strings of molecules in natural language and then incorporate these descriptions as part of the prompt to address specific tasks in drug discovery. Evaluated on seven LLMs and five classification tasks, our findings confirm the hypothesis: LLMs can achieve better performance with text containing hallucinations. Notably, Llama-3.1-8B achieves an 18.35% gain in ROC-AUC compared to the baseline without hallucination. Furthermore, hallucinations generated by GPT-4o provide the most consistent improvements across models. Additionally, we conduct empirical analyses and a case study to investigate the key factors affecting performance and the underlying reasons. Our research sheds light on the potential use of hallucinations for LLMs and offers new perspectives for future research leveraging LLMs in drug discovery.
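
As a rough illustration of the pipeline the abstract describes, here is a minimal sketch of the two-step prompting approach, assuming an OpenAI-style chat API. The prompt wording, task phrasing, and model choice are illustrative assumptions, not the paper's exact templates (the paper evaluates seven LLMs; see its appendix for the full setup).

```python
# Hypothetical sketch of the two-step pipeline: (1) ask an LLM to describe a
# SMILES string in natural language (the description may contain
# hallucinations and is used as-is), then (2) prepend that description to a
# property-classification prompt. Prompts here are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def describe_molecule(smiles: str) -> str:
    """Step 1: generate a natural-language description of a SMILES string."""
    response = client.chat.completions.create(
        model="gpt-4o",  # the paper reports GPT-4o descriptions transfer best
        messages=[{
            "role": "user",
            "content": f"Describe the molecule with SMILES {smiles} in natural language.",
        }],
    )
    return response.choices[0].message.content


def classify_molecule(smiles: str, description: str, question: str) -> str:
    """Step 2: answer a yes/no classification question, with the (possibly
    hallucinated) description included as context in the prompt."""
    prompt = (
        f"SMILES: {smiles}\n"
        f"Description: {description}\n"
        f"Question: {question} Answer Yes or No."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
    description = describe_molecule(smiles)
    answer = classify_molecule(
        smiles, description,
        question="Can this molecule cross the blood-brain barrier?",
    )
    print(answer)
```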

Community

Paper author · Paper submitter

Is hallucination always bad? 🤔
We tested 7 large language models (LLMs) and discovered that hallucinations can actually enhance LLM performance in drug discovery! 💊✨

Check out our findings to learn how hallucinations might not always be a drawback, but an advantage in certain applications! 🚀

Contrary to popular belief, hallucinations will always exist; more precisely, structural hallucinations will always persist. I call it the gradient mismatch between the formal and informal world. This is the first line of research I've seen that recognizes what hallucinations are in actuality: computational imagination. Like human imagination, it must be filtered and "grounded". Great work, guys. These are important insights.

Paper author

Thank you!

Excellent findings!
It was also a pleasure to read the paper; it's very well written 😀

Did you mean the Mistral 8B? Throughout the paper you refer to a Ministral model, which I think might be a typo.


Thanks for checking our paper! Just to clarify, “Ministral” is not a typo. The full name is mistralai/Ministral-8B-Instruct-2410: https://huggingface.co/mistralai/Ministral-8B-Instruct-2410.

We've actually listed all the model names and links in Appendix A, Table 2. :)

