# medhal-1b-base
---
license: apache-2.0
datasets:
  - GM07/medhal
language:
  - en
base_model:
  - meta-llama/Llama-3.2-1B
tags:
  - medical
  - hallucination
  - evaluator
---

## Description

This model was trained to detect hallucinated content in medical texts. It is based on Llama-3.2-1B and was fine-tuned on the MedHal dataset. Given a context and a statement, the model predicts whether the statement is factual and, if it is not, generates an explanation. The context is optional, since a statement may concern general medical knowledge.

## Prompt format

The model uses the following prompt format for generation:

```
### Task Description
- You will evaluate whether a medical statement is factually accurate.
- The statement may reference a provided context.
- Respond with "YES" if the statement is factually correct or "NO" if it contains inaccuracies.
- In order to answer YES, everything in the statement must be supported by the context.
- In order to answer NO, there must be at least one piece of information in the statement that is not supported by the context.

### Context
{context}

### Statement
{statement}

### Factual
```
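Since the context is optional, it can help to wrap the template in a small helper. The sketch below is illustrative (the `build_prompt` helper is not part of the released code), and how an absent context should be encoded is not specified in this card; leaving the section empty is an assumption.

```python
PROMPT_TEMPLATE = """### Task Description
- You will evaluate whether a medical statement is factually accurate.
- The statement may reference a provided context.
- Respond with "YES" if the statement is factually correct or "NO" if it contains inaccuracies.
- In order to answer YES, everything in the statement must be supported by the context.
- In order to answer NO, there must be at least one piece of information in the statement that is not supported by the context.

### Context
{context}

### Statement
{statement}

### Factual
"""


def build_prompt(statement: str, context: str = '') -> str:
    """Fill the MedHal prompt template.

    Leaving the context section empty when no context is given is an
    assumption; the card does not specify a placeholder for this case.
    """
    return PROMPT_TEMPLATE.format(context=context, statement=statement)
```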

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('GM07/medhal-1b-base')
model = AutoModelForCausalLM.from_pretrained('GM07/medhal-1b-base')

context = 'The patient is an 80-year-old man who was admitted for heart problems.'
statement = 'The patient is 16 years old.'

prompt = f"""### Task Description
- You will evaluate whether a medical statement is factually accurate.
- The statement may reference a provided context.
- Respond with "YES" if the statement is factually correct or "NO" if it contains inaccuracies.
- In order to answer YES, everything in the statement must be supported by the context.
- In order to answer NO, there must be at least one piece of information in the statement that is not supported by the context.

### Context
{context}

### Statement
{statement}

### Factual
"""

inputs = tokenizer([prompt], padding=True, truncation=True, return_tensors='pt')
inputs = {k: v.to(model.device) for k, v in inputs.items()}

results = model.generate(**inputs, max_new_tokens=128)
output = tokenizer.batch_decode(results, skip_special_tokens=True)
print(output[0])
```
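Because the model answers after the `### Factual` marker, the decoded text contains the prompt followed by the verdict. A small post-processing helper (hypothetical, not part of the released code) can split the continuation into a YES/NO verdict and an optional explanation:

```python
def parse_medhal_output(generated_text: str) -> tuple[str, str]:
    """Split a decoded generation into a YES/NO verdict and an explanation.

    Assumes the answer follows the last '### Factual' marker, as in the
    prompt format above; the exact answer layout is an assumption.
    """
    # Keep only the text after the last '### Factual' marker (the prompt
    # itself contains one occurrence, so take the final segment).
    answer = generated_text.rsplit('### Factual', 1)[-1].strip()
    verdict = 'YES' if answer.upper().startswith('YES') else 'NO'
    # Treat anything after the verdict token as the explanation.
    explanation = answer[len(verdict):].lstrip(' .:,\n')
    return verdict, explanation
```

For the example above, the statement contradicts the context, so the parsed verdict should be `NO` with the model's explanation as the second element.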

## Citation

If you find this model useful in your work, please cite the model as follows:

```bibtex
@misc{mehenni2025medhalevaluationdatasetmedical,
      title={MedHal: An Evaluation Dataset for Medical Hallucination Detection},
      author={Gaya Mehenni and Amal Zouaq},
      year={2025},
      eprint={2504.08596},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.08596},
}
```