TD-HallOumi-3B: Llama 3.2 3B for Hallucination Detection / Claim Verification
This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct
specifically adapted for Claim Verification and Hallucination Detection. It assesses whether claims made in a response are supported by a given context document.
This work is inspired by and utilizes datasets developed for the HallOumi project by Oumi AI, which aims to build trust in AI systems by enabling verifiable outputs. This 3B parameter model is provided by the TEEN-DIFFERENT community.
Performance
Evaluated on the oumi-ai/oumi-groundedness-benchmark for Hallucination Detection (Macro F1 Score):
- **TD-HallOumi-3B** achieves 68.00% Macro F1.
- Highly Efficient: This 3B parameter model outperforms larger models such as OpenAI o1, Llama 3.1 405B, and Gemini 1.5 Pro.
- Competitive: Ranks closely behind Claude Sonnet 3.5 (69.60%).
This model offers strong hallucination detection capabilities with significantly fewer parameters than many alternatives.
Model Details
- Base Model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuning Task: Given a context document and a response (containing one or more claims), the model predicts whether each claim is `<|supported|>` or `<|unsupported|>` by the context.
- Model Output Format: The model is trained to output the tags `<|supported|>` or `<|unsupported|>` to indicate the verification status of each claim presented to it within a structured prompt format.
- Language: English
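As a hedged illustration of consuming that tag-based output (the exact prompt template is defined by the HallOumi project; the helper name below is an assumption for this sketch), the per-claim verdicts can be parsed from the raw generation like this:

```python
import re

# The model emits one <|supported|>/<|unsupported|> tag per claim;
# this pattern pulls each verdict out of the raw generated text.
TAG_PATTERN = re.compile(r"<\|(supported|unsupported)\|>")

def parse_verification_tags(model_output: str) -> list[str]:
    """Extract per-claim verdicts from the model's raw text output."""
    return [m.group(1) for m in TAG_PATTERN.finditer(model_output)]

verdicts = parse_verification_tags(
    "<|supported|> The sky is blue. <|unsupported|> The sky is green."
)
# verdicts == ["supported", "unsupported"]
```

A downstream pipeline can then map these verdicts back onto the claims in the order they were presented.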
Training Data
This model was fine-tuned using Supervised Fine-Tuning (SFT) on a mixture of datasets curated by Oumi AI, designed for claim verification tasks:
- oumi-ai/oumi-anli-subset: Based on ANLI prompts with responses generated by Llama-3.1-405B. (License: CC-BY-NC-4.0)
- oumi-ai/oumi-c2d-d2c-subset: Based on C2D-and-D2C-MiniCheck prompts with responses generated by Llama-3.1-405B. (License: Llama 3.1 Community License)
- oumi-ai/oumi-synthetic-claims: Synthetically generated claims and verification labels based on documents, using Llama-3.1-405B. (License: Llama 3.1 Community License)
- oumi-ai/oumi-synthetic-document-claims: Synthetically generated documents, requests (QA, summarization), responses, and verification labels using Llama-3.1-405B. Details the prompt structure involving `<document>`, `<request>`, and `<|supported|>`/`<|unsupported|>` tags. (License: Llama 3.1 Community License)

The combined training data uses the `messages` column formatted for conversational SFT.
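A minimal sketch of that conversational layout (the keys follow the standard chat `messages` schema; the document, request, and response strings are invented for illustration):

```python
# One SFT training example in the conversational `messages` format:
# a user turn carrying the context document plus the claims to verify,
# and an assistant turn carrying the verification tags as the target.
example = {
    "messages": [
        {
            "role": "user",
            "content": (
                "<document>Paris is the capital of France.</document>\n"
                "<request>Verify the claims in the response.</request>\n"
                "Response: Paris is the capital of France."
            ),
        },
        {"role": "assistant", "content": "<|supported|>"},
    ]
}

roles = [turn["role"] for turn in example["messages"]]
# roles == ["user", "assistant"]
```

Each dataset row contributes one such user/assistant pair, so the SFT trainer can apply the chat template directly.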
Training Procedure
- Framework: TRL (Transformer Reinforcement Learning library) SFT Trainer.
- Adapter Method: Low-Rank Adaptation (LoRA) was used during the fine-tuning process with the following parameters:
  - `lora_r`: 64
  - `lora_alpha`: 128
  - `lora_dropout`: 0.05
  - `lora_target_modules`: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- Final Model: Although trained with LoRA, the final saved model artifact hosted here contains the fully merged weights, incorporating the LoRA adaptations into the base model for easier deployment.
- Key Hyperparameters:
- Epochs: 1
- Learning Rate: 4.0e-5 (Cosine schedule with 100 warmup steps)
- Optimizer: AdamW (fused)
- Batch Size (per device): 2
- Gradient Accumulation Steps: 8 (Effective Batch Size = 16 * num_devices)
- Weight Decay: 0.01
- Max Sequence Length: 8192
  - Precision: `bfloat16`
  - Gradient Checkpointing: Enabled (`use_reentrant=False`)
- Tokenizer: The base Llama 3.2 tokenizer was used, with the special pad token `<|finetune_right_pad_id|>` added during training. The tokenizer files included in this repository reflect this.
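Under the settings listed above, the LoRA and trainer configuration could be reconstructed with `peft` and `trl` roughly as follows. This is a sketch, not the exact training script; argument names follow those libraries' public APIs, and `output_dir` is a placeholder:

```python
from peft import LoraConfig
from trl import SFTConfig

# LoRA adapter settings matching the parameters documented above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Trainer settings matching the key hyperparameters documented above.
training_args = SFTConfig(
    num_train_epochs=1,
    learning_rate=4.0e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16 per device
    weight_decay=0.01,
    max_seq_length=8192,
    bf16=True,
    optim="adamw_torch_fused",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    output_dir="td-halloumi-3b-sft",
)
```

Both objects would then be passed to TRL's `SFTTrainer` along with the base model and the combined dataset.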
The full training configuration can be found on GitHub.
Evaluation
- Benchmark: The model's performance on claim verification can be evaluated using the oumi-ai/oumi-groundedness-benchmark. This benchmark was developed by Oumi AI specifically for evaluating hallucination detection models and includes diverse documents, requests, and AI-generated responses with verification labels.
- Metrics: Standard metrics for this task include Macro F1 score and Balanced Accuracy between the "SUPPORTED" (0) and "UNSUPPORTED" (1) classes.
- Reference Performance: The Oumi AI HallOumi-8B model achieved 77.2% Macro F1 on this benchmark. Performance of this 3B model may vary.
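The two metrics can be computed directly over the binary labels (a plain-Python sketch; in practice `sklearn.metrics.f1_score` with `average="macro"` and `balanced_accuracy_score` give the same results):

```python
def macro_f1_and_balanced_accuracy(y_true, y_pred):
    """Macro F1 and Balanced Accuracy for binary labels 0/1."""
    f1s, recalls = [], []
    for cls in (0, 1):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        recalls.append(recall)
    # Macro F1 averages per-class F1; Balanced Accuracy averages recalls.
    return sum(f1s) / 2, sum(recalls) / 2

# SUPPORTED = 0, UNSUPPORTED = 1 (invented toy labels)
macro_f1, bal_acc = macro_f1_and_balanced_accuracy(
    [0, 0, 1, 1], [0, 1, 1, 1]
)
# macro_f1 ≈ 0.733, bal_acc == 0.75
```

Macro averaging weights both classes equally, which matters here because unsupported claims are typically the minority class.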
Intended Use
This model is designed for claim verification against provided context documents. Its primary use case is detecting hallucinations or unsupported statements in text generated by LLMs (or human-written text) when compared against a source document.
It is not intended as a general-purpose chatbot or for tasks outside of groundedness verification.
Limitations and Bias
Inherited Bias: As a fine-tune of meta-llama/Llama-3.2-3B-Instruct, this model may inherit biases present in the base model related to its training data.
Synthetic Data Bias: The fine-tuning data, largely generated using Llama-3.1-405B-Instruct, may contain biases or limitations characteristic of the generating model.
Specificity: The model is specialized for the claim verification task format it was trained on. Performance may degrade on significantly different prompt structures or tasks.
Context Dependence: Verification accuracy is entirely dependent on the quality and relevance of the provided context document. The model cannot verify claims against general world knowledge not present in the context.
Subtlety: While trained with subtlety in mind (as per the HallOumi project goals), complex or highly nuanced claims might still be challenging to verify correctly.
Citation
If you use this model, please consider citing the base model, the datasets, the Oumi AI HallOumi project, and this specific fine-tuned model artifact:
This Fine-tuned Model (TD-HallOumi-3B):
@misc{teen_d_halloumi_3b_2025,
author = {Tarun Reddi},
title = {TD-HallOumi-3B: Fine-tuned Llama-3.2-3B-Instruct for Claim Verification},
month = {April},
year = {2025},
url = {https://huggingface.co/TEEN-D/TD-HallOumi-3B}
}
Base Model:
@misc{meta2024llama32,
title = {Introducing Llama 3.2: The Next Generation of Open Weights AI Models},
author = {Meta AI},
year = {2024},
url = {https://ai.meta.com/blog/llama-3-2-ai-models/}
}
Datasets:
@misc{oumiANLISubset,
author = {Jeremiah Greer},
title = {Oumi ANLI Subset},
month = {March},
year = {2025},
url = {https://huggingface.co/datasets/oumi-ai/oumi-anli-subset}
}
@misc{oumiC2DAndD2CSubset,
author = {Jeremiah Greer},
title = {Oumi C2D and D2C Subset},
month = {March},
year = {2025},
url = {https://huggingface.co/datasets/oumi-ai/oumi-c2d-d2c-subset}
}
@misc{oumiSyntheticClaims,
author = {Jeremiah Greer},
title = {Oumi Synthetic Claims},
month = {March},
year = {2025},
url = {https://huggingface.co/datasets/oumi-ai/oumi-synthetic-claims}
}
@misc{oumiSyntheticDocumentClaims,
author = {Jeremiah Greer},
title = {Oumi Synthetic Document Claims},
month = {March},
year = {2025},
url = {https://huggingface.co/datasets/oumi-ai/oumi-synthetic-document-claims}
}
@misc{oumiGroundednessBenchmark,
author = {Jeremiah Greer},
title = {Oumi Groundedness Benchmark},
month = {March},
year = {2025},
url = {https://huggingface.co/datasets/oumi-ai/oumi-groundedness-benchmark}
}
Oumi Platform & HallOumi Project:
@software{oumi2025,
author = {Oumi Community},
title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
month = {January},
year = {2025},
url = {https://github.com/oumi-ai/oumi}
}
@article{halloumi2025,
author = {Greer, Jeremiah and Koukoumidis, Manos and Aisopos, Konstantinos and Schuler, Michael},
title = {Introducing HallOumi: A State-of-the-Art Claim-Verification Model},
journal = {Oumi AI Blog},
year = {2025},
month = {April},
url = {https://oumi.ai/blog/posts/introducing-halloumi}
}