---
license: cc-by-nc-4.0
tags:
- llama3.2
- hallucination-detection
- claim-verification
- groundedness
- oumi
- sft
- teen-d
language: en
pipeline_tag: text-classification
base_model: meta-llama/Llama-3.2-3B-Instruct
datasets:
- oumi-ai/oumi-anli-subset
- oumi-ai/oumi-c2d-d2c-subset
- oumi-ai/oumi-synthetic-claims
- oumi-ai/oumi-synthetic-document-claims
metrics:
- accuracy
- f1
---

# TD-HallOumi-3B: Llama 3.2 3B for Hallucination Detection / Claim Verification

This model is a fine-tuned version of `meta-llama/Llama-3.2-3B-Instruct` specifically adapted for **Claim Verification** and **Hallucination Detection**. It assesses whether claims made in a response are supported by a given context document.

This work is inspired by and utilizes datasets developed for the [HallOumi project by Oumi AI](https://oumi.ai/blog/posts/introducing-halloumi), which aims to build trust in AI systems by enabling verifiable outputs.

This 3B-parameter model is provided by the **TEEN-DIFFERENT** community.

## Performance

Evaluated on the [oumi-ai/oumi-groundedness-benchmark](https://huggingface.co/datasets/oumi-ai/oumi-groundedness-benchmark) for Hallucination Detection (Macro F1 Score):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f8f07735a0a9fc54553c67/PkpHwSBoNxNFgE5KJr08y.png)

* **TD-HallOumi-3B** achieves **68.00%** Macro F1.
* **Highly efficient:** this 3B-parameter model outperforms much larger models such as OpenAI o1, Llama 3.1 405B, and Gemini 1.5 Pro.
* **Competitive:** it ranks closely behind Claude Sonnet 3.5 (69.60%).

This model offers strong hallucination detection capabilities with significantly fewer parameters than many alternatives.

## Model Details

* **Base Model:** [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
* **Fine-tuning Task:** Given a context document and a response (containing one or more claims), the model predicts whether each claim is `<|supported|>` or `<|unsupported|>` by the context.
* **Model Output Format:** The model is trained to output specific tags (`<|supported|>` or `<|unsupported|>`) indicating the verification status of claims presented to it within a structured prompt format (see the How to Use section).
* **Language:** English

## Training Data

This model was fine-tuned using Supervised Fine-Tuning (SFT) on a mixture of datasets curated by Oumi AI, designed for claim verification tasks:

* **[oumi-ai/oumi-anli-subset](https://huggingface.co/datasets/oumi-ai/oumi-anli-subset):** Based on ANLI prompts with responses generated by Llama-3.1-405B. (License: CC-BY-NC-4.0)
* **[oumi-ai/oumi-c2d-d2c-subset](https://huggingface.co/datasets/oumi-ai/oumi-c2d-d2c-subset):** Based on C2D-and-D2C-MiniCheck prompts with responses generated by Llama-3.1-405B. (License: Llama 3.1 Community License)
* **[oumi-ai/oumi-synthetic-claims](https://huggingface.co/datasets/oumi-ai/oumi-synthetic-claims):** Synthetically generated claims and verification labels based on documents, using Llama-3.1-405B. (License: Llama 3.1 Community License)
* **[oumi-ai/oumi-synthetic-document-claims](https://huggingface.co/datasets/oumi-ai/oumi-synthetic-document-claims):** Synthetically generated documents, requests (QA, summarization), responses, and verification labels using Llama-3.1-405B. The dataset card details the prompt structure, including the document/request tags and the `<|supported|>`/`<|unsupported|>` tags. (License: Llama 3.1 Community License)

The combined training data utilizes the `messages` column formatted for conversational SFT.

## Training Procedure

* **Framework:** TRL (Transformer Reinforcement Learning) SFT Trainer.
* **Adapter Method:** Low-Rank Adaptation (LoRA) was used during fine-tuning with the following parameters:
  * `lora_r`: 64
  * `lora_alpha`: 128
  * `lora_dropout`: 0.05
  * `lora_target_modules`: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
* **Final Model:** Although trained with LoRA, the **final saved model artifact hosted here contains the fully merged weights**, incorporating the LoRA adaptations into the base model for easier deployment.
* **Key Hyperparameters:**
  * Epochs: 1
  * Learning Rate: 4.0e-5 (cosine schedule with 100 warmup steps)
  * Optimizer: AdamW (fused)
  * Batch Size (per device): 2
  * Gradient Accumulation Steps: 8 (effective batch size = 16 × num_devices)
  * Weight Decay: 0.01
  * Max Sequence Length: 8192
  * Precision: `bfloat16`
  * Gradient Checkpointing: enabled (`use_reentrant=False`)
* **Tokenizer:** The base Llama 3.2 tokenizer was used, with a special pad token `<|finetune_right_pad_id|>` added during training. The tokenizer files included in this repository reflect this.

The full training configuration can be found at: [GitHub](https://github.com/REDDITARUN/halloumi-3b)

## Evaluation

* **Benchmark:** The model's performance on claim verification can be evaluated using the [oumi-ai/oumi-groundedness-benchmark](https://huggingface.co/datasets/oumi-ai/oumi-groundedness-benchmark). This benchmark was developed by Oumi AI specifically for evaluating hallucination detection models and includes diverse documents, requests, and AI-generated responses with verification labels.
* **Metrics:** Standard metrics for this task include Macro F1 score and Balanced Accuracy over the "SUPPORTED" (0) and "UNSUPPORTED" (1) classes.
* **Reference Performance:** The [Oumi AI HallOumi-8B model](https://huggingface.co/oumi-ai/HallOumi-8B) achieved 77.2% Macro F1 on this benchmark. Performance of this 3B model may vary.
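Both benchmark metrics average per-class scores, so the "SUPPORTED" and "UNSUPPORTED" classes count equally even when the labels are imbalanced. In practice scikit-learn's `f1_score(average="macro")` and `balanced_accuracy_score` compute these; the following dependency-free sketch shows the underlying arithmetic:

```python
def _class_counts(y_true, y_pred, cls):
    """True positives, false positives, false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp, fp, fn = _class_counts(y_true, y_pred, cls)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = []
    for cls in sorted(set(y_true)):
        tp, _, fn = _class_counts(y_true, y_pred, cls)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(recalls) / len(recalls)
```

With labels 0 = SUPPORTED and 1 = UNSUPPORTED, e.g. `macro_f1([0, 0, 1, 1], [0, 1, 1, 1])` gives 11/15 ≈ 0.733 while plain accuracy would report 0.75.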
## Intended Use

This model is designed for claim verification against provided context documents. Its primary use case is detecting hallucinations or unsupported statements in text generated by LLMs (or written by humans) when compared against a source document.

It is not intended as a general-purpose chatbot or for tasks outside of groundedness verification.

### Limitations and Bias

- **Inherited Bias:** As a fine-tune of `meta-llama/Llama-3.2-3B-Instruct`, this model may inherit biases present in the base model's training data.
- **Synthetic Data Bias:** The fine-tuning data, largely generated using Llama-3.1-405B-Instruct, may contain biases or limitations characteristic of the generating model.
- **Specificity:** The model is specialized for the claim verification task format it was trained on. Performance may degrade on significantly different prompt structures or tasks.
- **Context Dependence:** Verification accuracy depends entirely on the quality and relevance of the provided context document. The model cannot verify claims against general world knowledge not present in the context.
- **Subtlety:** While trained with subtlety in mind (per the HallOumi project goals), complex or highly nuanced claims might still be challenging to verify correctly.
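To make the intended workflow concrete, the helpers below build a context-plus-claims prompt and extract the `<|supported|>`/`<|unsupported|>` tags from a completion. The prompt template here is an illustrative assumption, not the exact HallOumi format; consult the dataset cards linked above for the precise structure before running inference:

```python
import re

# ASSUMPTION: illustrative layout only; the real fine-tuning prompt format is
# defined by the oumi-ai datasets, not by this sketch.
PROMPT_TEMPLATE = (
    "<context>\n{context}\n</context>\n\n"
    "<claims>\n{claims}\n</claims>\n\n"
    "For each claim, answer <|supported|> or <|unsupported|>."
)

def build_prompt(context: str, claims: list[str]) -> str:
    """Number the claims and embed them with the context document."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(claims, start=1))
    return PROMPT_TEMPLATE.format(context=context, claims=numbered)

def parse_verdicts(completion: str) -> list[str]:
    """Extract the verification tags the model is trained to emit, in order."""
    return re.findall(r"<\|(supported|unsupported)\|>", completion)
```

The resulting prompt would be sent through the model's chat template (e.g. `transformers`' `AutoTokenizer.apply_chat_template` plus `model.generate`), and `parse_verdicts` applied to the decoded output yields one label per claim.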
## Citation

If you use this model, please consider citing the base model, the datasets, the Oumi AI HallOumi project, and this specific fine-tuned model artifact.

**This Fine-tuned Model (TD-HallOumi-3B):**

```bibtex
@misc{teen_d_halloumi_3b_2025,
  author = {Tarun Reddi and Teen Different},
  title = {TD-HallOumi-3B: Fine-tuned Llama-3.2-3B-Instruct for Claim Verification},
  month = {April},
  year = {2025},
  url = {https://huggingface.co/TEEN-D/TD-HallOumi-3B}
}
```

**Base Model:**

```bibtex
@misc{meta2024llama32,
  title = {Introducing Llama 3.2: The Next Generation of Open Weights AI Models},
  author = {Meta AI},
  year = {2024},
  url = {https://ai.meta.com/blog/llama-3-2-ai-models/}
}
```

**Datasets:**

```bibtex
@misc{oumiANLISubset,
  author = {Jeremiah Greer},
  title = {Oumi ANLI Subset},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/datasets/oumi-ai/oumi-anli-subset}
}

@misc{oumiC2DAndD2CSubset,
  author = {Jeremiah Greer},
  title = {Oumi C2D and D2C Subset},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/datasets/oumi-ai/oumi-c2d-d2c-subset}
}

@misc{oumiSyntheticClaims,
  author = {Jeremiah Greer},
  title = {Oumi Synthetic Claims},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/datasets/oumi-ai/oumi-synthetic-claims}
}

@misc{oumiSyntheticDocumentClaims,
  author = {Jeremiah Greer},
  title = {Oumi Synthetic Document Claims},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/datasets/oumi-ai/oumi-synthetic-document-claims}
}

@misc{oumiGroundednessBenchmark,
  author = {Jeremiah Greer},
  title = {Oumi Groundedness Benchmark},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/datasets/oumi-ai/oumi-groundedness-benchmark}
}
```

**Oumi Platform & HallOumi Project:**

```bibtex
@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}

@article{halloumi2025,
  author = {Greer, Jeremiah and Koukoumidis, Manos and Aisopos, Konstantinos and Schuler, Michael},
  title = {Introducing HallOumi: A State-of-the-Art Claim-Verification Model},
  journal = {Oumi AI Blog},
  year = {2025},
  month = {April},
  url = {https://oumi.ai/blog/posts/introducing-halloumi}
}
```