File size: 8,499 Bytes
c23b92c 971a8ad c23b92c 8e99c13 e5970b2 c23b92c 8e99c13 c23b92c 6b81ec8 c23b92c bb6848e c23b92c 798e6c1 c23b92c a23316a 3056822 a23316a c23b92c bb6848e c509cc3 267d375 6b81ec8 c23b92c bb6848e 8320f09 c23b92c bb6848e c23b92c bb6848e 8320f09 bb6848e c23b92c bb6848e c23b92c bb6848e 6b81ec8 bb6848e c23b92c 6b81ec8 c23b92c bb6848e fa4488b bb6848e 3b13119 c23b92c bb6848e c23b92c bb6848e c23b92c bb6848e c23b92c 713fb29 c23b92c bb6848e c23b92c bb6848e c23b92c bb6848e c23b92c bb6848e c23b92c 3b13119 c23b92c 8a61b59 3b13119 6b81ec8 c23b92c bb6848e c23b92c bb6848e c23b92c bb6848e 3d75a66 bb6848e |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 |
---
library_name: transformers
license: cc-by-nc-4.0
datasets:
- oumi-ai/oumi-anli-subset
- oumi-ai/oumi-c2d-d2c-subset
- oumi-ai/oumi-synthetic-claims
- oumi-ai/oumi-synthetic-document-claims
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
[](https://github.com/oumi-ai/oumi)
[](https://github.com/oumi-ai/oumi)
[](https://oumi.ai/docs/en/latest/index.html)
[](https://oumi.ai/blog)
[](https://discord.gg/oumi)
# oumi-ai/HallOumi-8B
<!-- Provide a quick summary of what the model is/does. -->
Introducing **HallOumi-8B**, a **SOTA hallucination detection model**, outperforming DeepSeek R1, OpenAI o1, Google Gemini 1.5 Pro, and Claude Sonnet 3.5 at only **8 billion parameters!**
Give HallOumi a try now!
* Demo: https://oumi.ai/halloumi-demo
* Github: https://github.com/oumi-ai/oumi/tree/main/configs/projects/halloumi
| Model | Macro F1 Score | Open? | Model Size |
| --------------------- | -------------- | ----------------- | ---------- |
| **HallOumi-8B** | **77.2% ± 2.2%** | Truly Open Source | 8B |
| Claude Sonnet 3.5 | 69.6% ± 2.8% | Closed | ?? |
| OpenAI o1-preview | 65.9% ± 2.3% | Closed | ?? |
| DeepSeek R1 | 61.6% ± 2.5% | Open Weights | 671B |
| Llama 3.1 405B | 58.8% ± 2.4% | Open Weights | 405B |
| Google Gemini 1.5 Pro | 48.2% ± 1.8% | Closed | ?? |
**HallOumi**, the hallucination detection model built with Oumi, is a system built specifically to enable per-sentence verification of any content (either AI or human-generated) with **sentence-level citations** and **human-readable explanations.**
For example, when given one or more context documents, as well as an AI-generated summary, HallOumi goes through every claim being made in the summary and identifies:
* A determination whether that particular statement is **supported or unsupported** by the provided context combined with a **confidence score**.
* The **relevant context sentences** associated with that claim to facilitate human review.
* An **explanation** describing why a particular claim is supported or unsupported to boost human review accuracy. Some hallucinations may be nuanced and hard for humans to catch without help.
## Hallucinations
Hallucinations are often cited as the most important issue with being able to deploy generative models in numerous commercial and personal applications, and for good reason:
* [Lawyers sanctioned for briefing where ChatGPT cited 6 fictitious cases](https://www.reuters.com/legal/new-york-lawyers-sanctioned-using-fake-chatgpt-cases-legal-brief-2023-06-22/)
* [Air Canada required to honor refund policy made up by its AI support chatbot](https://www.wired.com/story/air-canada-chatbot-refund-policy/)
* [AI suggesting users should make glue pizza and eat rocks](https://www.bbc.com/news/articles/cd11gzejgz4o)
It ultimately comes down to an issue of **trust** — generative models are trained to produce outputs which are **probabilistically likely**, but not necessarily **true**.
While such tools are useful in the right hands, being unable to trust them prevents AI from being adopted more broadly,
where it can be utilized safely and responsibly.
## Building Trust with Verifiability
To be able to begin trusting AI systems, we have to be able to verify their outputs. To verify, we specifically mean that we need to:
* Understand the **truthfulness** of a particular statement produced by any model.
* Understand what **information supports that statement’s truth** (or lack thereof).
* Have **full traceability** connecting the statement to that information.
Missing any one of these aspects results in a system that cannot be verified and therefore cannot be trusted.
However, this is not enough, as we have to be capable of doing these things in a way that is **meticulous**, **scalable**, and **human-readable**.
With explanations, confidence scores, and citations, all at an affordable model size, HallOumi takes us towards a more grounded, trustworthy future for AI.
- **Developed by:** [Oumi AI](https://oumi.ai/)
- **Model type:** Small Language Model
- **Language(s) (NLP):** English
- **License:** [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en) (due to ANLI data falling under the same license)
- **Finetuned from model:** [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Demo:** [HallOumi Demo](https://oumi.ai/halloumi-demo)
---
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Use to verify claims/detect hallucinations in scenarios where a known source of truth is available.
Demo: https://oumi.ai/halloumi-demo
Example prompt:
```
EXAMPLE_CONTEXT = """<|context|><|s1|><This is sentence 1 of the document.><end||s><|s2|><This is sentence 2 of the document.><end||s><end||context>"""
EXAMPLE_REQUEST = """<|request|><Make one or more claims about information in the documents.><end||request>"""
EXAMPLE_RESPONSE = """<|response|><|r1|><This is sentence 1 of the claims/response.><end||r><|r2|><This is sentence 2 of the claims/response.><end||r><end||response>"""
messages = [
{'role': 'user', 'content': f"{EXAMPLE_CONTEXT}{EXAMPLE_REQUEST}{EXAMPLE_RESPONSE}",
]
```
## Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
Smaller LLMs have limited capabilities and should be used with caution. Avoid using this model for purposes outside of claim verification.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This model was finetuned with Llama-3.1-405B-Instruct data on top of a Llama-3.1-8B-Instruct model, so any biases or risks associated with those models may be present.
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Training data:
- [oumi-ai/oumi-synthetic-document-claims](https://huggingface.co/datasets/oumi-ai/oumi-synthetic-document-claims)
- [oumi-ai/oumi-synthetic-claims](https://huggingface.co/datasets/oumi-ai/oumi-synthetic-claims)
- [oumi-ai/oumi-anli-subset](https://huggingface.co/datasets/oumi-ai/oumi-anli-subset)
- [oumi-ai/oumi-c2d-d2c-subset](https://huggingface.co/datasets/oumi-ai/oumi-c2d-d2c-subset)
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
For information on training, see https://oumi.ai/halloumi
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
Follow along with our notebook on how to evaluate hallucination with HallOumi and other popular models:
https://github.com/oumi-ai/oumi/blob/main/configs/projects/halloumi/halloumi_eval_notebook.ipynb
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
- **Hardware Type:** H100
- **Hours used:** 32 (4 * 8 GPUs)
- **Cloud Provider:** Google Cloud Platform
- **Compute Region:** us-east5
- **Carbon Emitted:** 2.8 kg
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```
@misc{oumiHalloumi8B,
author = {Jeremy Greer, Konstantinos Aisopos, Panos Achlioptas, Michael Schuler, Oussama Elachqar, Emmanouil Koukoumidis},
title = {HallOumi-8B},
month = {March},
year = {2025},
url = {https://huggingface.co/oumi-ai/HallOumi-8B}
}
@software{oumi2025,
author = {Oumi Community},
title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
month = {January},
year = {2025},
url = {https://github.com/oumi-ai/oumi}
}
``` |